gentic.news — AI News Intelligence Platform
OpenBMB's MiniCPM-o 4.5 model interface showing continuous voice and video conversation with Omni-Flow framework
AI Research · Score: 88

MiniCPM-o 4.5 Ships Full-Duplex Omni-Modal AI at 9B Parameters

OpenBMB's MiniCPM-o 4.5 is a 9B open model with full-duplex omni-modal interaction, outperforming Qwen3-Omni-30B-A3B and running under 12GB RAM.

21h ago · 2 min read · AI-Generated
What is MiniCPM-o 4.5 and how does it change AI interaction?

MiniCPM-o 4.5 is a 9B-parameter open model from OpenBMB that implements full-duplex omni-modal interaction via Omni-Flow. It supports continuous voice and video conversations with time-aligned perception and response, and OpenBMB says it outperforms Qwen3-Omni-30B-A3B.

TL;DR

9B open model does continuous voice/video interaction · Breaks turn-based AI with time-aligned micro-turns · Outperforms Qwen3-Omni-30B-A3B in omni-modal tasks

OpenBMB’s MiniCPM-o 4.5, a 9B-parameter open model, ships full-duplex omni-modal interaction via the Omni-Flow framework. It enables continuous voice and video conversations where the model perceives and responds simultaneously on a shared temporal axis.

Key facts

  • MiniCPM-o 4.5 is a 9B-parameter open model
  • Runs under 12GB RAM for edge deployment
  • Surpasses Qwen3-Omni-30B-A3B in omni-modal tasks
  • Uses Omni-Flow for time-aligned perception and response
  • Ships with code, weights, report, and deployment scripts

OpenBMB released MiniCPM-o 4.5, a 9B-parameter open-weight model that implements full-duplex omni-modal interaction, meaning it can see, hear, and speak simultaneously without turn-based delays. The model uses the Omni-Flow framework, which treats interaction as a continuous stream on a shared temporal axis and aligns visual input, audio input, and output speech/text in time chunks [According to @rohanpaul_ai]. This breaks with the traditional walkie-talkie UX of AI models, where the user talks, the model waits, then replies.

How Omni-Flow Works

Omni-Flow synchronizes video tokens, audio tokens, LLM hidden states, speech tokens, and waveform generation on one shared timeline. This time-aligned micro-turn architecture allows real-time perception and response, a concept recently previewed by Thinking Machines Lab (TML) for continuous AI interaction [According to @rohanpaul_ai]. The model runs in under 12GB of RAM, which makes deployment on edge and consumer hardware feasible.
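The idea of a shared timeline can be illustrated with a small Python sketch. It is not OpenBMB's implementation: the chunk length, the `Chunk` class, and the `align_to_timeline`, `run_micro_turns`, and `toy_respond` functions are hypothetical names chosen for illustration, and the "model" is a stand-in that only counts its inputs. The sketch shows one thing only: timestamped video and audio events are grouped into the same time chunks, and a response is produced once per chunk rather than once per completed user turn.

```python
from dataclasses import dataclass, field

# Assumed chunk length in seconds; the article does not state the real value.
CHUNK_SECONDS = 1.0


@dataclass
class Chunk:
    """All inputs that fall into one slice of the shared timeline (hypothetical)."""
    index: int
    video: list = field(default_factory=list)
    audio: list = field(default_factory=list)


def align_to_timeline(video_events, audio_events, chunk_seconds=CHUNK_SECONDS):
    """Group (timestamp, payload) events from both streams into shared time chunks."""
    chunks = {}
    for kind, events in (("video", video_events), ("audio", audio_events)):
        for timestamp, payload in events:
            index = int(timestamp // chunk_seconds)
            chunk = chunks.setdefault(index, Chunk(index))
            getattr(chunk, kind).append(payload)
    return [chunks[i] for i in sorted(chunks)]


def run_micro_turns(chunks, respond):
    """Call `respond` once per chunk, so output is produced while input keeps arriving."""
    return [(chunk.index, respond(chunk)) for chunk in chunks]


def toy_respond(chunk):
    # Stand-in for the model: a real system would emit speech or text tokens here.
    return f"saw {len(chunk.video)} frame(s), heard {len(chunk.audio)} audio piece(s)"


video = [(0.0, "frame0"), (0.5, "frame1"), (1.2, "frame2")]
audio = [(0.1, "a0"), (1.1, "a1"), (1.9, "a2"), (2.3, "a3")]

timeline = align_to_timeline(video, audio)
outputs = run_micro_turns(timeline, toy_respond)
for index, text in outputs:
    print(index, text)
```

In a turn-based design, `respond` would be called once after the user stops talking. Here it runs for every chunk, which is the property the article describes as micro-turns.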

Benchmark Performance

OpenBMB claims MiniCPM-o 4.5 surpasses Qwen3-Omni-30B-A3B in omni-modal capabilities and speech generation quality, though specific benchmark scores were not disclosed. The model is open-source with code, weights, a report, and deployment scripts available.

Unique Take

This is not a demo or a research preview. It is a shipped open model that redefines the interaction layer for AI. While companies like Google and OpenAI demo full-duplex features behind closed APIs, OpenBMB has released a working 9B model that anyone can run locally. The architectural innovation, time-aligned perception and response, turns voice AI from a query-response system into a conversation.

What to watch

MiniCPM-o 4.5, an Open-Source Full-Duplex Multimodal Model

Watch for independent benchmark evaluations of MiniCPM-o 4.5 on omni-modal tasks like speech generation quality and real-time latency. Also track adoption in open-source voice assistants and whether larger labs adopt time-aligned micro-turn architectures in their next releases.

Source: gentic.news

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

The key structural shift here is from turn-based to continuous interaction. Most voice AI systems, including GPT-4o's voice mode, still operate on a query-response loop, even if it is a low-latency one. Omni-Flow's time-aligned micro-turn architecture treats the conversation as a single shared temporal stream, which is architecturally closer to how humans converse. That this runs at 9B parameters and fits under 12GB RAM is notable because it suggests the approach is compute-efficient compared to larger omni-modal models. The comparison to Qwen3-Omni-30B-A3B is telling: a 9B model outperforming a 30B model in omni-modal capabilities suggests the architecture matters more than raw parameter count. However, because no benchmark numbers were disclosed, the performance claim should be treated with mild skepticism pending independent replication. The open release is the real story: it gives the research community a concrete baseline for full-duplex interaction, which could accelerate progress faster than proprietary demos.
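A rough back-of-the-envelope check puts the 12GB figure in context. The sketch below estimates memory for the weights alone at a few common precisions. The article does not say which precision the 12GB claim assumes, so the bit widths are assumptions, and the function name `weight_memory_gb` is hypothetical. The estimate ignores activations, caches, and any runtime overhead, and uses decimal gigabytes (1 GB = 10^9 bytes).

```python
def weight_memory_gb(num_params, bits_per_param):
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 9e9        # 9B parameters, as stated in the article
BUDGET_GB = 12      # RAM figure quoted in the article

# Assumed precisions for illustration; the article does not specify one.
estimates = {bits: weight_memory_gb(PARAMS, bits) for bits in (16, 8, 4)}

for bits, gigabytes in estimates.items():
    verdict = "fits" if gigabytes < BUDGET_GB else "does not fit"
    print(f"{bits}-bit weights: {gigabytes:.1f} GB ({verdict} in {BUDGET_GB} GB)")
```

By this estimate, 16-bit weights for 9B parameters would need about 18 GB, while 8-bit and 4-bit weights would need about 9 GB and 4.5 GB. One plausible reading is that the sub-12GB claim refers to a reduced-precision deployment, but the source does not confirm this.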