OpenBMB’s MiniCPM-o 4.5, a 9B-parameter open model, ships with full-duplex omni-modal interaction built on the Omni-Flow framework. It supports continuous voice and video conversations in which the model perceives and responds simultaneously on a shared temporal axis.
Key facts
- MiniCPM-o 4.5 is a 9B-parameter open model
- Runs in under 12 GB of RAM, targeting edge deployment
- Surpasses Qwen3-Omni-30B-A3B on omni-modal tasks, according to OpenBMB (scores not disclosed)
- Uses Omni-Flow for time-aligned perception and response
- Ships with code, weights, report, and deployment scripts
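The 12 GB figure is easier to judge next to some simple arithmetic. The sketch below is a back-of-envelope estimate, not a description of the release: it counts weights only, ignores activations and caches, and the bit widths are illustrative assumptions, since the announcement does not say which precision the model ships in.

```python
# Back-of-envelope weight memory for a 9B-parameter model.
# Assumption: weights only; activations, KV cache, and encoder buffers are ignored.
PARAMS = 9_000_000_000


def weight_gb(params: int, bits_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


# Illustrative precisions; the source does not state which one is used.
estimates = {bits: weight_gb(PARAMS, bits) for bits in (16, 8, 4)}

for bits, gb in estimates.items():
    verdict = "fits" if gb < 12 else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:4.1f} GB ({verdict} in a 12 GB budget)")
```

At 16 bits per parameter the weights alone come to 18 GB, so a sub-12 GB footprint would be consistent with reduced-precision weights. That is an inference from the arithmetic, not something the announcement confirms.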
OpenBMB released MiniCPM-o 4.5, a 9B-parameter open-weight model that implements full-duplex omni-modal interaction: it can see, hear, and speak simultaneously without turn-based delays. The model uses the Omni-Flow framework, which treats interaction as a continuous stream on a shared temporal axis and aligns visual input, audio input, and output speech and text into time chunks [According to @rohanpaul_ai]. This breaks from the walkie-talkie UX of conventional voice AI, in which the user talks, the model waits, and only then replies.
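To make "time chunks on a shared temporal axis" concrete, here is a minimal sketch that buckets timestamped events from several streams into common chunks. Everything in it is hypothetical: the `Event` type, the stream labels, and the one-second chunk length are illustrative choices, not details taken from Omni-Flow.

```python
from collections import defaultdict
from dataclasses import dataclass

CHUNK_SECONDS = 1.0  # hypothetical chunk length; the source does not give one


@dataclass
class Event:
    stream: str   # illustrative labels: "video_in", "audio_in", "speech_out"
    t: float      # timestamp in seconds on the shared axis
    payload: str


def chunk_index(t: float, chunk_seconds: float = CHUNK_SECONDS) -> int:
    """Map a timestamp to the index of the chunk that contains it."""
    return int(t // chunk_seconds)


def align(events):
    """Group events from all streams by the time chunk they fall into."""
    chunks = defaultdict(lambda: defaultdict(list))
    for ev in events:
        chunks[chunk_index(ev.t)][ev.stream].append(ev.payload)
    return {i: dict(streams) for i, streams in sorted(chunks.items())}


events = [
    Event("audio_in", 0.2, "user: 'what is'"),
    Event("video_in", 0.5, "frame_0"),
    Event("audio_in", 1.1, "user: 'this?'"),
    Event("speech_out", 1.6, "model: 'That'"),
    Event("video_in", 1.5, "frame_1"),
]
timeline = align(events)
for idx, streams in timeline.items():
    print(idx, streams)
```

In this toy timeline, chunk 1 holds user audio, a video frame, and model speech at once, which is the overlap that a strictly turn-based exchange rules out.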
How Omni-Flow Works
Omni-Flow synchronizes video tokens, audio tokens, LLM hidden states, speech tokens, and waveform generation on one shared timeline. This time-aligned micro-turn architecture lets the model perceive and respond in real time, a concept that Thinking Machines Lab (TML) recently previewed for continuous AI interaction [According to @rohanpaul_ai]. The model runs in under 12 GB of RAM, which puts edge deployment on consumer hardware within reach.
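The difference between a single long turn and a series of micro-turns can be shown with a toy loop. This is a sketch under stated assumptions, not the model's actual control flow: `respond` is a hypothetical stand-in for the model, and a real system could just as well emit silence in a given step.

```python
from typing import Iterable, List, Tuple


def respond(context: List[str]) -> str:
    """Hypothetical stand-in for the model: reports how much it has perceived."""
    return f"reply after {len(context)} chunk(s)"


def turn_based(chunks: Iterable[str]) -> List[Tuple[int, str]]:
    """Walkie-talkie style: consume every input chunk, then answer once."""
    context = list(chunks)
    return [(len(context), respond(context))]


def micro_turns(chunks: Iterable[str]) -> List[Tuple[int, str]]:
    """Full-duplex style: perceive one chunk and emit output in the same step."""
    context: List[str] = []
    outputs: List[Tuple[int, str]] = []
    for step, chunk in enumerate(chunks, start=1):
        context.append(chunk)
        outputs.append((step, respond(context)))
    return outputs


user_chunks = ["hey", "can you", "see this?"]
tb = turn_based(user_chunks)
mt = micro_turns(user_chunks)
print("turn-based first output at step", tb[0][0])
print("micro-turn first output at step", mt[0][0])
```

The turn-based path produces its first output only after the final input chunk, while the micro-turn path produces output at every step and keeps updating its context as new input arrives.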
Benchmark Performance
OpenBMB claims MiniCPM-o 4.5 surpasses Qwen3-Omni-30B-A3B in omni-modal capabilities and speech generation quality, though it did not disclose specific benchmark scores. The release is open, with code, weights, a report, and deployment scripts available.
Unique Take
This is not a demo or a research preview. It is a shipped open model that changes how users interact with AI. While companies like Google and OpenAI demo full-duplex features behind closed APIs, OpenBMB has released a working 9B model that anyone can run locally. Its architectural innovation, time-aligned perception and response, turns voice AI from a query-response system into a conversation.
What to watch
Watch for independent evaluations of MiniCPM-o 4.5 covering omni-modal tasks, speech generation quality, and real-time latency. Also track adoption in open-source voice assistants and whether larger labs adopt time-aligned micro-turn architectures in their next releases.