streaming

30 articles about streaming in AI news

DIET: A New Framework for Continually Distilling Streaming Datasets in Recommender Systems

Researchers propose DIET, a framework for streaming dataset distillation in recommender systems. It maintains a compact, evolving dataset (1-2% of original size) that preserves training-critical signals, reducing model iteration costs by up to 60x while maintaining performance trends.

Mar 27, 2026 · 88% relevant

Kimi 2.5's 1T Parameter MoE Model Runs on 96GB Mac Hardware via SSD Streaming

Developers have demonstrated that Kimi 2.5's 1 trillion parameter Mixture-of-Experts model can run on Mac hardware with just 96GB RAM by streaming expert weights from SSD, with only 32B parameters active per token.

Mar 24, 2026 · 85% relevant

Qwen 3.5 397B-A17B MoE Model Runs on M3 Mac at 5.7 TPS with 5.5GB Active Memory via SSD Streaming

Developer Dan reportedly runs the 209GB Qwen 3.5 397B-A17B MoE model on an M3 Mac at ~5.7 tokens per second using only 5.5GB of active memory by quantizing and streaming weights from SSD.

Mar 18, 2026 · 85% relevant

Global TV Liberation: How Open Source Collaboration Is Disrupting Streaming

An open-source project called Free-TV/IPTV has compiled free live TV channels from over 60 countries into a single M3U playlist. With 88 contributors maintaining the repository, this GitHub project offers HD streams from major platforms without subscriptions.

Mar 7, 2026 · 85% relevant

Run Claude Code in Any Sandbox with One API: AgentBox SDK

Swap coding agents and sandbox providers without changing code. Preserves full interactive capabilities (approval flows, streaming).

Apr 23, 2026 · 100% relevant

IAT: Instance-As-Token Compression for Historical User Sequence Modeling

Researchers propose Instance-As-Token (IAT), which compresses all features of each historical interaction into a unified embedding token, then applies standard sequence modeling. This approach outperforms state-of-the-art methods and has been deployed in e-commerce advertising, shopping mall marketing, and live-streaming e-commerce with substantial business metric improvements.

Apr 13, 2026 · 93% relevant

OpenClaw Voice Interface Demo Shows Real-Time AI Assistant Hardware

A developer showcased a custom hardware rig that integrates a push-button voice interface with the OpenClaw AI model and streams responses in real time. The demo presents a tangible, open-source alternative to proprietary voice assistants like Amazon Alexa.

Apr 9, 2026 · 75% relevant

scan-for-secrets 0.2: Streamline Your Security Workflow with New CLI Options

Simon Willison's scan-for-secrets 0.2 adds streaming output, multi-directory scanning, and file-specific options that developers can use immediately in Claude Code workflows.

Apr 5, 2026 · 75% relevant

Building a Memory Layer for a Voice AI Agent: A Developer's Blueprint

A developer shares a technical case study on building a voice-first journal app, focusing on the critical memory layer. The article details using Redis Agent Memory Server for working/long-term memory and key latency optimizations like streaming APIs and parallel fetches to meet voice's strict responsiveness demands.

Apr 4, 2026 · 76% relevant

Storing Less, Finding More: Novelty Filtering Architecture for Cross-Modal Retrieval on Edge Cameras

A new streaming retrieval architecture uses an on-device 'epsilon-net' filter to retain only semantically novel video frames, dramatically improving cross-modal search accuracy while reducing power consumption to 2.7 mW. This addresses the fundamental problem of redundant frames crowding out correct results in continuous video streams.

Apr 1, 2026 · 82% relevant
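
The novelty-filtering idea in the teaser above can be illustrated with a generic greedy epsilon-net: a frame is retained only if its embedding lies farther than epsilon from every frame already kept. This is a minimal sketch, not the paper's implementation. The function names, the cosine-distance metric, and the threshold value are all assumptions made for illustration, and embeddings are assumed to be non-zero vectors.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def novelty_filter(embeddings, epsilon):
    """Return indices of frames kept by a greedy epsilon-net.

    A frame is kept only if it is more than `epsilon` away from
    every previously kept frame; near-duplicates are dropped.
    """
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine_distance(emb, embeddings[j]) > epsilon for j in kept):
            kept.append(i)
    return kept

# Hypothetical 2-D embeddings: frames 1 and 3 nearly duplicate frames 0 and 2.
frames = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]]
kept = novelty_filter(frames, epsilon=0.1)
```

With these toy inputs only the first frame of each near-duplicate pair is retained, so fewer redundant frames compete with each other at search time.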

Extended Thinking's Two-Block Response: What Claude Code Users Need to Know

Extended Thinking returns separate thinking and text blocks - handle them correctly in streaming or your UI will show raw reasoning.

Mar 22, 2026 · 80% relevant
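
As a rough illustration of the point in the teaser above, the sketch below routes streamed deltas into separate thinking and text buffers so that a UI renders only the text. The events are plain dictionaries and `split_stream` is a hypothetical helper. The event and field names (`content_block_delta`, `thinking_delta`, `text_delta`) are assumed to match Anthropic's streaming format and should be checked against the current API documentation.

```python
def split_stream(events):
    """Separate streamed reasoning from user-visible text.

    `events` is an iterable of dicts shaped like streaming events
    (assumed shape). Deltas of other types are ignored.
    """
    thinking_parts, text_parts = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # skip start/stop and other bookkeeping events
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            thinking_parts.append(delta.get("thinking", ""))
        elif delta.get("type") == "text_delta":
            text_parts.append(delta.get("text", ""))
    return "".join(thinking_parts), "".join(text_parts)

# Hypothetical event sequence: one thinking block, then one text block.
sample_events = [
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Check the units. "}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "The answer is 42."}},
    {"type": "content_block_stop", "index": 1},
]
thinking, text = split_stream(sample_events)
```

Only `text` would be shown to the user. `thinking` can be logged or displayed in a collapsed panel, depending on the product.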

FastAPI-FullStack: Production-Ready Template for AI Agent Apps with FastAPI, Next.js, and Framework Choice

A new open-source template, fastapi-fullstack, provides a pre-built foundation for deploying AI agent applications. It integrates FastAPI, Next.js, and multiple agent frameworks with WebSocket streaming, authentication, and database support out of the box.

Mar 20, 2026 · 85% relevant

Claude-to-IM Skill: Get Claude Code in Your Team Chat (Without OpenClaw's Security Risks)

An open-source bridge brings Claude Code to Telegram and Discord with permission prompts, streaming, and persistent sessions, positioned as a safer alternative to OpenClaw.

Mar 18, 2026 · 95% relevant

OmniForcing Enables Real-Time Joint Audio-Visual Generation at 25 FPS with 0.7s Latency

Researchers introduced OmniForcing, a method that distills a bidirectional LTX-2 model into a causal streaming generator for joint audio-visual synthesis. It achieves ~25 FPS with 0.7s latency, a 35× speedup over offline diffusion models while maintaining multi-modal fidelity.

Mar 16, 2026 · 92% relevant

The Billion-Dollar Bet on AI World Models: How AMI's Funding Signals a New Era of Machine Understanding

AMI's $1 billion funding round for world model development highlights a strategic shift toward AI systems that understand physical reality. Meanwhile, robotics and creative AI tools see massive investments, with YouTube maintaining streaming dominance.

Mar 12, 2026 · 98% relevant

Gemini 3.5 Live Translate Debuts as Real-Time Audio Model

Google DeepMind released Gemini 3.5 Live Translate, an audio model for real-time translation, but disclosed no pricing, latency, or language pair details.

Jun 9, 2026 · 87% relevant

DeepSeek-V4 Hits 500K Context with 90% Less KV Cache via FlashMemory

DeepSeek-V4 achieves 500K context with 90% less KV cache via FlashMemory's lookahead sparse attention, keeping only 13.5% of cache in GPU memory without retraining.

Jun 9, 2026 · 98% relevant

Kling AI Video Enters Hollywood Production with 'House of David'

Kling AI video was used in 'House of David', billed as the first Hollywood production to use AI video at industrial scale. The show reached 44M+ viewers and ranked #1 on Prime Video U.S.

May 24, 2026 · 85% relevant

train-llm-from-scratch: 1B-Parameter LLM on a Single GPU

train-llm-from-scratch trains billion-parameter LLMs on a single GPU, cutting costs from $10M+ to consumer hardware.

May 20, 2026 · 85% relevant

Claude Code's Six-Layer Architecture: Harness, Not Magic

Claude Code's six-layer architecture uses a 3-layer context compressor at 92% threshold and Redis-based multi-agent FSM protocol. The model is just one node in a harness.

May 10, 2026 · 100% relevant

Two-Tower vs Vector DB + LLM: Which Wins for RecSys at Scale?

Two-tower models offer sub-10ms latency for cold-start; vector DB + LLM provides richer semantics. Hybrid architectures reduce churn by 15-20%.

May 9, 2026 · 100% relevant

Microsoft’s VibeVoice: Open-Source Speech-to-Text with Diarization

Microsoft released VibeVoice, an MIT-licensed speech-to-text model with built-in speaker diarization. Simon Willison tested a 4-bit MLX conversion on an M5 MacBook, transcribing 1 hour of audio in ~9 minutes using ~60GB RAM.

Apr 27, 2026 · 85% relevant

Free-Claude-Code Proxy Routes Anthropic API to Free NVIDIA NIM Models

A developer released free-claude-code, a proxy that intercepts Claude Code's API calls and routes them to free NVIDIA NIM endpoints, unlocking free access to models like Kimi K2 and GLM 4.7. This bypasses Anthropic's subscription fees and adds remote execution via a Telegram bot.

Apr 22, 2026 · 91% relevant

Catching Drift Before It Catches You

The author details implementing the open-source Evidently AI library to monitor a Kafka-powered movie recommender for data drift. This is a hands-on guide to a fundamental MLOps task for maintaining live AI systems.

Apr 20, 2026 · 96% relevant

AI-Powered PS4 Emulator 'Spine' Runs Bloodborne Locally on PC

A developer has released Spine, a PS4 emulator that uses AI techniques to run Bloodborne fully on PC. This represents a major step forward in console emulation, previously considered years away.

Apr 20, 2026 · 87% relevant

Prefill-as-a-Service Paper Claims to Decouple LLM Inference Bottleneck

A research paper proposes a 'Prefill-as-a-Service' architecture to separate the heavy prefill computation from the lighter decoding phase in LLM inference. This could enable new deployment models where resource-constrained devices handle only the decoding step.

Apr 20, 2026 · 85% relevant

GPT-5.5 Limited Rollout Begins, Frontend Improvements Noted

OpenAI has started a limited rollout of GPT-5.5 to select users, with early reports highlighting significant frontend quality improvements. This suggests an incremental update focused on user experience rather than core model capabilities.

Apr 19, 2026 · 85% relevant

Vibe's $227M ARR Shows AI-Powered CTV Ads Are Eating Linear TV Budgets

Ad platform Vibe.co reports $227M in annual recurring revenue, growing 264% year-over-year. The surge is driven by AI that optimizes Connected TV ads by combining identity graphs with transactional data, convincing brands to shift major budgets.

Apr 17, 2026 · 87% relevant

A Practical Guide to Building Real-Time Recommendation Systems

This article provides a practical overview of building real-time recommendation systems, covering core components like data ingestion, feature stores, and model serving. It matters because real-time personalization is becoming a baseline expectation in digital commerce.

Apr 17, 2026 · 78% relevant

Onlook: Open-Source AI Tool Edits React Code Visually, Hits 23.9K GitHub Stars

Onlook, an open-source desktop app, enables visual editing of live React and Next.js applications, with AI generating and writing code changes directly to the codebase. It has gained 23.9K GitHub stars, positioning itself as a free alternative to paid design tools like Figma.

Apr 17, 2026 · 89% relevant
