video synthesis

30 articles about video synthesis in AI news

HeyGen Launches CLI Tool for AI Video Generation from Terminal

AI video platform HeyGen has launched a CLI tool, allowing users to generate videos with avatars, voice, and script via terminal commands. This moves video synthesis from a web dashboard into developer workflows.

Apr 13, 2026 (85% relevant)

ByteDance's Helios: A 14B Parameter Video Generation Model Running at 19.5 FPS on a Single H100 GPU

ByteDance has introduced Helios, a 14-billion parameter video generation model that reportedly runs at 19.5 frames per second on a single NVIDIA H100 GPU. This represents a significant step in making high-quality, real-time video synthesis more computationally accessible.

Mar 23, 2026 (95% relevant)

UniVidX Generates Video From 1,000 Samples, SIGGRAPH 2026

UniVidX generates omni-directional video from fewer than 1,000 training samples, using diffusion priors with stochastic masking. The work was accepted at SIGGRAPH 2026.

May 4, 2026 (85% relevant)

ByteDance's OmniShow Unifies Text, Image, Audio, Pose for Video Gen

ByteDance introduced OmniShow, a unified multimodal framework for video generation that accepts text, reference images, audio, and pose inputs simultaneously. It claims state-of-the-art performance across diverse conditioning settings.

Apr 14, 2026 (85% relevant)

Elon Musk Predicts 'Vast Majority' of AI Compute Will Be for Real-Time Video

Elon Musk predicts that real-time video consumption and generation will account for the vast majority of AI compute, pointing to a shift from text to video as the primary medium for AI processing.

Mar 29, 2026 (85% relevant)

Sam3 + MLX Enables Local, Multi-Object Video Tracking Without Cloud APIs

A developer has combined Meta's Segment Anything 3 (Sam3) with Apple's MLX framework to enable local, on-device object tracking in videos. This bypasses cloud API costs and latency for computer vision tasks.

Mar 29, 2026 (85% relevant)

Geometric Latent Diffusion (GLD) Achieves SOTA Novel View Synthesis, Trains 4.4× Faster Than VAE

GLD repurposes features from geometric foundation models like Depth Anything 3 as a latent space for multi-view diffusion. It reportedly trains 4.4× faster than VAE-based approaches and achieves state-of-the-art novel view synthesis without text-to-image pretraining.

Mar 28, 2026 (95% relevant)

NotebookLM's Video Generation: When AI Consultants Advise Sauron on Volcano Security

Google's NotebookLM has introduced a video generation feature that can create professional consultant-style presentations from research materials. The demonstration shows AI analyzing Tolkien's lore to advise Sauron on securing Mount Doom with a simple door.

Mar 10, 2026 (85% relevant)

Kling AI 3.0 Arrives with Breakthrough Motion Control for Video Generation

Kling AI has launched version 3.0 featuring advanced motion control capabilities, representing a significant leap in AI-generated video technology. The update promises more precise manipulation of movement within AI-created videos.

Mar 9, 2026 (85% relevant)

AIVideo Agent Emerges as First Complete AI Video Production Pipeline

A new AI system called AIVideo Agent promises to automate the entire video production workflow from concept to final edit. Positioned as the "OpenClaw for video," this development could revolutionize content creation for creators and businesses alike.

Mar 5, 2026 (85% relevant)

AI Research Breakthroughs: From Video Reasoning to Self-Stopping Models

This week's top AI papers reveal major advances in video understanding, reasoning efficiency, and agent training. Researchers introduced a massive video reasoning dataset, models that know when to stop thinking, and techniques for improving AI agents without full retraining.

Mar 1, 2026 (95% relevant)

BetterScene Bridges the Gap: How Aligning AI Representations Unlocks Photorealistic 3D Synthesis

Researchers introduce BetterScene, a novel AI method that dramatically improves 3D scene generation from just a handful of photos. By aligning the internal representations of a powerful video diffusion model, it produces consistent, artifact-free novel views, pushing the boundary of what's possible in computational photography and virtual world creation.

Feb 27, 2026 (78% relevant)

PixVerse's 'Playable Reality': AI Blurs Lines Between Video, Games and Virtual Worlds

PixVerse introduces 'Playable Reality,' an AI-generated medium that defies traditional categorization. Blending elements of video, gaming, and virtual environments, this technology creates interactive, dynamic experiences rather than static content.

Feb 26, 2026 (85% relevant)

Kyutai Labs Releases OVIE: Single-Image Novel View Synthesis Model

French AI lab Kyutai Labs released OVIE, a novel view generation model trained only on single images, bypassing the need for costly multi-view datasets. This could democratize 3D content creation from 2D photos.

Apr 15, 2026 (85% relevant)

AI Reconstructs Raphael's 'School of Athens' with Animated Figures

A researcher used an AI tool called Seedance 2.0 to generate an animated version of Raphael's 'The School of Athens,' bringing the depicted philosophical debate to life. This demonstrates a novel application of generative video AI for art historical interpretation.

Apr 10, 2026 (75% relevant)

Alibaba's Qwen3.5-Omni Launches with Script-Level Captioning, Audio-Visual Vibe Coding, and Real-Time Web Search

Alibaba's Qwen team has released Qwen3.5-Omni, a multimodal model focused on interpreting images, audio, and video with new capabilities like script-level captioning and 'vibe coding'. It's open-access on Hugging Face but does not generate media.

Mar 30, 2026 (85% relevant)

OmniForcing Enables Real-Time Joint Audio-Visual Generation at 25 FPS with 0.7s Latency

Researchers introduced OmniForcing, a method that distills a bidirectional LTX-2 model into a causal streaming generator for joint audio-visual synthesis. It achieves ~25 FPS with 0.7s latency, a 35× speedup over offline diffusion models while maintaining multi-modal fidelity.

Mar 16, 2026 (92% relevant)

ElevenLabs Unleashes 'Flows': The Unified AI Creative Suite That Could Revolutionize Content Production

ElevenLabs has launched Flows, an AI platform that integrates image, video, voice, music, and sound effects generation into a single visual pipeline. This removes the need for tool-switching and re-exporting between creative steps.

Mar 13, 2026 (85% relevant)

ByteDance's DeerFlow: The Open-Source AI Agent That Works Like a Digital Employee

ByteDance has open-sourced DeerFlow, an autonomous AI agent capable of handling complex tasks like research, coding, and video generation. Operating with its own virtual computer environment, it represents a shift from chatbots to functional AI workers.

Mar 12, 2026 (97% relevant)

DeepMind's Diffusion Breakthrough: Training Better Latents for Superior AI Generation

Google DeepMind researchers have developed new techniques for training latent representations in diffusion models, potentially leading to more efficient, higher-quality AI-generated content across images, audio, and video domains.

Feb 26, 2026 (85% relevant)

The AI Music Revolution: How Google and Apple Are Democratizing Music Creation

Google and Apple are integrating generative AI music features into their core platforms, allowing users to create custom 30-second tracks from text, photos, or video prompts. This move signals AI's transition from experimental tools to mainstream consumer applications.

Feb 18, 2026 (70% relevant)

MiniMax Music-2.6 Goes Free on Cloudflare This Week

MiniMax's Music-2.6 AI model is available for free on Cloudflare's platform this week, allowing users to generate full-length songs or instrumentals from text prompts.

Apr 27, 2026 (75% relevant)

GPT ImageGen-2 Passes 'Otter Test', Generates Academic Papers

Wharton professor Ethan Mollick reports OpenAI's GPT ImageGen-2 now reliably generates complex text within images, including academic papers and slides, marking a significant leap in multimodal AI capability.

Apr 21, 2026 (83% relevant)

Omar Sar Uses Opus 4.7 Agent to Turn Podcasts into Self-Improving Wikis

AI researcher Omar Sar automated podcast consumption using an Opus 4.7 agent that extracts insights, generates analysis, and builds interactive HTML/JS artifacts. The system creates a self-improving knowledge wiki for agentic research workflows.

Apr 19, 2026 (87% relevant)

Google Launches Gemini 3.1 Flash TTS with Prompt-Controlled Speech

Google has launched Gemini 3.1 Flash TTS, a text-to-speech model featuring prompt-based voice control and support for over 70 languages. This release expands Google's multimodal AI offerings directly to developers.

Apr 15, 2026 (93% relevant)

OpenBMB's VoxCPM 2: 2B-Param Open-Source TTS for Multilingual Voice

OpenBMB launched VoxCPM 2, a 2-billion-parameter open-source text-to-speech model. It generates multilingual, emotionally expressive speech from text descriptions and runs on consumer-grade hardware.

Apr 13, 2026 (97% relevant)

Vanast Unifies Virtual Try-On & Animation in Single-Step CVPR 2026 Framework

A CVPR 2026 paper introduces Vanast, a unified model for virtual try-on and human image animation in one step. It aims to preserve identity and enable zero-shot interpolation, streamlining a traditionally complex process.

Apr 12, 2026 (85% relevant)

Figure CEO: Data Scarcity is the 'Only Thing' Holding Back General Robots

Figure CEO Brett Adcock asserts that solving general robotics is contingent on acquiring a 'pile of data' for training, highlighting the extreme cost and difficulty of collecting real-world robotic interaction data.

Apr 12, 2026 (85% relevant)

Palantir CEO Karp: AI Will 'Destroy Humanities Jobs', Shift to Vocational Skills

Palantir CEO Alex Karp warns AI will 'destroy humanities jobs,' arguing broad degrees lose value while vocational skills and neurodivergent traits become key advantages. He insists there will still be 'more than enough jobs,' just redistributed toward practical roles.

Apr 12, 2026 (85% relevant)

PetClaw AI Agent Automates Research Stack, Replaces $200/Month Tools

A developer claims PetClaw's desktop AI agent automated their entire research workflow (browsing, sourcing, and dashboard building) and saved it as a reusable skill, replacing multiple paid tools. No code was written.

Apr 11, 2026 (87% relevant)
