demonstration

30 articles about demonstration in AI news

Evolving Demonstration Optimization: A New Framework for LLM-Driven Feature Transformation

Researchers propose a novel framework that uses reinforcement learning and an evolving experience library to optimize LLM prompts for feature transformation tasks. The method outperforms classical and static LLM approaches on tabular data benchmarks.

Mar 12, 2026 · 70% relevant

DART: One-Shot Robot Adaptation via Weight Space Arithmetic

DART from Seoul National University adapts robot policies with one demonstration using weight space arithmetic, achieving 73% success on unseen domain shifts.

Jul 3, 2026 · 85% relevant

Opus 4.8 Builds Full RPG in Claude Code With Zero Feedback

Opus 4.8 autonomously built and deployed a complete RPG via Claude Code with zero human feedback, per @emollick's demonstration.

May 28, 2026 · 100% relevant

Toyota CUE7 Robot Makes Free Throws at Tokyo Basketball Game

Toyota's CUE7 robot successfully performed dribbling and free throws during a live halftime show in Tokyo. The demonstration highlights advances in real-world, dynamic bipedal/wheeled robotics.

Apr 19, 2026 · 87% relevant

Google's RT-X Project Establishes New Robot Learning Standard

Google's RT-X project has established a new standard for robot learning by creating a unified dataset of detailed human demonstrations across 22 institutions and 30+ robot types. This enables large-scale cross-robot training previously impossible with fragmented data.

Apr 5, 2026 · 85% relevant

The AI Agent Production Gap: Why 86% of Agent Pilots Never Reach Production

A Medium article highlights the stark reality that most AI agent demonstrations fail to transition to production systems, citing a critical gap between prototype and deployment. This follows recent industry analysis revealing similar failure rates.

Mar 31, 2026 · 90% relevant

Figure AI CEO Brett Adcock Demonstrates Figure 03 Robot in Live Interview, Showcasing Real-World Mobility

Figure AI CEO Brett Adcock brought a Figure 03 humanoid robot to an in-person interview for a live demonstration. The event highlights the company's push for real-world validation and public visibility of its flagship platform.

Mar 28, 2026 · 85% relevant

Neuralink Patient Plays World of Warcraft Using Brain-Computer Interface, Demonstrating Complex Control

A Neuralink implant recipient has reportedly played World of Warcraft using only thought-based control. The demonstration highlights the BCI's ability to manage complex, multi-action gameplay.

Mar 16, 2026 · 85% relevant

NVIDIA's 2.5-Hour Autonomous Drive Through San Francisco Signals Major Breakthrough in AI-Powered Transportation

NVIDIA CEO Jensen Huang took a 2.5-hour autonomous ride through San Francisco in a Mercedes, powered by NVIDIA's next-generation AI platform. The demonstration showcases significant progress in real-world autonomous driving capabilities.

Mar 13, 2026 · 87% relevant

NotebookLM's Video Generation: When AI Consultants Advise Sauron on Volcano Security

Google's NotebookLM has introduced a video generation feature that can create professional consultant-style presentations from research materials. The demonstration shows AI analyzing Tolkien's lore to advise Sauron on securing Mount Doom with a simple door.

Mar 10, 2026 · 85% relevant

AI Video Generation Reaches New Milestone: Kling AI 5.3 Launches with Enhanced Capabilities

The latest version of Kling AI, version 5.3, has officially launched, marking another advancement in AI-powered video generation technology. Early adopters are already sharing YouTube demonstrations showcasing improved capabilities.

Mar 3, 2026 · 85% relevant

Mastercard's AI Agent Demo Signals the Dawn of Autonomous Commerce

Mastercard's recent demonstration of fully authenticated 'agentic commerce' reveals a future where AI agents autonomously handle shopping, payments, and negotiations. This shift promises to transform consumer experiences and business operations through intelligent automation.

Feb 23, 2026 · 75% relevant

GDPval Benchmark Reveals AI's Professional Competence: A New Tool for Economic Planning

A new interactive demonstration using OpenAI's GDPval benchmark shows current AI capabilities across economically valuable professional tasks. The project aims to make AI's real-world impact tangible for policymakers and civil society organizations, bridging the gap between technical assessments and practical economic decisions.

Feb 20, 2026 · 75% relevant

AI Learns to Use Tools Without Expensive Training: The Rise of In-Context Reinforcement Learning

Researchers have developed In-Context Reinforcement Learning (ICRL), a method that teaches large language models to use external tools through demonstration examples during reinforcement learning. This approach eliminates costly supervised fine-tuning while enabling models to gradually transition from few-shot to zero-shot tool usage capabilities.

Mar 13, 2026 · 87% relevant

Anthropic's Fable 5 gets production workshop series from @_vmlops

A production workshop series from @_vmlops on Anthropic's Fable 5 covers capability curves, reliable agents, and deployment at scale.

Jul 5, 2026 · 100% relevant

ByteDance Seed Turns Cheap Human Videos Into Robot Skills

ByteDance Seed replaces noisy 6DoF hand poses with relative wrist translation, creating a shared action space for humans and bi-manual robots that scales with cheap data and outperforms full-pose baselines.

Jun 29, 2026 · 82% relevant

IBM Shows Sub-1-nm Chips, Targeting Production in 5 Years

IBM showed sub-1-nm chips at IEDM, targeting production in 5 years. It challenges TSMC and Intel in the race to shrink transistors for AI workloads.

Jun 25, 2026 · 92% relevant

JUPITER Exascale Maps Brain at Cellular Scale on 4,096 Grace Hopper Nodes

JUPITER, Europe's first exascale supercomputer, trained the CytoNet brain model on 6.5 PB in 5 days and also runs climate, 6G, and quantum simulations.

Jun 22, 2026 · 85% relevant

OpenAI Codex Record & Replay: One-Shot Workflow Recording Becomes Reusable Skill

OpenAI's Record & Replay lets Codex learn a workflow from one demo and repeat it autonomously. The feature is blocked in the EU, UK, and Switzerland.

Jun 20, 2026 · 94% relevant

BeliefDiffusion Uses Diffusion Models for Robot Navigation in Partially Observable Environments

BeliefDiffusion combines diffusion models with MPC for robot navigation in partially observable environments, outperforming model-free RL and generative baselines in synthetic maps.

Jun 18, 2026 · 69% relevant

Alignment Pretraining Could Backfire, LessWrong Post Warns

A LessWrong post warns that synthetic alignment pretraining data could backfire in capable LLMs, leading to rebel personas.

Jun 17, 2026 · 80% relevant

Fable 5: Claude's Biggest Leap Since Opus 4.5, Says Beta Tester

A beta tester says Fable 5 is Claude's biggest leap since Opus 4.5, with emergent debugging and design capabilities.

Jun 9, 2026 · 100% relevant

Claude Code Generates Production Lottie Animations via Show HN

A Show HN post claims that Claude Code generates production Lottie animations. No demo or code was published, and the post has 2 points and 0 comments. Unverified.

Jun 8, 2026 · 75% relevant

Unitree G1 humanoid robots mirror dancer in real time via motion capture

Unitree G1 humanoid robots mirrored a dancer in real time via motion capture at a Shanghai event, part of a 100-person tracking challenge.

Jun 6, 2026 · 79% relevant

Persuasion Techniques Boost LLM Compliance from 35% to 51% in PNAS Study

A PNAS study finds that persuasion techniques boost LLM compliance from 35% to 51%, with newer models resisting more.

May 19, 2026 · 85% relevant

Boston Dynamics Atlas Lifts 100-lb Fridge via RL

Boston Dynamics showed Atlas lifting a 100+ lb mini-fridge via RL, moving from locomotion to practical manipulation.

May 19, 2026 · 85% relevant

Gemini 3.5 Flash Generates Full Web OS in One Shot

Gemini 3.5 Flash generated a full web OS from one prompt in a single HTML file, showcasing one-shot generation of complex UI.

May 18, 2026 · 85% relevant

Runway Agent Mode Builds Stories From Short Text Prompts

Runway's Agent mode builds complex stories from short text prompts. A one-shot attempt shows promise, but no benchmarks are available.

May 15, 2026 · 78% relevant

Anthropic Shows Anyone With a Laptop Can Poison Any Major AI Model

Anthropic proved anyone with a laptop can poison any major AI model, challenging assumptions about model security. The attack works on models from OpenAI, Google, and others, but details are scarce.

May 10, 2026 · 77% relevant

Claude Code Thwarts 13M RPS DDoS Attack in 10 Minutes

Claude Code autonomously stopped a 13M RPS DDoS attack on BridgeMind in 10 minutes, demonstrating AI agent capability in live infrastructure threats.

May 7, 2026 · 100% relevant
