DART: One-Shot Robot Adaptation via Weight Space Arithmetic

DART from Seoul National University adapts robot policies with one demonstration using weight space arithmetic, achieving 73% success on unseen domain shifts.

x.com/13h ago/3 min read

domain adaptation, weight space arithmetic, robot learning

AI Research

85

ELDR: Expert-Locality Decode Routing Cuts MoE TPOT by Up to 13.9%

ELDR uses prefill expert signatures to route decode requests, cutting median TPOT by 5.9–13.9% in vLLM at scale.

x.com/21h ago/3 min read

ai infrastructure, llm serving, moe

AI Research

87

Feed-Forward Model Decomposes 3D Scenes as Objects Without 3D Labels

A feed-forward model decomposes 3D scenes into objects from unposed images without 3D annotations, enabling one-pass reconstruction, segmentation, and manipulation.

x.com/1d ago/3 min read

computer vision, scene understanding, 3d vision

AI Research

87

BayesBench: LLMs Match Bayesian Posteriors But Fail Downstream Prediction

BayesBench tests 7 LLMs on multi-turn Bayesian reasoning. Scaling improves latent inference but not prediction, exposing a critical gap for agentic deployment.

arxiv.org/2d ago/3 min read/Multi-Source

llm evaluation, bayesian inference, ai research

AI Research

85

LLMs Spontaneously Develop Human-Like Brain Regions for Language, Math

LLMs spontaneously develop human-like brain regions for language, math, physics, and social reasoning, per @LiorOnAI. Two optimization processes converged on the same solution.

x.com/2d ago/3 min read

emergent behavior, neuroscience, ai research

AI Research, Breakthrough

100

Meituan Open-Sources 1.6T-Parameter LongCat-2.0 Trained on Domestic Chips

Meituan open-sourced 1.6T-parameter LongCat-2.0 trained on 50,000 domestic ASICs, claiming China's first full-process domestic-chip trillion-parameter model.

scmp.com/3d ago/3 min read/Widely Reported

china, open-source, hardware

AI Research

85

SingGuard: Runtime Guardrails for Multimodal AI Treat Safety as Input

SingGuard treats safety rules as runtime inputs for multimodal AI, achieving SOTA across 6 families and 35 datasets via fast/slow reasoning.

x.com/3d ago/3 min read

guardrails, ai safety, multimodal ai

AI Research

85

MultiHashFormer Brings Hash-Based Autoregression to Causal LMs

MultiHashFormer brings hash-based autoregression to causal LMs, slashing embedding memory and outperforming standard Transformers from 100M to 3B parameters.

x.com/3d ago/3 min read

efficiency, language models, ai research

AI Research

81

Free RL Textbook 'Math Foundations' Hits 16.2K GitHub Stars

Free RL textbook by Shiyu Zhao hits 16.2K GitHub stars and 2.1M video views, filling a gap in RL education with rigorous math and a unified grid-world example.

x.com/4d ago/3 min read

open-source, reinforcement-learning, machine-learning

AI Research

82

ByteDance Seed Turns Cheap Human Videos Into Robot Skills

ByteDance Seed replaces noisy 6DoF hand poses with relative wrist translation, creating a shared action space for humans and bi-manual robots that scales with cheap data and outperforms full-pose baselines.

x.com/4d ago/3 min read

robotics, bytedance, imitation learning

AI Research, Breakthrough

83

Zhipu GLM-5.2 beats Anthropic's Mythos on bug-hunt benchmark

Zhipu AI's GLM-5.2 beat Anthropic's Claude Opus 4.8 on a cybersecurity bug-hunting benchmark, then matched it with extra instructions, marking another 'DeepSeek moment'.

scmp.com/4d ago/3 min read/Multi-Source

anthropic, zhipu ai, chinese ai

AI Research

90

PlanBench-XL: GPT-5.4 Scores 11.36% on Hard Tool-Use Tasks

PlanBench-XL shows GPT-5.4 drops from 51.90% to 11.36% accuracy on long-horizon tool-use tasks with 1,665 tools, revealing a fundamental planning weakness.

x.com/4d ago/3 min read

planning, benchmarks, llm-agents

AI Research

82

Alibaba Open-Sources Qwen-AgentWorld for Generalist Agent Training

Alibaba open-sourced Qwen-AgentWorld and Wan-Streamer v0.1 on Hugging Face, targeting generalist agent training and real-time streaming. The releases include 8 additional papers on agent benchmarks and architectures.

x.com/4d ago/3 min read

open-source, agentic ai, world models

AI Research

95

BioMatrix: A single decoder trained on 304B tokens reads proteins, molecules, and language

BioMatrix, a decoder-only biological foundation model, achieves SOTA on 77 of 80 tasks after training on 304B tokens of sequences, structures, and language.

x.com/5d ago/3 min read

foundation models, protein design, molecular generation

AI Research

85

Grouped Query Experts cuts long-context attention cost 44%

GQE speeds long-context attention prefill 1.7–1.8× by routing tokens to 9 of 16 query heads, matching baseline accuracy at 56.04.

x.com/5d ago/3 min read

efficiency, research, attention

AI Research

85

EvoEmbedding Beats Static Embedders 3× Larger via Latent Memory Queue

EvoEmbedding uses a latent memory queue to beat static embedders 3× its size on long-context retrieval, per @HuggingPapers.

x.com/5d ago/3 min read

embedding models, research, retrieval

AI Research

87

CLI-Universe: Qwen3-32B fine-tuned on 6K trajectories beats models 10x larger on Terminal-Bench 2.0

CLI-Universe synthesizes terminal-agent tasks; Qwen3-32B fine-tuned on 6K trajectories hits 33.4% on Terminal-Bench 2.0, beating models 10x larger.

x.com/5d ago/3 min read

agentic ai, fine-tuning, benchmarks

AI Research

85

ICWM Lets Robots Adapt to Unseen Morphologies in Seconds

ICWM learns world dynamics from seconds of self-generated interaction, enabling zero-shot generalization to unseen cameras and morphologies without fine-tuning.

x.com/6d ago/3 min read

robotics, research, ai

AI Research, Breakthrough

93

ByteDance iLLaDA: 8B Diffusion LM Matches Qwen2.5 Base, Lags on Instruct

ByteDance iLLaDA, an 8B diffusion LM trained on 12T tokens, matches Qwen2.5 7B on base benchmarks (63.9 vs 63.3) but trails by 10 points after instruction tuning, revealing the alignment gap for diffusion models.

the-decoder.com/6d ago/3 min read/Multi-Source

llm benchmarks, diffusion models, bytedance

AI Research

87

World Action Models Survey Unifies 100+ Methods Under One Taxonomy

A survey reviews 100+ world action models, unifying world models, video generation, and VLA policies under one taxonomy.

x.com/6d ago/3 min read

world models, survey, embodied ai

AI Research

82

OPID: Agents Learn From Hindsight Without External Memory

OPID lets agents learn hierarchical skills from hindsight, improving sample efficiency on ALFWorld, WebShop, and Search QA without external memory at inference.

x.com/6d ago/3 min read

hierarchical-reinforcement-learning, agent-learning, sample-efficiency

AI Research

85

NVIDIA Drops Fast-FoundationStereo: 10× Faster Depth Estimation

NVIDIA released Fast-FoundationStereo, a real-time foundation model for zero-shot stereo depth estimation that is 10× faster than FoundationStereo with matching accuracy.

x.com/6d ago/3 min read

robotics, computer vision, edge ai

AI Research

87

Qwen-Image-Agent: Alibaba's Agentic Framework for Context-Aware Image Gen

Alibaba's Qwen-Image-Agent uses planning, reasoning, search, and memory to build context for text-to-image models, bridging the context gap in real-world generation.

x.com/Jun 26, 2026/3 min read

image generation, agentic ai, alibaba

AI Research

85

RIFT-Bench Tests 45 Agentic Systems With Dynamic Red-Teaming

RIFT-Bench evaluates 45 agentic AI systems via a graph-driven two-phase pipeline, enabling unified security comparison across heterogeneous architectures.

arxiv.org/Jun 24, 2026/3 min read/Widely Reported

ai safety, agentic ai, benchmarks

AI Research

72

ReMMD Agent Hits 41.8% Accuracy on Multilingual Misinformation, Cuts Cost 79.9%

ReMMD-Agent achieves 41.8% accuracy on multilingual misinformation detection with 79.9% cost reduction, using a persistent memory approach.

arxiv.org/Jun 24, 2026/3 min read

agentic systems, misinformation, ai research

AI Research

100

Tencent Open-Sources Agent Memory System Cutting Token Use 61%

Tencent open-sourced TencentDB Agent Memory, which cuts token usage by 61.38% and boosts task success by 51.52% on WideSearch while running fully locally.

x.com/Jun 23, 2026/3 min read/Multi-Source

open-source, memory-systems, ai-agents

AI Research, Breakthrough

100

OpenAI GPT-5.5-Cyber Beats Anthropic Mythos on Security Benchmarks

OpenAI's GPT-5.5-Cyber beats Anthropic's Mythos on security benchmarks. Updated Codex plugin auto-patches after scanning 30M commits.

the-decoder.com/Jun 23, 2026/3 min read/Widely Reported

anthropic, benchmarks, ai models

AI Research

75

Pew: Only 16% of Americans Expect AI to Help Society in 2026

Pew report: 16% of Americans expect AI to help society, down from 37% in 2024, a 21-point drop in two years.

x.com/Jun 23, 2026/3 min read

industry trends, ai safety, ai policy

AI Research

85

Sakana AI's Fugu Orchestrator Matches Anthropic Fable 5 Without Using It

Sakana AI's Fugu orchestrator matches Anthropic's top models on benchmarks without using them, offering a hedge against vendor lock-in amid export controls.

the-decoder.com/Jun 22, 2026/3 min read/Widely Reported

startups, benchmarks, ai models

AI Research, Breakthrough

100

ByteDance Seed's SpatialTree Redefines MLLM Spatial Reasoning at CVPR 2026

ByteDance Seed's SpatialTree achieves 79.8% on SEAL-Bench, 12.4 points above GPT-4V, using hierarchical spatial decomposition. Open-sourced at CVPR 2026.

pandaily.com/Jun 22, 2026/3 min read/Widely Reported

bytedance, computer vision, ai research