Hugging Face Daily Papers highlighted eight AI papers this week, including the Orca world model and Dockerless, a container-free verifier for coding agents. The weekly roundup, posted by @HuggingPapers, spans world models, agentic abstention, and code verification.
Key facts
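- Dockerless scores 62% on SWE-bench Verified without containers.
- A 35B-parameter agent reaches trillion-parameter performance through long-horizon scaling.
- LiveEdit runs streaming video editing at 12.66 FPS.
- Program-as-Weights matches 32B model quality with 50x less memory.
- BlockPilot achieves 4.2x speedups with instance-adaptive speculative decoding.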
The roundup, curated by the Hugging Face community, also highlights advances in scaling, efficiency, and real-time video editing.
World Models and Agentic Abstention
Orca, a general world foundation model, uses Next-State-Prediction to unify text, image, and embodied action generation [per @HuggingPapers]. This approach contrasts with traditional next-token prediction by modeling the world state directly. Separately, Agentic Abstention proposes a method for LLM agents to know when to stop acting, improving timely abstention without any fine-tuning [according to the paper]. The technique addresses a critical failure mode in autonomous agents: over-acting when uncertain.
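To make the over-acting failure mode concrete, here is a minimal sketch of an agent loop that stops once its confidence drops too low. It is an illustration only and not the paper's method, which the roundup does not describe. The `Step` type, the self-reported `confidence` score, and the `threshold` value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    confidence: float  # hypothetical self-reported confidence in [0, 1]


def run_with_abstention(steps, threshold=0.5):
    """Execute proposed steps in order; abstain at the first low-confidence one.

    Returns (executed_actions, abstained). The threshold gate is an assumption
    made for illustration, not the mechanism used by Agentic Abstention.
    """
    executed = []
    for step in steps:
        if step.confidence < threshold:
            # Stop acting instead of pushing ahead while uncertain.
            return executed, True
        executed.append(step.action)
    return executed, False


plan = [
    Step("open_file", 0.9),
    Step("edit_config", 0.8),
    Step("delete_backup", 0.2),  # uncertain step: the agent should stop here
    Step("restart_service", 0.9),
]
actions, abstained = run_with_abstention(plan)
print(actions, abstained)
```

A gate like this needs no weight updates, which is in the same spirit as the "without any fine-tuning" claim, though the actual technique may work quite differently.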
Code Without Containers
Dockerless introduces an environment-free program verifier for coding agents, scoring 62% on SWE-bench Verified without containers [per the paper]. That score matches or exceeds many container-dependent methods while reducing infrastructure overhead. The approach uses static analysis and symbolic execution to verify code correctness without runtime environments.
Scaling and Efficiency
Scaling the Horizon, Not the Parameters demonstrates how a 35B-parameter agent reaches the performance of trillion-parameter models through long-horizon scaling [according to @HuggingPapers]. This suggests that scaling inference time, rather than model size, can be a more compute-efficient path to capability. LiveEdit achieves real-time diffusion-based streaming video editing at 12.66 FPS for interactive AR applications [per the paper]. Program-as-Weights compiles fuzzy functions into tiny neural artifacts that match 32B model quality with 50x less memory [according to the paper]. BlockPilot achieves 4.2x speedups through instance-adaptive speculative decoding for diffusion models [per the paper].
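For readers who want the headline efficiency figures in more familiar units, the short sketch below converts them with plain arithmetic. The helper names are hypothetical, and the code assumes nothing about how the papers measured their baselines.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def remaining_fraction(factor: float) -> float:
    """Fraction of the baseline cost left after an N-times reduction."""
    return 1.0 / factor


# LiveEdit: 12.66 FPS leaves roughly 79 ms to produce each frame.
print(f"{frame_budget_ms(12.66):.1f} ms per frame")

# BlockPilot: a 4.2x speedup means about 23.8% of the baseline latency.
print(f"{remaining_fraction(4.2):.1%} of baseline latency")

# Program-as-Weights: 50x less memory means 2.0% of the baseline footprint.
print(f"{remaining_fraction(50):.1%} of baseline memory")
```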
Knowledge and Representation
DOPD (Dual On-policy Distillation) fixes privilege illusion during student-teacher knowledge transfer [per the paper]. Does VLA Even Know the Basics? measures how much commonsense and world knowledge vision-language models (VLMs) lose when they become embodied vision-language-action (VLA) agents [according to @HuggingPapers]. Formalizing Latent Thoughts proposes four axioms revealing that LLM latent representations may encode far less reasoning than commonly assumed [per the paper].
What to watch
Watch for follow-up papers on Dockerless's scalability to larger codebases and Orca's integration into embodied robotics benchmarks. The long-horizon scaling result could spur more research into inference-time compute versus parameter scaling.