updates
30 articles about updates in AI news
Anthropic Launches @ClaudeDevs X Account for API Developer Updates
Anthropic has launched @ClaudeDevs on X, a new channel for developers to receive direct updates on API releases, changelogs, and community news. This formalizes a direct line of communication for its growing developer ecosystem.
How to Decode Anthropic's Press Releases for Better Claude Code Updates
Claude Code users should learn to filter Anthropic's technical announcements for actionable updates on model capabilities, context windows, and API pricing that affect daily development.
Memento-Skills Agent System Achieves 116.2% Relative Improvement on Humanity's Last Exam Without LLM Updates
Memento-Skills is a generalist agent system that autonomously constructs and adapts task-specific agents through experience. It enables continual learning without updating LLM parameters, achieving relative improvements of 26.2% on GAIA and 116.2% on Humanity's Last Exam.
New Research Diagnoses LLMs' Struggle with Multiple Knowledge Updates in Context
A new arXiv paper reveals a persistent bias in LLMs when facts are updated multiple times within a long context. Models increasingly favor the earliest version, failing to track the latest state—a critical flaw for dynamic knowledge tasks.
Almanac: Open-Source Wiki Auto-Updates From Claude Code Chats
Almanac auto-generates a Markdown wiki from Claude Code chats and repo history, solving the agent context gap. The tool is free, open source, and macOS-only.
ReCast: A New RL Technique That Fixes Sparse-Hit Learning in Generative
Researchers propose ReCast, a 'repair-then-contrast' framework that fixes a fundamental flaw in group-based RL for generative recommendation: many sampled groups never become learnable. ReCast restores learnability for zero-reward groups and replaces normalization with contrastive updates, achieving up to 36.6% improvement in Pass@1 and 16.6x faster actor updates.
Ethan Mollick Proposes AI Model 'Changelog' for Task-Level Performance Tracking
AI researcher Ethan Mollick argues labs should release a 'changelog' alongside model cards, detailing performance changes on individual tasks. This would increase transparency as model updates become more frequent.
ID Privacy Launches 'Self-Healing' AI Graph for Automotive Retail
ID Privacy has launched the Self-Healing Agentic Intelligence Graph, an AI platform for automotive retail that automatically updates customer profiles and handles dealer communications. This represents a move towards more autonomous, context-aware AI agents in a high-value retail sector.
Meta Halts Mercor Work After Supply Chain Breach Exposes AI Training Secrets
A supply chain attack via compromised software updates at data-labeling vendor Mercor has forced Meta to pause collaboration, risking exposure of core AI training pipelines and quality metrics used by top labs.
DACT: A New Framework for Drift-Aware Continual Tokenization in Generative Recommender Systems
Researchers propose DACT, a framework to adapt generative recommender systems to evolving user behavior and new items without costly full retraining. It identifies 'drifting' items and selectively updates token sequences, balancing stability with plasticity. This addresses a core operational challenge for real-world, dynamic recommendation engines.
MiniMax M2.7 AI Agent Rewrites Its Own Harness, Achieving 9 Gold Medals on MLE Bench Lite Without Retraining
MiniMax's M2.7 agent autonomously rewrites its own operational harness—skills, memory, and workflow rules—through a self-optimization loop. After 100+ internal rounds, it earned 9 gold medals on OpenAI's MLE Bench Lite without weight updates.
Trace2Skill Framework Distills Execution Traces into Declarative Skills via Parallel Sub-Agents
Researchers introduced Trace2Skill, a framework that uses parallel sub-agents to analyze execution trajectories and distill them into transferable declarative skills. This enables performance improvements in larger models without parameter updates.
Secure Your MCP Servers: ClawGuard Scans for Tool Poisoning and Rug Pulls
New security tool ClawGuard scans MCP servers for hidden instructions in tool descriptions, parameter exploits, and malicious updates—critical for Claude Code users connecting to external tools.
Momentum-Consistency Fine-Tuning (MCFT) Achieves 3.30% Gain in 5-Shot 3D Vision Tasks Without Adapters
Researchers propose MCFT, an adapter-free fine-tuning method for 3D point cloud models that selectively updates encoder parameters with momentum constraints. It outperforms prior methods by 3.30% in 5-shot settings and maintains original inference latency.
Google Advances Agentic Shopping with UCP as OpenAI Retreats from Instant Checkout
Google is expanding its Universal Commerce Protocol (UCP) for AI shopping agents, adding multi-item cart creation, real-time catalog updates, and identity linking. This comes as OpenAI pulls back from its ChatGPT Instant Checkout feature, signaling a strategic pivot in the AI commerce landscape.
MetaClaw: Personal AI Agent That Meta-Learns from Conversations Using Cloud LoRA and Skill Synthesis
MetaClaw is a personal AI agent that automatically evolves from every conversation. It meta-learns in the wild using cloud LoRA and skill synthesis, scheduling weight updates during idle time with zero downtime.
Blue Yonder Expands Agentic AI and Mobile Apps for Retail Supply Chain Execution
Blue Yonder announced new agentic AI capabilities and mobile companion apps for retail planning and execution. The updates target merchandise financial planning, assortment optimization, and mobile allocation workflows to improve decision speed and accuracy.
OpenAI's GPT-5.4: The Million-Token Context Window That Changes Everything
OpenAI's upcoming GPT-5.4 will feature a groundbreaking 1 million token context window, matching competitors like Gemini and Claude. The model introduces an 'Extreme reasoning mode' for complex tasks and represents a shift toward monthly updates.
AI-Native CRM Revolution: How Lightfield Automates Sales Workflows Beyond Traditional Systems
Lightfield introduces an AI-native CRM that automatically updates customer data by connecting to email, calendar, and meetings, eliminating manual upkeep and transforming how sales teams manage relationships.
Grok's Weekly Evolution: How xAI's Rapid Iteration Model Could Redefine AI Development
xAI's Grok AI assistant is implementing a weekly improvement cycle, promising 'recursive intelligence growth' through continuous updates. This rapid iteration approach could accelerate AI capabilities beyond traditional development models.
Tencent's Training-Free GRPO: A Paradigm Shift in AI Alignment Without Fine-Tuning
Tencent researchers have introduced Training-Free GRPO, a method that achieves reinforcement learning-level alignment results for just $18 instead of $10,000—with zero parameter updates. This breakthrough could fundamentally change how we optimize language models.
AgentPulse: The Open-Source Dashboard That Solves Claude Code's
Install AgentPulse to gain visibility into all your active Claude Code and Codex CLI sessions from a single web dashboard, with live updates, session naming, and prompt history.
Blue Yonder Expands Agentic AI and Mobile Apps for Supply Chain Execution
Supply chain software leader Blue Yonder announced new AI agents and mobile applications for retail planning and execution. The updates target merchandise financial planning, assortment optimization, and mobile allocation tasks to help teams make faster, smarter decisions.
Show HN: Spec-Driven Dev Workflow Cuts Claude Code Agent Confusion
SDDW introduces a spec-driven workflow for Claude Code that decomposes complex tasks into specs and subtasks, clearing context between steps to reduce agent confusion and costs.
AI Model Runs Entirely on USB Stick, No Cloud Needed
An unnamed developer built an AI model that runs entirely from a USB stick with no internet connection, challenging ChatGPT's cloud-based model.
Claude Code Digest — May 14–May 17
Cut CLAUDE.md token waste by 99.3% with progressive disclosure skills.
Anthropic Ships Claude Opus 4.7: 80.1 SWE-Bench, 1M Context
Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified, a slight regression from Opus 4.6's 80.3. The release prioritizes safety tuning over benchmark leadership.
Hybrid A*+RL Agent Beats Pure End-to-End in Unity SR-71 Sim
A hybrid A* + deep RL agent in Unity, trained over 5M PPO steps, switches between classical path planning and learned evasion to navigate an SR-71 through a maze while dodging missiles.
Conductor vs Claude Code: Pinned Versions Split the Community
An Ask HN thread asks whether Conductor's single-agent approach matches native Claude Code. Pinned versions create a stability-vs-latency trade-off.
Federated Rec System Beats Centralized CTR in 53-Day User Study
A 53-day federated recommender study with 22 users showed user-controlled personalization achieving 65.37% CTR, challenging the privacy-utility tradeoff assumption.