model update

30 articles about model update in AI news

Ethan Mollick Proposes AI Model 'Changelog' for Task-Level Performance Tracking

AI researcher Ethan Mollick argues labs should release a 'changelog' alongside model cards, detailing performance changes on individual tasks. This would increase transparency as model updates become more frequent.

Apr 17, 202685% relevant

Kimi 2.6 Code Model Teased in Leaked Image, Suggesting Moonshot AI Update

A screenshot circulating online appears to show a 'Kimi 2.6' code model interface, suggesting Moonshot AI is preparing an update to its Kimi Chat platform focused on coding tasks.

Apr 13, 202685% relevant

239-Paper Survey Maps How AI Agents Self-Improve via Scaffold Updates

A survey of 239 papers shows 68% of AI agent self-improvement methods focus on scaffold updates rather than model retraining, raising evaluation quality concerns.

Jul 19, 202685% relevant

How to Decode Anthropic's Press Releases for Better Claude Code Updates

Claude Code users should learn to filter Anthropic's technical announcements for actionable updates on model capabilities, context windows, and API pricing that affect daily development.

Apr 8, 202697% relevant

TensorFlow Playground Interactive Demo Updated for 2026, Enabling Real-Time Neural Network Visualization

The TensorFlow Playground, an educational web tool for visualizing neural networks, has been updated. Users can now adjust hyperparameters and watch the model train and visualize decision boundaries in real-time.

Mar 31, 202685% relevant

New Research Diagnoses LLMs' Struggle with Multiple Knowledge Updates in Context

A new arXiv paper reveals a persistent bias in LLMs when facts are updated multiple times within a long context. Models increasingly favor the earliest version, failing to track the latest state—a critical flaw for dynamic knowledge tasks.

Mar 16, 202678% relevant

Claude 3.5 Sonnet's Latest Update Redefines AI Agent Capabilities for Real-World Tasks

Anthropic's Claude 3.5 Sonnet 4.6 update demonstrates remarkable improvements in agentic workflows and computer interaction, positioning it as a leading model for practical AI applications. Early adopters report unprecedented efficiency in real-world task automation.

Feb 17, 202685% relevant

Codex Update Cuts GUI Workflow Latency 42%

Codex app update cuts GUI workflow latency 42%, enabling near-human-speed interface operation for autonomous app building and debugging.

May 1, 202684% relevant

OpenAI Codex Update Adds macOS Agent, Browser, Memory; 3M Weekly Users

OpenAI released a major Codex update featuring background macOS automation, an in-app browser, persistent memory, and 90+ plugins. With 3M weekly users and nearly half of usage now non-coding, Codex is being repositioned as a general work agent.

Apr 16, 2026100% relevant

Anthropic Launches @ClaudeDevs X Account for API Developer Updates

Anthropic has launched @ClaudeDevs on X, a new channel for developers to receive direct updates on API releases, changelogs, and community news. This formalizes a direct line of communication for its growing developer ecosystem.

Apr 16, 202675% relevant

Rumor: Anthropic's Next Claude Update May Include AI App Builder

A rumor on X claims the next Claude update will include an app builder, allowing users to create applications through conversational AI. This could significantly lower the barrier to app development.

Apr 13, 202687% relevant

Google's Cookie Policy Update and the Challenge of AI-Powered Personalization

Google has updated its user-facing cookie and data consent interface, emphasizing its use of data for personalization and ad measurement. This reflects the ongoing tension between data-driven AI services and user privacy, a critical issue for luxury retail's digital transformation.

Apr 2, 202682% relevant

Alibaba's Qwen Team Teases Qwen 3.6 Model, Signaling Major Open-Source LLM Update

Alibaba's Qwen team has teased the imminent release of Qwen 3.6, the next major version of its open-source large language model series. This follows the release of Qwen 2.5 in late 2024 and signals continued aggressive competition in the open-weight model space.

Mar 30, 202685% relevant

Almanac: Open-Source Wiki Auto-Updates From Claude Code Chats

Almanac auto-generates a markdown wiki from Claude Code chats and repo history, solving the agent context gap. Free open-source tool, MacOS-only.

May 14, 202690% relevant

Newline's 'Skills' Update Shows Where MCP Servers Are Headed

The Newline MCP server now supports modular 'Skills,' allowing developers to customize their Claude Code environment with specific, installable capabilities for more targeted workflows.

Apr 13, 2026100% relevant

AI Shopping Update: OpenAI Focuses on Discovery, Meta Launches Checkout & Shopify Offers Catalog Integration

A trio of major AI shopping announcements: OpenAI shifts focus to product discovery, Meta launches in-app checkout for AI shopping ads, and Shopify opens its catalog integration to any brand. This signals a rapid move from conversational AI to transactional agentic systems.

Mar 25, 202695% relevant

Memento-Skills Agent System Achieves 116.2% Relative Improvement on Humanity's Last Exam Without LLM Updates

Memento-Skills is a generalist agent system that autonomously constructs and adapts task-specific agents through experience. It enables continual learning without updating LLM parameters, achieving 26.2% and 116.2% relative improvements on GAIA and Humanity's Last Exam benchmarks.

Mar 22, 202685% relevant

Stealth 100B Model Appears on OpenRouter, Possibly DeepSeek or Kimi

A new, unannounced 100-billion-parameter AI model has appeared on the OpenRouter API platform. Its origin is unknown, but observers speculate it could be a variant from DeepSeek or an update to Kimi's code model.

Apr 13, 202685% relevant

Claude Code Users: How to Check Status and Switch Models During Sonnet 4.6 Outages

A status update shows Sonnet 4.6 errors; developers should bookmark the status dashboard and know how to switch Claude Code models during outages.

Apr 8, 202678% relevant

AI Forecasters Revise AGI Timeline: Key Milestones Pulled Forward to 2029-2030 After Recent Model Progress

A significant update from AI forecasters indicates key AGI milestones have been pulled forward, with the median prediction for AGI arrival shifting from 2032 to 2029-2030. This revision follows rapid progress in recent model capabilities, particularly in reasoning and tool use.

Apr 4, 202685% relevant

AI Learns Like Humans: New System Trains Language Models Through Everyday Conversations

Researchers have developed a breakthrough system that enables language models to learn continuously from everyday conversations rather than static datasets. This approach mimics human learning patterns and could revolutionize how AI systems acquire and update knowledge.

Mar 14, 202685% relevant

Grok's Weekly Evolution: How xAI's Rapid Iteration Model Could Redefine AI Development

xAI's Grok AI assistant is implementing a weekly improvement cycle, promising 'recursive intelligence growth' through continuous updates. This rapid iteration approach could accelerate AI capabilities beyond traditional development models.

Feb 17, 202685% relevant

Agent Publish Primitives: Why Default-Private MCP Tools Beat Raw CDN URLs

Thryvate argues AI agents need five design properties for safe web publishing: default-private, revocable, expiring, per-viewer analytics, and idempotent updates. MCP tools enforce policy while the model handles intent.

Jun 27, 202675% relevant

AMD's Lemonade v10.8 Adds MCP Support, Letting Claude Desktop and Cursor Route Tasks to Local AMD GPUs

AMD-backed Lemonade v10.8, released June 17, now exposes a Model Context Protocol server, letting Claude Desktop, Cursor, and GitHub Copilot route inference tasks to local AMD Ryzen AI NPUs, Radeon GPUs, or plain CPUs — no cloud API required. The update also adds Moonshine speech-to-text, expanded R

Jun 17, 202670% relevant

OpenAI Launches GPT-Rosalind for Drug Discovery, GPT-5.4-Cyber for Security

OpenAI launched GPT-Rosalind, a life sciences model performing above the 95th percentile of human experts on novel biological data, and GPT-5.4-Cyber, a cybersecurity variant. These releases, alongside a major Agents SDK update, signal a pivot from general AI to specialized, high-stakes enterprise domains.

Apr 20, 202690% relevant

GPT-5.5 Limited Rollout Begins, Frontend Improvements Noted

OpenAI has started a limited rollout of GPT-5.5 to select users, with early reports highlighting significant frontend quality improvements. This suggests an incremental update focused on user experience rather than core model capabilities.

Apr 19, 202685% relevant

Claude 4.6 Migration Deadline

Anthropic is retiring Opus 4 and Sonnet 4 on June 15, 2026. Migrate to 4.6 models now to gain 1M context and higher output limits, but update your code for adaptive thinking and output_config changes.

Apr 15, 2026100% relevant

Claude Opus 4.7 Appears on Anthropic's Internal API, Hinting at Imminent Release

A new model identifier, 'Claude Opus 4.7', has been spotted on Anthropic's internal API. This suggests a forthcoming update to the flagship Opus line, potentially a minor version bump ahead of a larger release.

Apr 12, 202691% relevant

MLX-LM v0.9.0 Adds Better Batching, Supports Gemma 4 on Apple Silicon

Apple's MLX-LM framework released version 0.9.0 with enhanced server batching and support for Google's Gemma 4 model, improving local LLM inference efficiency on Apple Silicon. This update addresses a key performance bottleneck for developers running models locally on Mac hardware.

Apr 7, 202675% relevant

Trace2Skill Framework Distills Execution Traces into Declarative Skills via Parallel Sub-Agents

Researchers introduced Trace2Skill, a framework that uses parallel sub-agents to analyze execution trajectories and distill them into transferable declarative skills. This enables performance improvements in larger models without parameter updates.

Mar 30, 202685% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety