tokens
30 articles about tokens in AI news
Median Coding Agent Hits 96k Input Tokens, Rewriting Inference Economics
SemiAnalysis found that the median coding-agent request uses 96k input tokens, based on a sample of 432k requests, shifting the focus of inference cost from output to context.
Glean benchmark: Off-the-shelf MCP costs 30% more tokens than indexed context
Glean benchmark: off-the-shelf MCP in Claude Cowork loses 2.5x more tasks and uses 30% more tokens than indexed context.
CLAUDE.md Wastes 7K+ Tokens Per Turn; Skills Cut to 50
A 1,000-line CLAUDE.md burns 7,000-10,000 tokens per turn on instructions the model already knows. Skills using progressive disclosure cut that to ~50 tokens.
ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples
ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines on 8B samples.
Talkie: Vintage LLM Trained on 260B Pre-1931 English Tokens
Talkie is a new 'vintage language model' trained on 260 billion tokens of historical English text from before 1931, developed by a team including Alec Radford, co-author of the original GPT paper. It offers a unique linguistic artifact for NLP research.
Doby Cuts Claude Code Navigation Tokens by 95% with Spec-First Workflow
A spec-first fix workflow that slashes navigation tokens by 95% and enforces plan docs as the source of truth before code changes.
OpenAI Engineer Processed 210B Tokens, Sparking AI Efficiency Debate
An OpenAI engineer processed 210 billion tokens in one week, equivalent to 33 Wikipedia-sized datasets. This extreme usage spotlights a growing trend where high AI consumption by engineers leads to a 10x cost increase and a high volume of discarded code.
Claude Mythos Preview Priced at $25/$125 Per Million Tokens
Anthropic's Claude Mythos model is available in private preview at $25 per million input tokens and $125 per million output tokens. This positions it as a premium but competitively priced option in the high-performance LLM market.
Google's Gemma 4B Model Runs on Nintendo Switch at 1.5 Tokens/Second
A developer successfully ran Google's 4-billion-parameter Gemma language model on a Nintendo Switch, achieving inference at 1.5 tokens/second. This demonstrates the increasing feasibility of running small LLMs on consumer-grade edge hardware.
Gemma 4 26B A4B Hits 45.7 tokens/sec Decode Speed on MacBook Air via MLX Community
A community benchmark shows the Gemma 4 26B A4B model running at 45.7 tokens/sec decode speed on a MacBook Air using the MLX framework. This highlights rapid progress in efficient local deployment of mid-size language models on consumer Apple Silicon.
Fireworks AI Launches 'Fire Pass' with Kimi K2.5 Turbo at 250 Tokens/Second
Fireworks AI has launched a new 'Fire Pass' subscription offering access to Kimi K2.5 Turbo at speeds up to 250 tokens/second. The service includes a free trial followed by a $7 weekly subscription.
Zhipu AI Announces GLM-5.1 Series, Featuring 1M Context and 128K Output Tokens
Zhipu AI has announced the GLM-5.1 model series, featuring a 1 million token context window and support for 128K output tokens. The update includes multiple model sizes and API availability.
MCP vs CLI: When to Skip MCP Servers and Save 37% on Tokens
Benchmarks show MCP servers can add 37% more input tokens vs. direct CLI commands. Learn when to use CLI for efficiency and when MCP's structure is worth the cost.
NVIDIA Spending ~$75K Per Engineer on AI Compute Tokens, Indicating Multi-Billion Dollar Annual Budget
NVIDIA is reportedly allocating approximately $75,000 in AI compute tokens per engineer annually, translating to a multi-billion dollar organization-wide budget for AI development resources.
Jensen Huang's AI Productivity Mandate: Engineers Must Spend 50% of Salary on AI Tokens
NVIDIA CEO Jensen Huang argues that a $500K engineer should spend at least $250K annually on AI inference tokens, framing token consumption as essential as CAD tools for chip design. He claims this investment eliminates perceptions of difficulty, time, and resource constraints in development.
PRISM Study: Mid-Training on 27B Tokens Boosts Math Scores by +15 to +40 Points, Enables Effective RL
A comprehensive study shows mid-training on 27B high-quality tokens consistently improves reasoning in LLMs. This 'retention-aware' phase restructures 90% of weights, creating a configuration where RL can succeed.
Sam Altman Aims for '5T Tokens Per Day' as OpenAI Reportedly Scales GPT-5.4
Sam Altman stated his goal is to flood the market with AI tokens, comparing intelligence to a utility. A separate, unverified report claims GPT-5.4 is processing '5T tokens per day' in its first week.
HyperTokens Break the Forgetting Cycle: A New Architecture for Continual Multimodal AI Learning
Researchers introduce HyperTokens, a transformer-based system that generates task-specific tokens on demand for continual video-language learning. This approach dramatically reduces catastrophic forgetting while maintaining fixed memory costs, enabling AI models to learn sequentially without losing previous knowledge.
Diffusion Architecture Breaks Speed Barrier: Inception's Mercury 2 Hits 1,000 Tokens/Second
Inception's Mercury 2 achieves unprecedented text generation speeds of 1,000 tokens per second using diffusion architecture borrowed from image AI. This represents a 10x speed advantage over leading models like Claude 4.5 Haiku and GPT-5 Mini without requiring custom hardware.
Anthropic Tightens Security: OAuth Tokens Banned from Third-Party Tools in Major Policy Shift
Anthropic has implemented a significant security policy change, prohibiting the use of OAuth tokens and its Agent SDK in third-party tools. This move comes amid growing enterprise adoption and heightened security concerns in the AI industry.
Claude Code's Secret Weapon: How the /btw Command Saves Tokens and Keeps You in Flow
Use the /btw command to ask quick, contextual questions without resetting your main task's conversation, saving tokens and preventing workflow interruptions.
Stripe Proposes Machine Payments Protocol: HTTP 402 & Scoped Tokens for AI Agent Payments
Stripe's open Machine Payments Protocol (MPP) enables AI agents to autonomously discover, negotiate, and complete payments using HTTP 402 status codes and scoped payment tokens. It supports both fiat and crypto rails, eliminating the need for human-in-the-loop payment flows.
DeepSeek v4 Pricing Cuts 75%: $0.43/M Tokens In
DeepSeek v4 API pricing has been permanently cut by 75% to $0.43/M input and $0.87/M output tokens, enabled by v4 using 27% of the compute and 10% of the cache that v3.2 requires.
How Andrej Karpathy's CLAUDE.md Guidelines Save Millions of Tokens — and
Andrej Karpathy's CLAUDE.md patterns cut token waste by 40%+. Copy his exact config to slash costs and speed up Claude Code.
MIT's RLM Handles 10M+ Tokens, Outperforms RAG on Long-Context Benchmarks
MIT researchers introduced Recursive Language Models (RLMs), which treat long documents as an external environment and use code to search, slice, and filter data, achieving 58.00 on a hard long-context benchmark versus 0.04 for standard models.
Install token-ninja: The MCP Server That Saves Tokens on Common Shell Commands
A new MCP server, token-ninja, automatically runs simple shell commands locally instead of sending them to Claude, cutting token usage and speeding up your workflow.
Codeburn: The TUI That Shows Exactly Where Your Claude Code Tokens Are Going
A new open-source TUI, Codeburn, analyzes Claude Code session transcripts to show token spend by task type, helping developers optimize their usage and costs.
Scaling Law Plateau Not Universal: More Tokens Boost Reasoning AI Performance
Empirical evidence indicates the 'second scaling law'—performance gains from increased computation—does not fully plateau for many reasoning tasks. Benchmark results may be artificially limited by token budgets, not model capability.
CLAUDE.md Promises 63% Reduction in Claude Output Tokens with Drop-in Prompt File
A new prompt engineering file called CLAUDE.md claims to reduce Claude's output token usage by 63% without code changes. The drop-in file aims to make Claude's code generation more efficient by structuring its responses.
Stop Claude Code's Web Fetches from Burning 700K Tokens on HTML Junk
A new MCP server, token-enhancer, strips scripts, nav bars, and ads from web pages before they hit Claude's context, cutting token waste by 90%+.