Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

tokens

30 articles about tokens in AI news

Median Coding Agent Hits 96k Input Tokens, Rewriting Inference Economics

SemiAnalysis found median coding agent uses 96k input tokens from 432k requests, shifting inference cost focus from output to context.

95% relevant

Glean benchmark: Off-the-shelf MCP costs 30% more tokens than indexed context

Glean benchmark: off-the-shelf MCP in Claude Cowork loses 2.5x more tasks and uses 30% more tokens than indexed context.

88% relevant

CLAUDE.md Wastes 7K+ Tokens Per Turn; Skills Cut to 50

A 1,000-line CLAUDE.md burns 7,000-10,000 tokens per turn on instructions the model already knows. Skills using progressive disclosure cut that to ~50 tokens.

100% relevant

ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples

ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines on 8B samples.

85% relevant

Talkie: Vintage LLM Trained on 260B Pre-1931 English Tokens

Talkie is a new 'vintage language model' trained on 260 billion tokens of historical English text from before 1931, developed by a team including Alec Radford, co-author of the original GPT paper. It offers a unique linguistic artifact for NLP research.

85% relevant

Doby Cuts Claude Code Navigation Tokens by 95% with Spec-First Workflow

A spec-first fix workflow that slashes navigation tokens 95% and enforces plan docs as source of truth before code changes.

100% relevant

OpenAI Engineer Processed 210B Tokens, Sparking AI Efficiency Debate

An OpenAI engineer processed 210 billion tokens in one week, equivalent to 33 Wikipedia-sized datasets. This extreme usage spotlights a growing trend where high AI consumption by engineers leads to a 10x cost increase and a high volume of discarded code.

85% relevant

Claude Mythos Preview Priced at $25/$125 Per Million Tokens

Anthropic's Claude Mythos model is available in private preview at $25 per million input tokens and $125 per million output tokens. This positions it as a premium but competitively priced option in the high-performance LLM market.

97% relevant

Google's Gemma 4B Model Runs on Nintendo Switch at 1.5 Tokens/Second

A developer successfully ran Google's 4-billion parameter Gemma language model on a Nintendo Switch, achieving 1.5 tokens/second inference. This demonstrates the increasing feasibility of running small LLMs on consumer-grade edge hardware.

89% relevant

Gemma 4 26B A4B Hits 45.7 tokens/sec Decode Speed on MacBook Air via MLX Community

A community benchmark shows the Gemma 4 26B A4B model running at 45.7 tokens/sec decode speed on a MacBook Air using the MLX framework. This highlights rapid progress in efficient local deployment of mid-size language models on consumer Apple Silicon.

93% relevant

Fireworks AI Launches 'Fire Pass' with Kimi K2.5 Turbo at 250 Tokens/Second

Fireworks AI has launched a new 'Fire Pass' subscription offering access to Kimi K2.5 Turbo at speeds up to 250 tokens/second. The service includes a free trial followed by a $7 weekly subscription.

85% relevant

Zhipu AI Announces GLM-5.1 Series, Featuring 1M Context and 128K Output Tokens

Zhipu AI has announced the GLM-5.1 model series, featuring a 1 million token context window and support for 128K output tokens. The update includes multiple model sizes and API availability.

85% relevant

MCP vs CLI: When to Skip MCP Servers and Save 37% on Tokens

Benchmarks show MCP servers can add 37% more input tokens vs. direct CLI commands. Learn when to use CLI for efficiency and when MCP's structure is worth the cost.

95% relevant

NVIDIA Spending ~$75K Per Engineer on AI Compute Tokens, Indicating Multi-Billion Dollar Annual Budget

NVIDIA is reportedly allocating approximately $75,000 in AI compute tokens per engineer annually, translating to a multi-billion dollar organization-wide budget for AI development resources.

87% relevant

Jensen Huang's AI Productivity Mandate: Engineers Must Spend 50% of Salary on AI Tokens

NVIDIA CEO Jensen Huang argues that a $500K engineer should spend at least $250K annually on AI inference tokens, framing token consumption as essential as CAD tools for chip design. He claims this investment eliminates perceptions of difficulty, time, and resource constraints in development.

85% relevant

PRISM Study: Mid-Training on 27B Tokens Boosts Math Scores by +15 to +40 Points, Enables Effective RL

A comprehensive study shows mid-training on 27B high-quality tokens consistently improves reasoning in LLMs. This 'retention-aware' phase restructures 90% of weights, creating a configuration where RL can succeed.

88% relevant

Sam Altman Aims for '5T Tokens Per Day' as OpenAI Reportedly Scales GPT-5.4

Sam Altman stated his goal is to flood the market with AI tokens, comparing intelligence to a utility. A separate, unverified report claims GPT-5.4 is processing '5T tokens per day' in its first week.

87% relevant

HyperTokens Break the Forgetting Cycle: A New Architecture for Continual Multimodal AI Learning

Researchers introduce HyperTokens, a transformer-based system that generates task-specific tokens on demand for continual video-language learning. This approach dramatically reduces catastrophic forgetting while maintaining fixed memory costs, enabling AI models to learn sequentially without losing previous knowledge.

75% relevant

Diffusion Architecture Breaks Speed Barrier: Inception's Mercury 2 Hits 1,000 Tokens/Second

Inception's Mercury 2 achieves unprecedented text generation speeds of 1,000 tokens per second using diffusion architecture borrowed from image AI. This represents a 10x speed advantage over leading models like Claude 4.5 Haiku and GPT-5 Mini without requiring custom hardware.

95% relevant

Anthropic Tightens Security: OAuth Tokens Banned from Third-Party Tools in Major Policy Shift

Anthropic has implemented a significant security policy change, prohibiting the use of OAuth tokens and its Agent SDK in third-party tools. This move comes amid growing enterprise adoption and heightened security concerns in the AI industry.

78% relevant

Claude Code's Secret Weapon: How the /btw Command Saves Tokens and Keeps You in Flow

Use the /btw command to ask quick, contextual questions without resetting your main task's conversation, saving tokens and preventing workflow interruptions.

95% relevant

Stripe Proposes Machine Payments Protocol: HTTP 402 & Scoped Tokens for AI Agent Payments

Stripe's open Machine Payments Protocol (MPP) enables AI agents to autonomously discover, negotiate, and complete payments using HTTP 402 status codes and scoped payment tokens. It supports both fiat and crypto rails, eliminating the need for human-in-the-loop payment flows.

95% relevant

DeepSeek v4 Pricing Cuts 75%: $0.43/M Tokens In

DeepSeek v4 API pricing permanently cut 75% to $0.43/M input, $0.87/M output, enabled by 27% compute and 10% cache vs v3.2.

95% relevant

How Andre Karpathy's CLAUDE.md Guidelines Save Millions of Tokens — and

Andre Karpathy's CLAUDE.md patterns cut token waste by 40%+. Copy his exact config to slash costs and speed up Claude Code.

86% relevant

MIT's RLM Handles 10M+ Tokens, Outperforms RAG on Long-Context Benchmarks

MIT researchers introduced Recursive Language Models (RLMs), which treat long documents as an external environment and use code to search, slice, and filter data, achieving 58.00 on a hard long-context benchmark versus 0.04 for standard models.

95% relevant

Install token-ninja: The MCP Server That Saves Tokens on Common Shell Commands

A new MCP server, token-ninja, automatically runs simple shell commands locally instead of sending them to Claude, cutting token usage and speeding up your workflow.

100% relevant

Codeburn: The TUI That Shows Exactly Where Your Claude Code Tokens Are Going

A new open-source TUI, Codeburn, analyzes Claude Code session transcripts to show token spend by task type, helping developers optimize their usage and costs.

100% relevant

Scaling Law Plateau Not Universal: More Tokens Boost Reasoning AI Performance

Empirical evidence indicates the 'second scaling law'—performance gains from increased computation—does not fully plateau for many reasoning tasks. Benchmark results may be artificially limited by token budgets, not model capability.

85% relevant

CLAUDE.md Promises 63% Reduction in Claude Output Tokens with Drop-in Prompt File

A new prompt engineering file called CLAUDE.md claims to reduce Claude's output token usage by 63% without code changes. The drop-in file aims to make Claude's code generation more efficient by structuring its responses.

87% relevant

Stop Claude Code's Web Fetches from Burning 700K Tokens on HTML Junk

A new MCP server, token-enhancer, strips scripts, nav bars, and ads from web pages before they hit Claude's context, cutting token waste by 90%+.

84% relevant