tokens

30 articles about tokens in AI news

The Caveman Skill for Claude Code Saves 8.5% Tokens

Caveman skill for Claude Code saves 8.5% tokens, not 65%. Safe to use with no quality loss. Install via SkillsBench.

Jul 6, 2026 · 70% relevant

Lovable spent $85K on tokens to learn agentic coding at scale

Lovable spent $85K on tokens for agentic coding. Debugging costs dominate, challenging enterprise adoption.

Jul 3, 2026 · 100% relevant

FreeLLMAPI Aggregates 1.7B Free Tokens/Month Across 11 Providers

FreeLLMAPI aggregates 11 free LLM providers into one endpoint, offering 1.7B tokens/month with automatic failover. Reduces friction for side projects but faces provider tolerance risks.

Jun 28, 2026 · 75% relevant
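A provider chain with automatic failover can be sketched as below. It assumes each provider is wrapped as a plain callable that raises an exception on rate limits, exhausted quotas, or timeouts. The provider names and functions are hypothetical stand-ins, not FreeLLMAPI's actual code or API.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return (provider_name, completion)
    from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. rate limit, quota exhausted, timeout
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def rate_limited(prompt: str) -> str:
    raise RuntimeError("429 rate limited")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


name, text = complete_with_failover(
    "hi", [("provider_a", rate_limited), ("provider_b", healthy)]
)
print(name, text)  # provider_b echo: hi
```

Trying providers in a fixed order keeps the logic simple; a real aggregator would likely also track quotas and skip providers it already knows are exhausted.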

BioMatrix: A single decoder reads proteins, molecules, language on 304B tokens

BioMatrix, a decoder-only biological foundation model, achieves SOTA on 77 of 80 tasks after training on 304B tokens of sequences, structures, and language.

Jun 28, 2026 · 95% relevant

MCP Tool Overload Eats 1.1M Tokens — Code Mode Fixes It

MCP tool definitions for a 2,600-endpoint API consume 1.1M tokens, overflowing the agent's context. Code mode, which describes the API with TypeScript types in under 1K tokens and executes calls in a sandbox, offers a fix.

Jun 23, 2026 · 67% relevant

Thinking Tokens Drive Hidden Inference Costs in Agentic Pipelines

Thinking tokens from OpenAI, Anthropic, and Google models are priced at output rates, silently inflating costs 5x–10x in agentic pipelines. Google's 80% price cut threat exposes a structural asymmetry between startups and tech giants.

Jun 21, 2026 · 83% relevant
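The 5x to 10x range follows from simple arithmetic: if hidden thinking tokens are billed at the output rate, a step that reasons for 4 to 9 tokens per visible output token pays for 5 to 10 times the output it shows. A minimal sketch, with a hypothetical $10 per million output-token price:

```python
OUTPUT_PRICE_PER_M = 10.0  # hypothetical $ per million output tokens


def billed_output_tokens(visible_tokens: int, thinking_tokens: int) -> int:
    """Thinking tokens are billed at the output rate alongside visible output."""
    return visible_tokens + thinking_tokens


def cost_multiplier(visible_tokens: int, thinking_tokens: int) -> float:
    """How much larger the output bill is than for the visible tokens alone."""
    return billed_output_tokens(visible_tokens, thinking_tokens) / visible_tokens


def output_cost(visible_tokens: int, thinking_tokens: int,
                price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Dollar cost of one step's output, including hidden thinking tokens."""
    return billed_output_tokens(visible_tokens, thinking_tokens) * price_per_m / 1_000_000


# A step that shows 500 visible tokens but reasons with hidden ones:
print(cost_multiplier(500, 2_000))  # 5.0
print(cost_multiplier(500, 4_500))  # 10.0
print(output_cost(500, 4_500))      # 0.05
```

In an agentic pipeline the multiplier applies to every step, so the inflation compounds across the whole run rather than showing up as one large line item.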

Nadella: AI's New Unit Is 'Tokens per Dollar per Watt'

Satya Nadella defined AI's supply-side economics as 'Tokens per Dollar per Watt' and urged companies, industries, and countries to focus on infrastructure.

Jun 14, 2026 · 80% relevant
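One literal reading of the metric divides the tokens produced by the dollars spent and by the watts drawn, so it rises when a deployment gets cheaper or more power-efficient for the same output. This is an assumption about how the ratio is computed, not a formula Nadella published, and the deployment figures are hypothetical:

```python
def tokens_per_dollar_per_watt(tokens: float, dollars: float, watts: float) -> float:
    """Tokens produced, divided by dollars spent and by watts drawn (assumed reading)."""
    return tokens / dollars / watts


# Two hypothetical deployments serving the same 1B-token workload.
baseline = tokens_per_dollar_per_watt(tokens=1e9, dollars=1_000, watts=10_000)
cheaper_and_cooler = tokens_per_dollar_per_watt(tokens=1e9, dollars=800, watts=8_000)
print(baseline)            # 100.0
print(cheaper_and_cooler)  # 156.25
```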

Google Open-Sources DiffusionGemma, 26B Model Hits 1K Tokens/Sec on H100

Google open-sourced DiffusionGemma, a 26B-parameter diffusion text model hitting 1,000 tokens/sec on H100 — 4x faster than autoregressive models, but with lower quality.

Jun 10, 2026 · 100% relevant

MiniMax M3 Sparse Attention: 15.6x Decoding Speedup at 1M Tokens

MiniMax M3 sparse attention achieves 9.7x prefilling and 15.6x decoding speedup at 1M tokens, reversing M2's full-attention stance.

May 26, 2026 · 100% relevant

Cerebras Hits 981 Tokens/sec on 1T-Parameter Kimi K2.6, Claims 6.7× GPU Cloud Speedup

Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model, a 6.7× speedup over the next-fastest GPU cloud, validated by an independent third party.

May 23, 2026 · 93% relevant
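The two reported figures imply a throughput of roughly 981 / 6.7 ≈ 146 tokens/sec for the next-fastest GPU cloud. The sketch below works that out and converts both rates into wall-clock time for a hypothetical 10,000-token generation, assuming both figures describe the same kind of single-stream decode throughput:

```python
CEREBRAS_TPS = 981.0  # reported tokens/sec on Kimi K2.6
SPEEDUP = 6.7         # reported speedup over the next-fastest GPU cloud

# Implied throughput of the GPU cloud (derived, not reported).
gpu_tps = CEREBRAS_TPS / SPEEDUP


def seconds_to_generate(n_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock time to emit n_tokens at a constant decode rate."""
    return n_tokens / tokens_per_sec


print(round(gpu_tps, 1))                                     # 146.4
print(round(seconds_to_generate(10_000, CEREBRAS_TPS), 1))  # 10.2
print(round(seconds_to_generate(10_000, gpu_tps), 1))       # 68.3
```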

Median Coding Agent Hits 96k Input Tokens, Rewriting Inference Economics

SemiAnalysis, analyzing 432k requests, found that the median coding agent uses 96k input tokens, shifting the focus of inference costs from output to context.

May 22, 2026 · 95% relevant
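A quick calculation shows why long contexts shift the bill toward input. The 96k input figure comes from the report; the 2,000-token output and the $3/$15 per-million prices are hypothetical placeholders:

```python
def input_cost_share(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Fraction of a request's cost that comes from input (context) tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost / (input_cost + output_cost)


# 96k input tokens (reported median); output size and prices are hypothetical.
share = input_cost_share(96_000, 2_000, input_price_per_m=3.0, output_price_per_m=15.0)
print(f"{share:.1%}")  # 90.6%
```

Even with output priced at five times input, input tokens account for roughly 90% of this request's cost.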

Glean benchmark: Off-the-shelf MCP costs 30% more tokens than indexed context

Glean benchmark: off-the-shelf MCP in Claude Cowork fails 2.5x more tasks and uses 30% more tokens than indexed context.

May 15, 2026 · 88% relevant

CLAUDE.md Wastes 7K+ Tokens Per Turn; Skills Cut to 50

A 1,000-line CLAUDE.md burns 7,000-10,000 tokens per turn on instructions the model already knows. Skills using progressive disclosure cut that to ~50 tokens.

May 15, 2026 · 100% relevant
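The per-turn overhead compounds over a session. A minimal sketch, assuming a hypothetical 40-turn session and the per-turn figures quoted in the summary:

```python
TURNS = 40  # hypothetical session length


def session_overhead(tokens_per_turn: int, turns: int) -> int:
    """Instruction tokens re-sent across a whole session."""
    return tokens_per_turn * turns


for label, per_turn in [
    ("CLAUDE.md, low estimate", 7_000),
    ("CLAUDE.md, high estimate", 10_000),
    ("skills with progressive disclosure", 50),
]:
    print(f"{label}: {session_overhead(per_turn, TURNS):,} tokens")
# CLAUDE.md, low estimate: 280,000 tokens
# CLAUDE.md, high estimate: 400,000 tokens
# skills with progressive disclosure: 2,000 tokens
```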

ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples

ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines on 8B samples.

May 4, 2026 · 85% relevant

Talkie: Vintage LLM Trained on 260B Pre-1931 English Tokens

Talkie is a new 'vintage language model' trained on 260 billion tokens of historical English text from before 1931, developed by a team including Alec Radford, co-author of the original GPT paper. It offers a unique linguistic artifact for NLP research.

Apr 28, 2026 · 85% relevant

Doby Cuts Claude Code Navigation Tokens by 95% with Spec-First Workflow

A spec-first fix workflow that slashes navigation tokens 95% and enforces plan docs as source of truth before code changes.

Apr 24, 2026 · 100% relevant

OpenAI Engineer Processed 210B Tokens, Sparking AI Efficiency Debate

An OpenAI engineer processed 210 billion tokens in one week, equivalent to 33 Wikipedia-sized datasets. This extreme usage spotlights a growing trend where high AI consumption by engineers leads to a 10x cost increase and a high volume of discarded code.

Apr 20, 2026 · 85% relevant
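The headline figure is easier to grasp as a sustained rate. This sketch uses only the reported 210B tokens, one week, and 33 Wikipedia equivalents; the tokens-per-second rate and the implied size of one "Wikipedia" are derived, not reported:

```python
TOKENS = 210e9                       # reported tokens processed in one week
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800
WIKIPEDIA_EQUIVALENTS = 33           # reported comparison

print(f"{TOKENS / SECONDS_PER_WEEK:,.0f} tokens/sec sustained")                 # 347,222 tokens/sec sustained
print(f"{TOKENS / WIKIPEDIA_EQUIVALENTS / 1e9:.2f}B tokens per Wikipedia-sized dataset")  # 6.36B ...
```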

Claude Mythos Preview Priced at $25/$125 Per Million Tokens

Anthropic's Claude Mythos model is available in private preview at $25 per million input tokens and $125 per million output tokens. This positions it as a premium but competitively priced option in the high-performance LLM market.

Apr 9, 2026 · 97% relevant
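At these prices, per-request cost is a weighted sum of input and output tokens. A minimal sketch, using the reported preview prices and a hypothetical request of 20,000 input tokens and 2,000 output tokens:

```python
INPUT_PRICE_PER_M = 25.0    # $ per million input tokens (reported preview price)
OUTPUT_PRICE_PER_M = 125.0  # $ per million output tokens (reported preview price)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical request: 20,000 tokens of context in, 2,000 tokens out.
print(f"${request_cost(20_000, 2_000):.2f}")  # $0.75
```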

Google's Gemma 4B Model Runs on Nintendo Switch at 1.5 Tokens/Second

A developer successfully ran Google's 4-billion parameter Gemma language model on a Nintendo Switch, achieving 1.5 tokens/second inference. This demonstrates the increasing feasibility of running small LLMs on consumer-grade edge hardware.

Apr 8, 2026 · 89% relevant

Gemma 4 26B A4B Hits 45.7 tokens/sec Decode Speed on MacBook Air via MLX Community

A community benchmark shows the Gemma 4 26B A4B model running at 45.7 tokens/sec decode speed on a MacBook Air using the MLX framework. This highlights rapid progress in efficient local deployment of mid-size language models on consumer Apple Silicon.

Apr 3, 2026 · 93% relevant

Fireworks AI Launches 'Fire Pass' with Kimi K2.5 Turbo at 250 Tokens/Second

Fireworks AI has launched a new 'Fire Pass' subscription offering access to Kimi K2.5 Turbo at speeds up to 250 tokens/second. The service includes a free trial followed by a $7 weekly subscription.

Mar 27, 2026 · 85% relevant

MCP vs CLI: When to Skip MCP Servers and Save 37% on Tokens

Benchmarks show MCP servers can add 37% more input tokens vs. direct CLI commands. Learn when to use CLI for efficiency and when MCP's structure is worth the cost.

Mar 20, 2026 · 95% relevant

NVIDIA Spending ~$75K Per Engineer on AI Compute Tokens, Indicating Multi-Billion Dollar Annual Budget

NVIDIA is reportedly allocating approximately $75,000 in AI compute tokens per engineer annually, translating to a multi-billion dollar organization-wide budget for AI development resources.

Mar 20, 2026 · 87% relevant

Jensen Huang's AI Productivity Mandate: Engineers Must Spend 50% of Salary on AI Tokens

NVIDIA CEO Jensen Huang argues that a $500K engineer should spend at least $250K annually on AI inference tokens, framing token consumption as being as essential as CAD tools are for chip design. He claims this spending removes perceived constraints of difficulty, time, and resources in development.

Mar 20, 2026 · 85% relevant

PRISM Study: Mid-Training on 27B Tokens Boosts Math Scores by +15 to +40 Points, Enables Effective RL

A comprehensive study shows mid-training on 27B high-quality tokens consistently improves reasoning in LLMs. This 'retention-aware' phase restructures 90% of weights, creating a configuration where RL can succeed.

Mar 19, 2026 · 88% relevant

Sam Altman Aims for '5T Tokens Per Day' as OpenAI Reportedly Scales GPT-5.4

Sam Altman stated his goal is to flood the market with AI tokens, comparing intelligence to a utility. A separate, unverified report claims GPT-5.4 is processing '5T tokens per day' in its first week.

Mar 16, 2026 · 87% relevant

HyperTokens Break the Forgetting Cycle: A New Architecture for Continual Multimodal AI Learning

Researchers introduce HyperTokens, a transformer-based system that generates task-specific tokens on demand for continual video-language learning. This approach dramatically reduces catastrophic forgetting while maintaining fixed memory costs, enabling AI models to learn sequentially without losing previous knowledge.

Mar 10, 2026 · 75% relevant

Diffusion Architecture Breaks Speed Barrier: Inception's Mercury 2 Hits 1,000 Tokens/Second

Inception's Mercury 2 achieves unprecedented text generation speeds of 1,000 tokens per second using diffusion architecture borrowed from image AI. This represents a 10x speed advantage over leading models like Claude 4.5 Haiku and GPT-5 Mini without requiring custom hardware.

Feb 25, 2026 · 95% relevant

Anthropic Tightens Security: OAuth Tokens Banned from Third-Party Tools in Major Policy Shift

Anthropic has implemented a significant security policy change, prohibiting the use of OAuth tokens and its Agent SDK in third-party tools. This move comes amid growing enterprise adoption and heightened security concerns in the AI industry.

Feb 18, 2026 · 78% relevant

Claude Code's Secret Weapon: How the /btw Command Saves Tokens and Keeps You in Flow

Use the /btw command to ask quick, contextual questions without resetting your main task's conversation, saving tokens and preventing workflow interruptions.

Mar 20, 2026 · 95% relevant
