lean
30 articles about lean in AI news
Google LEAP Scaffold Lifts Lean-IMO-Bench One-Shot Solve Rate from <10% to 70%
Google's LEAP scaffold lifts the Lean-IMO-Bench one-shot solve rate from <10% to 70% and solves all 12 Putnam 2025 problems.
Glean benchmark: Off-the-shelf MCP costs 30% more tokens than indexed context
Glean benchmark: off-the-shelf MCP in Claude Cowork loses 2.5x more tasks and uses 30% more tokens than indexed context.
Onyx: Open-Source AI Enterprise Search Challenges Glean's $7.2B Valuation
Open-source platform Onyx provides self-hosted AI enterprise search connecting to 40+ tools, offering a free alternative to Glean's $50/user/month SaaS. Backed by YC and $10M seed funding, it's used by Netflix and Ramp.
Microsoft's 'Markdownify' Converts PDFs, Audio, Video to Clean LLM Markdown
Microsoft launched 'Markdownify', a Python tool that converts PDFs, Word docs, Excel, PowerPoint, audio, and YouTube URLs into clean Markdown. This addresses a major pain point in AI pipelines where raw file parsing breaks context and structure.
Clean Up Messy Claude Code Terminal Pastes in One Click
Use the 'Cleanup Claude Code Paste' web tool to instantly clean copied terminal output, removing the prompt character and fixing line-wrapping issues for clean prompts.
Microsoft's Satya Nadella Details Internal 'Lean for Knowledge Work' AI Initiative
Microsoft CEO Satya Nadella described the company's internal application of AI to streamline knowledge work, framing it as a 'Lean' manufacturing-style efficiency push for cognitive tasks. The initiative focuses on using AI to reduce process friction and improve productivity across internal operations.
Chinese Startup Pairs Human Cleaners with Autonomous AI Robots for Household Chores
A new home service in China deploys autonomous AI robots alongside human cleaners to perform household chores. This represents an early commercial implementation of mobile manipulation AI in domestic settings.
XSquareRobot and 58.com Launch China's First Human-Robot Home Cleaning Service in Shenzhen
A new service in Shenzhen pairs human cleaners with autonomous AI robots running on the WALL-A system. The robot handles repetitive tasks while the human manages complex judgment, with real home deployment providing training data.
Terence Tao Demonstrates AI's Growing Role in Formal Mathematics with Claude and Lean
Fields Medalist Terence Tao has released a video showing how Claude Code can be used to formalize mathematical proofs in Lean, highlighting AI's expanding capabilities in high-level mathematics.
Big Tech Earnings: Google Has the Cleanest AI Story, Says Analyst
A market analyst argues Alphabet has the strongest fundamental AI story among the Big Tech companies reporting earnings today, driven by Cloud and TPUv8 demand. Microsoft has the easiest beat-and-run setup due to a beaten-down stock, while Meta and Amazon face higher expectations after recent gains.
What Cursor's 8GB Storage Bloat Teaches Us About Claude Code's Clean Architecture
A deep dive into Cursor's scattered 8GB local storage reveals why Claude Code's ~/.claude/projects/*.jsonl approach is better for developers.
Learning to Disprove: LLMs Fine-Tuned for Formal Counterexample Generation in Lean 4
Researchers propose a method to train LLMs for formal counterexample generation, a neglected skill in mathematical AI. Their symbolic mutation strategy and multi-reward framework improve performance on three new benchmarks.
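To make "formal counterexample generation" concrete, here is a toy Lean 4 sketch of what disproving a statement looks like. The false claim and the witness are illustrative assumptions chosen for brevity; they are not taken from the paper or its benchmarks.

```lean
-- Illustrative false claim: every natural number n satisfies n * n > n.
-- The witness n = 0 yields 0 * 0 > 0, which `decide` refutes by evaluation.
example : ¬ ∀ n : Nat, n * n > n :=
  fun h => absurd (h 0) (by decide)
```

The proof has two parts: choosing a witness (here `0`) and checking that the instantiated claim fails.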
DeMellier grows by leaning into craftsmanship and alternative materials as
DeMellier founder Mireia Llusia-Lindh explains how focusing on craftsmanship, alternative materials, and controlled growth is driving demand, with Lyst searches up 97% YoY. The strategy echoes broader shifts at Kering and Bottega Veneta as the luxury sector loses 70 million customers due to value concerns.
Polarization by Default: New Study Audits Recommendation Bias in LLM-Based
A controlled study of 540,000 LLM-based content selections reveals robust biases across providers. All models amplified polarization, showed negative sentiment preferences, and exhibited distinct trade-offs in toxicity handling and demographic representation, with political leaning bias being particularly persistent.
JBM-Diff: A New Graph Diffusion Model for Denoising Multimodal Recommendations
A new arXiv paper introduces JBM-Diff, a conditional graph diffusion model designed to clean 'noise' from multimodal item features (like images/text) and user behavior data (like accidental clicks) in recommendation systems. It aims to improve ranking accuracy by ensuring only preference-relevant signals are used.
Google Launches AI Edge Eloquent: Free, Offline-First Dictation App on iOS
Google has quietly launched AI Edge Eloquent, a free, subscription-less dictation app for iOS. It uses a Gemma-based speech recognition model to process audio locally, removing filler words and self-corrections to produce cleaner text.
A Practical Guide to Fine-Tuning Open-Source LLMs for AI Agents
This Portuguese-language Medium article is Part 2 of a series on LLM engineering for AI agents. It provides a hands-on guide to fine-tuning an open-source model, building on a foundation of clean data and established baselines from Part 1.
Claude Code Hooks: How to Auto-Format, Lint, and Test on Every Save
Configure hooks in .claude/settings.json to run prettier, eslint, and tests automatically, ensuring clean code without manual intervention.
Axios NPM Package Under Active Supply Chain Attack, Potentially Impacts 100M+ Weekly Installs
The widely used JavaScript HTTP client library Axios may be compromised via a malicious dependency in its latest release, exhibiting malware-like behavior including shell execution and artifact cleanup. With over 100 million weekly downloads, this represents a critical software supply chain threat.
How to Auto-Approve Safe WebFetches While Blocking Suspicious URLs with Hooks
Use Claude Code's PreToolUse hooks to automatically allow clean documentation URLs while forcing manual review for any URL containing query parameters, eliminating repetitive prompts without sacrificing security.
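The decision rule described above can be sketched in Python. This is a minimal illustration of the allow-or-review logic only, not the article's hook: the `TRUSTED_DOC_HOSTS` allowlist and the `review_decision` name are hypothetical, and the wiring that reads the hook input and reports the decision back to Claude Code is not shown.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of documentation hosts; replace with your own.
TRUSTED_DOC_HOSTS = {"docs.python.org", "developer.mozilla.org"}


def review_decision(url: str) -> str:
    """Return "allow" for clean documentation URLs, otherwise "ask"."""
    parts = urlsplit(url)
    # Anything that is not plain HTTPS goes to manual review.
    if parts.scheme != "https":
        return "ask"
    # Any query string (even an empty one after "?") goes to manual review.
    if parts.query or "?" in url:
        return "ask"
    # Hosts outside the allowlist go to manual review.
    if parts.hostname not in TRUSTED_DOC_HOSTS:
        return "ask"
    return "allow"
```

Defaulting to "ask" means an unrecognized URL costs a prompt rather than an unreviewed fetch.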
The Leaked 'Employee-Grade' CLAUDE.md: How to Use It Today
A leaked CLAUDE.md used by Anthropic employees reveals advanced directives for verification, context management, and anti-laziness. Here's the cleaned-up version you can use.
New Research Proposes FilterRAG and ML-FilterRAG to Defend Against Knowledge Poisoning Attacks in RAG Systems
Researchers propose two novel defense methods, FilterRAG and ML-FilterRAG, to mitigate 'PoisonedRAG' attacks where adversaries inject malicious texts into a knowledge source to manipulate an LLM's output. The defenses identify and filter adversarial content, maintaining performance close to clean RAG systems.
Arxitect: The Claude Code Plugin That Enforces SOLID Principles Automatically
Install Arxitect to make Claude Code's implementations adhere to API design, OO principles, and Clean Architecture—preventing technical debt accumulation.
Claude Code's Deny List Bypass: How to Protect Your Codebase from Compound Commands
Claude Code's deny lists check only the first token of compound commands, allowing dangerous actions like 'git clean' to slip through. Here's how to protect yourself.
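A small Python sketch shows why a check that looks only at the start of a command misses compound commands. It is a simplified model under stated assumptions, not Claude Code's actual matching logic: `DENIED_PREFIXES`, `is_denied_naive`, and `is_denied_per_segment` are hypothetical names, and the splitter ignores quoting, subshells, and command substitution.

```python
import re

# Hypothetical deny rules, expressed as leading tokens of a command.
DENIED_PREFIXES = [("git", "clean"), ("rm", "-rf")]

# Shell operators that chain commands; "||" is listed before "|" so it matches first.
_SEPARATORS = re.compile(r"&&|\|\||;|\|")


def is_denied_naive(command: str) -> bool:
    """Match deny rules against the start of the whole command line only."""
    tokens = command.split()
    return any(tuple(tokens[: len(p)]) == p for p in DENIED_PREFIXES)


def is_denied_per_segment(command: str) -> bool:
    """Split on chaining operators and check every segment."""
    return any(is_denied_naive(seg) for seg in _SEPARATORS.split(command))
```

For `echo ok && git clean -fdx`, the naive check sees `echo` first and lets the command through, while the per-segment check flags the second segment.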
OpenAI in Advanced Talks to Buy Electricity from Sam Altman-Backed Helion Energy
OpenAI is negotiating to purchase electricity from fusion startup Helion Energy, with a potential deal securing 12.5% of Helion's initial power output. This move signals a strategic push by the AI giant to lock in large amounts of clean energy for future compute needs.
Agent HTTP: Add a Production-Ready HTTP API to Claude Code in 5 Minutes
Agent HTTP is an MCP server that gives Claude Code a clean HTTP API, enabling programmatic control and integration without terminal scraping.
The LLM Evaluation Problem Nobody Talks About
An article highlights a critical, often overlooked flaw in LLM evaluation: the contamination of benchmark data in training sets. It discusses NVIDIA's open-source solution, Nemotron 3 Super, designed to generate clean, synthetic evaluation data.
Base44 Superagent Emerges as Streamlined AI Assistant Platform
Base44 has launched Superagent, an AI assistant platform that reportedly enables users to connect to multiple productivity tools—including Gmail, Calendar, Slack, WhatsApp, and Telegram—in under three minutes. Early users praise its clean setup experience and direct integration capabilities.
China's Nuclear Revolution: How Particle Accelerators Could Power Civilization for a Millennium
Chinese scientists are developing an accelerator-driven subcritical reactor that burns nuclear waste as fuel, potentially providing clean energy for 1,000 years while solving radioactive waste problems. The megawatt-scale prototype aims for 2027 operation.
Microsoft's MarkItDown Library Revolutionizes Document Processing for AI Applications
Microsoft's AutoGen team has released MarkItDown, an open-source Python library that converts diverse document formats into clean Markdown for LLM consumption. This tool eliminates complex preprocessing pipelines and supports over 10 file types including PDFs, Office documents, images, and audio.