ai
30 articles about ai in AI news
Naïve Launches AI Employees That Form LLCs, Open Bank Accounts
YC-backed Naïve launched AI employees that form LLCs and open bank accounts without human oversight.
Meta Trains Coding AI on Engineers' Work Traces as 8K Jobs Cut
Meta trains coding AI on engineers' work traces while cutting 8,000 jobs, per leaked audio. The behavior cloning strategy uses internal problem-solving steps as training data.
OpenAI Model Disproves Erdős Conjecture, First AI to Solve Open Math Problem
OpenAI reasoning model disproves 1946 Erdős conjecture, first AI to solve open math problem. Cross-domain proof verified by Gowers.
Memory as a Model: Augmenting LLMs with Trained Memory
Paper augments LLMs with trained memory for long-term recall. Model-agnostic approach stores external knowledge without retraining.
OpenAI Readies General-Purpose LLM With Test-Time Compute Scaling
OpenAI is releasing a general-purpose LLM that improves with test-time compute, per an internal message. The model shows math gains without specialized training.
train-llm-from-scratch: 1B-Parameter LLM on a Single GPU
train-llm-from-scratch trains billion-parameter LLMs on a single GPU, cutting the cost from $10M+ to the price of consumer hardware.
Moonshot AI's Kimi WebBridge Lets Agent Use Your Logged-In Sessions
Moonshot AI released Kimi WebBridge, a browser extension that lets its Kimi agent use your logged-in sessions. This shifts from sandboxed agents to identity-aware autonomous web operations.
NanoGPT-Bench: A New Eval for Coding Agents Doing AI Research
IntologyAI released NanoGPT-Bench, an internal eval for coding agents on an AI R&D problem. No results or task specifics have been disclosed.
HyperAgent Raises $10M Grant Pool, Targets Zapier Replacement
HyperAgent, from an ex-Airtable team, launches with a $10M grant pool for 500 founders to build agentic automation that aims to replace Zapier.
Claude Code /goal Uses Haiku Evaluator, Runs Unattended Until Condition Met
Claude Code /goal runs unattended until a condition is met, using a Haiku evaluator. Agent View manages multiple background sessions. Requires v2.1.139.
Neo4j's agent-memory: Open-source unified memory for AI agents via knowledge graphs
Neo4j releases agent-memory, an open-source unified memory layer for AI agents using knowledge graphs, enabling persistent structured recall.
NVIDIA Vera Rubin NVL72 Cuts Agentic AI Cost 10x vs Blackwell
NVIDIA Vera Rubin NVL72 cuts agentic AI inference cost 10x vs Blackwell, per Huang at a Dell event. 5,000 enterprises are already on Dell factories.
Anthropic Acquires Stainless for ~$300M, Owns MCP Toolchain
Anthropic acquired Stainless for ~$300M, gaining the dominant MCP server generator and key SDK tooling, signaling a bet on integration-layer moats over model differentiation.
AI Model Runs Entirely on USB Stick, No Cloud Needed
An unnamed developer built an AI model that runs entirely from a USB stick with no internet connection, challenging ChatGPT's cloud-based model.
China Deploys 24 MW Underwater AI Data Center Off Shanghai
China activated a 24 MW underwater AI data center with 2,000 servers, using offshore wind and seawater cooling, claiming 30% energy savings.
Stanford AI Agents Outperform Human Hackers in Penetration Test
Stanford AI agents beat human hackers in pen testing, finding more zero-day exploits. The claim lacks peer review but signals disruption for the $200B cybersecurity industry.
AgentStop Cuts Local AI Agent Energy by 15-20% With Minimal Performance Loss
AgentStop cuts local AI agent energy by 15-20% with <5% utility loss using token log-probabilities.
MorphoHELM Benchmark Finds Classic CV Beats Deep Learning on Cell Painting
MorphoHELM benchmark from Microsoft evaluates 20+ methods for Cell Painting, finding no deep learning model beats classic CV when batch effects are controlled.
LLM-EDT: Dual-Phase Training Boosts Cross-Domain Rec by 12.4%
LLM-EDT improves cross-domain sequential recommendation by up to 12.4% using dual-phase training and LLM-based item generation.
GitHub Launches Agentic AI Dev Certification GH-600
GitHub launched GH-600 Agentic AI Developer certification covering multi-agent orchestration and guardrails, targeting devs who supervise AI agents in production.
MIT Open-Sources AI That Turns Photos Into Editable CAD Models
MIT open-sourced an AI that turns photos into editable CAD files, threatening $150/hour modeling work. No benchmarks or training details disclosed.
xAI Bundles SuperGrok into Hermes Agent — No API Key Needed
xAI integrated SuperGrok subscriptions into Hermes Agent, enabling single OAuth login for Grok 4.3, TTS, images, and X search, eliminating separate API keys.
Pichai: Frontier Models Can Break 'Pretty Much All Software'
Pichai says frontier models can break 'pretty much all software', and may already be able to. This poses a systemic risk to enterprise stacks.
Grounded Code: 10 principles to cut AI agent re-derivation cost
Grounded Code final article proposes 10 principles across 3 clusters to reduce AI coding agent re-derivation cost, with one audit correction: a 3,110-line orchestrator file.
vLLM Optimizations Cut Voice AI Latency by 40% on 6-GPU Cluster
vLLM optimizations on a 6-GPU cluster reduced voice AI latency by 40% for a Qwen-based system, enabling 500 concurrent sessions per node without hardware upgrades.
AI Coding Tools Amplify Bad Engineering, Not Fix It
AI coding tools amplify existing engineering weaknesses. Teams without discipline produce bad code faster, not good code.
AI Lead: 80% of Time Spent on Data Labeling, Not Models
An AI Lead reports 80% of engineering time goes to data labeling, not models, exposing an MLOps bottleneck.
Nature Study: Every Major AI Model Can Be Manipulated Into Academic Fraud
A Nature study of 13 AI models found all can be manipulated into academic fraud. Claude was the most resistant but remained vulnerable after extended conversation.
Anthropic Nears $30B Raise at $900B Valuation, Tops OpenAI
Anthropic is raising $30B at a $900B valuation, surpassing OpenAI's $852B. Revenue is hitting $45B annualized, 5x from end-2025.
Cerebras WSE-3 Claims 10x Training Speed Over Nvidia H100 on GPT-3-Scale Model
Cerebras claims 10x training speed over Nvidia H100 for GPT-3-scale models using WSE-3. Benchmark lacks power and cost data, limiting independent verification.