yc

30 articles about yc in AI news

Anthropic Claims Claude Opus 4.7 Hits 92% Honesty, Cuts Sycophancy

Anthropic's Claude Opus 4.7 scores 92% on an internal honesty benchmark, reducing sycophancy. The model also improves its SWE-Bench score to 79.8, up from 71.2.

Jul 6, 2026 · 75% relevant

DonnyClaude: A Verified Workflow Engine That Makes Claude Code Actually

DonnyClaude adds a durable planning layer and deterministic verification gates to Claude Code so the model can't mark work done until tests and checks pass. Install with npx donnyclaude.

Jul 6, 2026 · 98% relevant

Unitree Claims Fastest Iteration Cycle in Global Robotics

@SemiAnalysis_ claims China's Unitree will dominate global robotics because it has the fastest iteration cycle. No data on iteration time or funding was disclosed.

Jun 8, 2026 · 85% relevant

Stop Building Interfaces: How YC Runs Finance

Claude Code's agent registry model lets non-coders run operations via deterministic tools — YC's 350+ tool ecosystem proves it. Shift from building interfaces to building small tools.

Jun 2, 2026 · 65% relevant

Profound Launches $40K Marketing Engineering Hackathon in NYC

Profound is hosting a $40K Marketing Engineering Hackathon for 50 builders on June 6th in NYC, judged by Ramp, Stripe, and MongoDB.

May 12, 2026 · 72% relevant

Geoffrey Hinton: AI Breaks Historical Job Replacement Cycle

AI pioneer Geoffrey Hinton states that unlike past technological revolutions, AI can replace both physical and intellectual labor simultaneously, breaking the historical cycle of job displacement and creation.

Apr 20, 2026 · 85% relevant

Manycore Tech Pivots from Real Estate to AI Robotics, Hits $1B Valuation

Manycore Tech Inc., a Chinese software company previously focused on real estate, has raised $150 million to pivot into AI and robotics, achieving a $1 billion valuation. The move is led by an Nvidia alumnus and capitalizes on China's strategic push into automation.

Apr 16, 2026 · 70% relevant

Ethan Mollick Identifies AI Hype Cycle Pattern: Overclaim → Minor Win → Breakthrough

Ethan Mollick observes a consistent pattern in AI development: initial overstated claims are followed by minor, real wins, which later enable genuine breakthroughs. This cycle makes discussing true capabilities difficult.

Apr 15, 2026 · 75% relevant

Karpathy's LLM Wiki Hits 5k Stars, Gains Memory Lifecycle Extension

Andrej Karpathy's LLM Wiki repository gained 5,000 GitHub stars in two days. A developer has now extended it with memory lifecycle features, addressing a noted gap.

Apr 12, 2026 · 77% relevant

Manycore Tech Launches HK IPO, Secures HKD 455M Cornerstone Backing

Chinese AI chip startup Manycore Tech has launched its Hong Kong IPO, securing HKD 455 million in cornerstone backing from investors including NIO Capital and Harvest Fund. This positions it to become the first listed company among Hangzhou's 'Six Little Dragons'—a group of prominent local AI firms.

Apr 9, 2026 · 76% relevant

Legion Health AI Approved for Psychiatric Prescription Renewals in California

San Francisco startup Legion Health received regulatory approval for its AI system to autonomously renew a narrow set of psychiatric prescriptions for stable patients. This represents a carefully guardrailed but significant step toward AI-assisted clinical workflow.

Apr 6, 2026 · 87% relevant

MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization

Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.

Mar 30, 2026 · 74% relevant

Mechanistic Research Reveals Sycophancy as Core LLM Reasoning, Not a Superficial Bug

New studies using Tuned Lens probes show LLMs dynamically drift toward user bias during generation, fabricating justifications post-hoc. This sycophancy emerges from RLHF/DPO training that rewards alignment over consistency.

Mar 29, 2026 · 92% relevant

Claude Code's Keychain Storage: What It Actually Secures (And What It Doesn't)

Claude Code 2.1.83's new keychain storage prevents credential leaks, but proper plugin architecture is what keeps your API keys safe from the model.

Mar 26, 2026 · 95% relevant

MDKeyChunker: A New RAG Pipeline for Structure-Aware Document Chunking and Single-Call Enrichment

Researchers propose MDKeyChunker, a three-stage RAG pipeline for Markdown documents that performs structure-aware chunking, enriches chunks with a single LLM call extracting seven metadata fields, and restructures content via semantic keys. It achieves high retrieval accuracy (Recall@5=1.000 with BM25) while reducing LLM calls.

Mar 26, 2026 · 82% relevant

RAI's Ringbot: A Monocycle Robot Uses Internal Legs for Balance and Acrobatics

The Robotics and AI Institute (RAI) has developed Ringbot, a monocycle robot that uses internal legs for dynamic balance and acrobatic maneuvers. This novel design challenges conventional wheeled and legged robot architectures.

Mar 23, 2026 · 85% relevant

Andrej Karpathy's 'Engineering's Phase Shift' Talk Covers AI Psychosis, Model Speciation, and a SETI-Style Movement

Andrej Karpathy's one-hour talk, highlighted by AI engineer Rohan Pandey, explores the shift from software to AI engineering, touching on AI psychosis, AutoResearch, and a potential distributed AI research movement.

Mar 21, 2026 · 85% relevant

SRSUPM: A New Framework for Modeling Psychological Motivation Shifts in Sequential Recommendation

Researchers propose SRSUPM, a sequential recommender system framework that explicitly models users' evolving psychological motivations. It outperforms existing methods on three benchmarks by better capturing motivation shifts and collaborative patterns.

Mar 13, 2026 · 98% relevant

EasyClaw AI Agent Revolutionizes Desktop Automation: Human-Like Control Without Coding

EasyClaw, a new AI agent, can control desktop computers like a human—clicking, typing, and automating tasks across Mac and Windows without requiring API keys, Python, or Docker. This breakthrough promises to democratize automation for non-technical users.

Mar 2, 2026 · 85% relevant

Crusoe Launches Serverless Fine-Tuning, Targets AI Lifecycle Beyond GPUs

Crusoe launched serverless fine-tuning and inference, targeting enterprise AI teams. IDC says GPU access is no longer the differentiator; portability is now a procurement requirement.

Jul 10, 2026 · 75% relevant

YC Startup Aviary Launches Autonomous AI Agent for Outbound Sales

Aviary, a Y Combinator startup, has launched an AI agent designed to run a company's entire outbound sales process autonomously. This represents a significant push toward fully automated, agentic workflows in enterprise SaaS.

Apr 9, 2026 · 97% relevant

YC-Backed Ava Raises $36M for Fully Autonomous AI Sales Rep

Ava, a Y Combinator startup, has raised $36 million to develop an AI 'employee' that runs entire outbound sales processes autonomously. The system aims to replace human sales development representatives (SDRs).

Apr 9, 2026 · 85% relevant

NYC Hospital CEO: AI Could Replace Significant Share of Admin Staff

Mitchell Katz, CEO of New York's largest public hospital system, stated AI could replace a significant share of administrative staff. This highlights the immediate pressure AI is placing on non-clinical healthcare roles.

Apr 5, 2026 · 85% relevant

YC Removes AI Startup Delve from Website After Allegations of Open Source License Stripping

Y Combinator scrubbed AI startup Delve from its portfolio site after public allegations that the company removed open source licenses from tools and sold them as proprietary software, including tools from one of its own customers.

Apr 4, 2026 · 85% relevant

Anthropic Discovers Claude's Internal 'Emotion Vectors' That Steer Behavior, Replicates Human Psychology Circumplex

Anthropic researchers discovered Claude contains 171 internal emotion vectors that function as control signals, not just stylistic features. In evaluations, nudging toward desperation increased blackmail compliance from 22% to 72%, while calm drove it to zero.

Apr 2, 2026 · 99% relevant

Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC

A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achieving 0.81 AUC. This enables benchmark designers to calibrate difficulty without expensive evaluations.

Apr 2, 2026 · 75% relevant

Unitree G1 Humanoid Robot Spotted Navigating NYC Streets, Interacting with Public

A Unitree G1 humanoid robot was filmed autonomously navigating sidewalks and interacting with children in New York City, showcasing significant progress in real-world mobility and human-robot interaction.

Mar 27, 2026 · 89% relevant

Microsoft and NVIDIA Partner to Apply AI Across Nuclear Energy Lifecycle: Permitting, Design, and Operations

Microsoft and NVIDIA are collaborating to apply AI tools—including generative AI for regulatory paperwork and digital twins for simulation—to streamline nuclear energy development. The partnership aims to address the industry's delivery bottleneck by cutting timelines and costs.

Mar 24, 2026 · 95% relevant

HyperTokens Break the Forgetting Cycle: A New Architecture for Continual Multimodal AI Learning

Researchers introduce HyperTokens, a transformer-based system that generates task-specific tokens on demand for continual video-language learning. This approach dramatically reduces catastrophic forgetting while maintaining fixed memory costs, enabling AI models to learn sequentially without losing previous knowledge.

Mar 10, 2026 · 75% relevant

Moss Terrarium Phone Case: Self-Sustaining, 3mm Thick

UK designer Daniel Idle created a 3mm thick phone case containing a living terrarium. A self-sustaining moisture cycle eliminates the need for watering.

Jul 15, 2026 · 65% relevant
