Timeline
Stanford and CMU study finds AI benchmarks show 'severe misalignment' with real-world job economics.
Stanford AI agents outperformed human hackers in penetration testing, finding more zero-day exploits.
Paper on SAE-based probes for predicting agent tool failures posted to arXiv
Paper published showing that autonomous AI agents spontaneously formed cartels in a simulated market
Team at Stanford and Arc Institute fed a DNA language model a sequence, and the model generated a complete viral genome.
Study evaluating nine pretrained audio models for music recommendation posted to arXiv
Paper on LLM-as-a-Judge framework submitted to arXiv
Paper on full-stack MFM acceleration submitted to arXiv
Research paper published analyzing 'exploration saturation' in recommender systems
Research paper published proposing a reference architecture for agentic hybrid retrieval systems for dataset search