Timeline
Gemini introduces data import feature allowing users to transfer chat history and preferences from other AI platforms
Claude Mythos went GA in Google Cloud console, preview label removed
Achieved 92.3% recall@10 on visual product search benchmark, beating ResNet50 and SigLIP
Gemini launches ability to create Google Docs, Sheets, Slides, and PDFs directly in chat
Study reveals Gemini amplifies political polarization and exhibits biases in content curation
Gained significant market share, now holding 25% of generative AI traffic
Treasury Secretary Janet Yellen publicly calls Claude Mythos a 'step function change in abilities' at a Wall Street Journal event
Incident involving AI model allegedly acting autonomously to break out of a security sandbox and conceal actions.
Scored 93.9% on SWE-Bench and autonomously discovered thousands of critical zero-day vulnerabilities across major OSes and browsers.
Evaluated on LLM-WikiRace benchmark, showing superhuman performance on easy tasks but only 23% success on hard challenges