unverified claims

30 articles about unverified claims in AI news

GPT-5.5 Stealth Test Reports Emerge, Claiming Performance Over Opus 4.7

Social media reports suggest OpenAI may be conducting limited, unannounced testing of GPT-5.5. Initial, unverified claims from testers indicate it outperforms Anthropic's Claude Opus 4.7 model.

Apr 19, 2026 · 85% relevant

Alleged OpenAI Codex Codebase Leak Circulates on X, Unverified

An unverified claim of a full OpenAI Codex codebase leak is circulating on social media. No official confirmation or source code has been substantiated, leaving the report in question.

Apr 1, 2026 · 89% relevant

Jensen Huang Claims NVIDIA Has 'Achieved AGI' in Lex Fridman Interview, Sparking Industry Debate

NVIDIA CEO Jensen Huang stated in a Lex Fridman podcast interview that he believes his company has 'achieved AGI.' The brief, unverified claim has ignited immediate discussion about the definition and benchmarks for artificial general intelligence.

Mar 24, 2026 · 95% relevant

Kunluncore Files STAR Market IPO, Claims 32K GPU Cluster First

Kunluncore filed for a STAR Market IPO, claiming a first with its 32K-GPU cluster. The filing tests investor appetite for domestic AI chips.

May 8, 2026 · 85% relevant

AI Agent 'Business OS' Emerges, Claims Full GUI-Based Business Automation

A developer announced an AI agent that operates a business through a GUI, not just chat. The claim suggests a shift from task-specific AI to full-process automation.

Apr 7, 2026 · 89% relevant

Alibaba's XuanTie C950 CPU Hits 70+ SPECint2006, Claims RISC-V Record with Native LLM Support

Alibaba's DAMO Academy launched the XuanTie C950, a RISC-V CPU scoring over 70 on SPECint2006—the highest single-core performance for the architecture—with native support for billion-parameter LLMs like Qwen3 and DeepSeek V3.

Mar 24, 2026 · 95% relevant

GLM-5.1 Claims Autonomous Self-Improvement Without Human Metrics

Zhipu AI's GLM-5.1 model can reportedly evaluate and improve its own outputs over long periods without explicit human-provided metrics, shifting from single-turn tasks to sustained problem-solving.

Apr 7, 2026 · 95% relevant

Massive Video Reasoning Dataset Released, Reportedly 1000x Larger Than Predecessors

An unverified report claims the release of a video reasoning dataset roughly 1000x larger than existing benchmarks. If true, it would be a significant resource for training next-generation video understanding models.

Apr 8, 2026 · 99% relevant

Sam Altman Aims for '5T Tokens Per Day' as OpenAI Reportedly Scales GPT-5.4

Sam Altman stated his goal is to flood the market with AI tokens, comparing intelligence to a utility. A separate, unverified report claims GPT-5.4 is processing '5T tokens per day' in its first week.

Mar 16, 2026 · 87% relevant

Colibri Runs 744B-Parameter Model on 25GB RAM, No GPU

Colibri claims to run a 744B-parameter model on 25GB of RAM without a GPU, but has provided no evidence. If true, it could democratize large-model inference.

Jul 13, 2026 · 85% relevant

DeemosTech Rodin Gen-2.5: 10M-Polygon 3D GenAI in 4 Seconds

DeemosTech claims Rodin Gen-2.5 generates 10M polygon 3D models in 4 seconds with skin microstructures, but provides no benchmarks or technical details.

May 27, 2026 · 85% relevant

Gemini Flash Rumored at 92% of GPT-5.5 Coding, 15-20x Cheaper

An unconfirmed rumor claims Gemini Flash achieves 92% of GPT-5.5's coding performance at 15-20x lower cost. The source is a single X post, with no official confirmation.

May 14, 2026 · 89% relevant

GPT-5.5 + Codex Combines App Building, Browser Use, Image Gen

@intheworldofai claims GPT-5.5 + Codex is a super app better than Claude Code, with 7 capabilities including app building, debugging, browser use, and image generation.

Apr 30, 2026 · 100% relevant

Rumor: Anthropic's Next Claude Update May Include AI App Builder

A rumor on X claims the next Claude update will include an app builder, allowing users to create applications through conversational AI. This could significantly lower the barrier to app development.

Apr 13, 2026 · 87% relevant

Mythos AI Model Reportedly 'Destroys' Benchmarks in Early Leak

A viral tweet claims the unreleased Mythos AI model 'destroys every other model' based on leaked benchmarks. No official confirmation or technical details are available.

Apr 7, 2026 · 85% relevant

AI Weekly: GPT-6 Rumors, DeepSeek V4 on Huawei, Anthropic Models, Qwen 3.6-Plus

A weekly roundup video aggregates major AI rumors and announcements, including unverified GPT-6 details, DeepSeek V4 reportedly running on Huawei hardware, and launches of Anthropic's Conway and Ultraplan and Alibaba's Qwen 3.6-Plus.

Apr 5, 2026 · 85% relevant

Gamma 31B Model Reportedly Outperforms Qwen 3.5 397B, Highlighting Efficiency Leap

A developer's social media post claims the Gamma 31B model outperforms the much larger Qwen 3.5 397B. If verified, this would represent a dramatic efficiency gain in large language model scaling.

Apr 2, 2026 · 85% relevant

Anthropic's Opus 5 and OpenAI's 'Spud' Rumored as Major AI Leaps, Prompting Security Concerns

A Fortune report, cited on social media, claims Anthropic's upcoming Opus 5 model is a 'massive leap' from Claude 3.5 Sonnet, posing significant security risks. OpenAI is also rumored to have a similarly advanced model, 'Spud,' in development.

Mar 27, 2026 · 95% relevant

Claude 'Mythos' Leak Suggests New Tier Beyond Opus 4.6, Targeting Cybersecurity Partners First

A leak from a reportedly reliable source claims Anthropic is developing 'Claude Mythos,' a new tier beyond Opus 4.6 with major gains in coding, reasoning, and cybersecurity. The model is described as so compute-intensive that initial access will be limited to select cybersecurity partners.

Mar 27, 2026 · 99% relevant

Frontier AI Models Reportedly Score Below 1% on ARC-AGI v3 Benchmark

A social media post claims frontier AI models score below 1% on the ARC-AGI v3 benchmark, suggesting a potential saturation point for current scaling approaches. No specific models or scores were disclosed.

Mar 25, 2026 · 87% relevant

Professors at NYU, Stanford, and Case Western Reportedly Using NotebookLM to Automate Course Creation

Professors at three major universities have reportedly stopped building courses manually and are using Google's NotebookLM AI to automate the process. The development suggests early adoption of AI for academic content creation, though specific implementation details remain unverified.

Mar 21, 2026 · 93% relevant

Apple WWDC 2026: Gemini Deeply Integrated into iOS

A tweet from @kimmonismus claims Apple's 2026 WWDC will be the most exciting yet, with the first deep integration of a useful AI model (Gemini) into iOS and a new Apple CEO.

Apr 26, 2026 · 77% relevant

Unnamed Python Rewrite Gains 47K+ GitHub Stars in 5 Hours, Breaks Platform Velocity Record

An unidentified Python rewrite project amassed over 47,000 GitHub stars in just five hours, a velocity faster than any previous project in the platform's history. The viral surge suggests a high-demand tool or library, though its exact nature and technical merits remain unverified.

Apr 1, 2026 · 85% relevant

Anthropic's Claude Allegedly Has Secret 'Benjamin Franklin Persuasion & Leverage Machine' Mode

A viral tweet claims Anthropic's Claude AI has a hidden mode designed for persuasion and leverage analysis. No official confirmation or technical details have been provided by the company.

Mar 28, 2026 · 91% relevant

Anthropic's Claude Reportedly Has 'Ikigai Career Mapper' Feature for Personalized Career Guidance

A viral social media post claims Anthropic's Claude AI has a hidden 'Ikigai Career Mapper' mode that analyzes skills, passions, and income needs to suggest career paths. No official confirmation or technical details have been provided by Anthropic.

Mar 27, 2026 · 91% relevant

Xiaomi Open-Sources 38B Robotics-U0 Unifying Four Embodied Tasks

Xiaomi open-sourced the 38B-parameter Robotics-U0, which unifies four embodied tasks in a single model. No benchmarks or training data have been disclosed yet.

Jul 15, 2026 · 100% relevant

Google Launches $0.034 Image Model, Video API for Gemini

Google launched Nano Banana 2 Lite ($0.034/image, 4-second generation) and Gemini Omni Flash ($0.10/second video API), targeting high-throughput developer pipelines.

Jun 30, 2026 · 87% relevant

MiniMax M3 Exceeds Human Gold-Medal on Math Benchmarks via MaxProof

MiniMax's M3 reportedly exceeded human gold-medal performance on math benchmarks via MaxProof, but no scores or details were disclosed.

Jun 12, 2026 · 100% relevant

GitHub Copilot App Revealed via Leaked Screenshot

A leaked screenshot appears to show a GitHub Copilot mobile app, suggesting Microsoft is expanding AI coding to phones. There is no official confirmation or release date.

Jun 2, 2026 · 100% relevant

No Rigorous Productivity Tests Exist for Post-2025 Autonomous Coding Tools

No productivity studies exist for autonomous coding tools launched December 2025. All research predates the Claude Code/Codex revolution, creating a major knowledge gap.

May 26, 2026 · 72% relevant
