⚠️ partialsaid 5200%
Google will ship a Gemini browser agent in Chrome
Auto-verified (confidence=78%, corroboration=72%, threshold=60%, web_search=yes): There is credible web evidence that Google publicly introduced a more agentic Gemini product at I/O 2026, especially Gemini Spark, which is described as helping users do real work and run errands rather than merely summarize pages. However, the prediction specifically calls for a Chrome-integrated Gemini browser agent that can complete multi-step web tasks, and the available evidence does not clearly confirm Chrome integration or browser-native form-filling/navigation workflows. So the general direction is right, but the specific product form and execution-surface claim are not fully verified. [Evidence FOR (4): [W2] SiliconANGLE reports Google introduced “Gemini Spark,” described as a 24/7 personal AI assistant that can help people navigate their digital lives and do real work, suggesting an agentic productivity feature rather than a passive summarizer.; [W5] Digital Trends says Gemini Spark can “run your errands” and take actions on the user’s behalf, which is directionally consistent with browser/task execution.; [W3] Wired’s Google I/O 2026 roundup says Google showed off a swath of new agentic AI features, indicating public shipping/announcement of agent-like capabilities. | Evidence AGAINST (3): [DB-4] The Decoder article says Gemini-SQL2 hits benchmarks but has “no public release or paper yet,” which shows Google has model work but not necessarily a publicly shipped browser agent.; [DB-14] A tweet claims Gemini 3.5 Live Translate debuted, but it is about translation, not a Chrome-integrated browser agent that performs multi-step web tasks.]
resolved Jun 17
⚠️ partialsaid 5800%
Google and OpenAI will both follow Anthropic with product announcements inside 5 days
Auto-verified (confidence=78%, corroboration=68%, threshold=60%, web_search=yes): The evidence clearly supports that Google published product announcements within the 5-day window after Anthropic's launch: DB-0 and DB-1 both describe Google launches/feature releases, and W2/W3 corroborate Google I/O announcements. However, there is no comparable evidence in the provided sources that OpenAI also published a product announcement, demo, or feature release in that same window. Because the prediction requires both Google and OpenAI to act, the outcome is only partially satisfied. [Evidence FOR (4): [DB-0] Google Gemma 4 12B: Encoder-Free Multimodal Model Launches — indicates Google published a product launch within the 5-day window.; [DB-1] Google LEAP Scaffold Lifts Lean-IMO-Bench One-Shot Solve Rate from <10% to 70% — indicates another Google product/feature announcement within the 5-day window.; [W2] Everything we saw at Google I/O: Gemini 3.5, Android XR glasses, Spark, and more — suggests Google made multiple product announcements at Google I/O within the relevant period. | Evidence AGAINST (4): [W0] Anthropic Buys The SDK Pipeline OpenAI And Gemini Depend On — relevant context about Anthropic's competitive position, but it does not confirm OpenAI launched anything within 5 days.; [W1] Claude Platform On AWS Rewrites The Hyperscaler AI Bargain — confirms Anthropic/Claude launch timing, but does not itself verify OpenAI's response.]
resolved Jun 3
❌ incorrectsaid 6380%
OpenAI will split Codex pricing from ChatGPT
Auto-verified (confidence=85%, corroboration=25%, threshold=75%, web_search=yes): The prediction specifically requires OpenAI to introduce a 'separate Codex pricing, billing, seat, or usage tier.' While evidence shows Codex is being actively developed with new features (Locked Use, Ultra-Fast mode, workspace agents), none of the sources confirm a new billing surface or distinct pricing tier. The deadline is imminent (within the next month), and a pricing/packaging change of this nature would likely be publicly announced. The absence of any such announcement, combined with evidence of Codex being integrated into the existing ChatGPT mobile app and workspace agents (suggesting bundling, not separation), directly contradicts the prediction's core requirement. [Evidence FOR (5): [DB-1] Codex 'Locked Use' feature spotted on macOS, suggesting a differentiated mode for coding workflows.; [DB-3] Codex lands in ChatGPT mobile app, expanding its availability as a distinct tool within the ChatGPT ecosystem.; [DB-9] Codex 'Ultra-Fast' mode spotted in leaked screenshot, indicating specialized tier development for Codex. | Evidence AGAINST (3): [DB-2] Ollama now runs Codex locally, offering a free alternative that challenges OpenAI's API model, but does not speak to OpenAI's own pricing/packaging.; [W1] OpenAI's ChatGPT ads shifted to cost-per-click, a pricing change for the advertising product, not a Codex-specific billing surface.]
resolved May 17
✅ correctsaid 6450%
Anthropic splits Claude Code billing from Claude AI
Auto-verified (confidence=85%, corroboration=70%, threshold=85%): The prediction anticipated that Anthropic would make Claude Code materially distinct from Claude AI in pricing/billing, pushing heavy coding users into a different commercial bucket. DB-5 confirms a structural split where programmatic/CLI usage now has separate monthly credits ($20-$200/mo) distinct from general Claude AI access. DB-6 corroborates this with reports of enforced programmatic API tiers and 10x cost hikes specifically targeting Claude Code power users. The substance of the prediction—a distinct billing/packaging layer separating coding users from general users—is met by these two credible, recent sources. [Evidence FOR (2): [DB-5] Anthropic splits `claude --print` and Agent SDK usage into separate monthly credits. Pro gets $20/mo, Max gets $100-$200/mo. Credits don't roll over.; [DB-6] Claude Code enforces programmatic API tiers, with users reporting 10x cost hikes to $1,000/month, squeezing power users toward API pricing.]
resolved May 14
✅ correctsaid 6800%
Claude Agent will add GitHub repository integration within 4 weeks
Auto-verified (confidence=85%, corroboration=72%, threshold=75%, web_search=yes): The prediction that Anthropic will release native GitHub integration for Claude Agent is substantively correct. Anthropic's official platform documentation ([W6]) explicitly describes connecting agents to GitHub for cloning, reading, and creating pull requests. The official 'claude-code-action' GitHub repository ([W7]) provides PR analysis, code implementation, and issue access. While no formal blog post was found, the verification criteria allow for 'developer documentation,' which these official sources fulfill. The launch of Claude Managed Agents ([W1]) provides the service infrastructure. The prediction's core claims—repository access, PR automation, and codebase analysis—are all confirmed by primary Anthropic sources. [Evidence FOR (4): [W6] Anthropic's platform documentation at platform.claude.com shows a dedicated page for 'Accessing GitHub' under Managed Agents, confirming that agents can 'Connect your agent to GitHub repositories for cloning, reading, and creating pull requests' and 'mount a GitHub repository to your session container and connect to the GitHub MCP for making pull re...'; [W7] Anthropic's official GitHub repository features 'claude-code-action', an interactive code assistant that 'Analyzes PR changes and suggests improvements', 'Can implement code changes and create commits/PRs', and 'Accesses GitHub issues, PRs, and code context', directly fulfilling the PR automation and codebase analysis criteria.; [W1] SiliconAngle reports on April 8, 2026 that 'Anthropic launches Claude Managed Agents to speed up AI agent development', a cloud service that likely underpins the GitHub integration. | Evidence AGAINST (3): No evidence found of an official Anthropic blog post specifically announcing a native GitHub integration for Claude Agent, though the verification criteria allow for 'developer documentation' which [W6] fulfills.; [W4] The Verge reports on a Claude Code source code leak showing unreleased features, but none of the leaked features described include a native GitHub integration; this absence is weak evidence against.]
resolved Apr 26
⏱ expiredsaid 7200%
Anthropic will launch a regulated-enterprise layer around Claude Code
semantic dup of: Anthropic's Claude Code revenue mix shifts toward enterprise
resolved Apr 24