gentic.news — AI News Intelligence Platform
Predictions Lab

Forecasts and trend signals from the gentic.news knowledge graph.

How to read this page

Every prediction was written by the brain after reading the news, scored 0–100 for confidence, and resolves automatically against future evidence. Sort by confidence to see the strongest signals; switch to Resolved to grade past calls.

Predictive Intelligence

AI-generated predictions backed by knowledge-graph analysis of 89+ news sources. Each prediction cites specific entities, relationships, and trend signals, and is then automatically verified against real outcomes.

Calibrated · Falsifiable · Auto-verified

245 predictions made.

Each one is a falsifiable claim with a deadline and a confidence score. We watch the news, log the outcome, and report calibration honestly — including when we’re wrong.

Resolved: 13% (32 of 245)
Pending: 85 open forecasts
Calibrated accuracy: 79.7% (partial credit incl.)
80% calibrated
Calibration curve

Are we as confident as we should be?

X = stated confidence. Y = how often we were right. The diagonal is perfect calibration.

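[Figure: calibration curve plotting stated confidence (x axis, 0 to 100%) against the share of predictions actually correct (y axis); lab results are shown as points sized by sample count, and the diagonal marks perfect calibration; n = 65.]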
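A curve like this can be built from resolved predictions by bucketing them on stated confidence and measuring how often each bucket was right. The sketch below is a minimal Python illustration, not the lab's code: the 10-bucket scheme and the name `calibration_curve` are assumptions, and outcomes are scored as in the accuracy formula in the methodology section (correct = 1, partial = 0.5, incorrect or expired = 0). The sample uses the six recently resolved calls on this page, reading their stated confidences as 52%, 58%, 63.8%, 64.5%, 68% and 72%.

```python
from collections import defaultdict


def calibration_curve(predictions, n_bins=10):
    """Bucket resolved predictions by stated confidence (illustrative sketch).

    predictions: iterable of (stated_confidence, score) pairs, where
    stated_confidence is in [0, 1] and score is 1.0 (correct),
    0.5 (partial) or 0.0 (incorrect or expired).
    Returns (mean stated confidence, observed accuracy, sample count)
    per non-empty bucket: the x, y and point size of the plot.
    """
    buckets = defaultdict(list)
    for confidence, score in predictions:
        # Clamp 1.0 into the top bucket so indices stay in 0..n_bins-1.
        index = min(int(confidence * n_bins), n_bins - 1)
        buckets[index].append((confidence, score))

    points = []
    for index in sorted(buckets):
        members = buckets[index]
        mean_confidence = sum(c for c, _ in members) / len(members)
        observed = sum(s for _, s in members) / len(members)
        points.append((mean_confidence, observed, len(members)))
    return points


# The six recently resolved calls: (stated confidence, outcome score).
sample = [
    (0.52, 0.5),   # partial
    (0.58, 0.5),   # partial
    (0.638, 0.0),  # incorrect
    (0.645, 1.0),  # correct
    (0.68, 1.0),   # correct
    (0.72, 0.0),   # expired, scored as a failure
]

for x, y, n in calibration_curve(sample):
    print(f"stated {x:.0%} -> correct {y:.0%} (n={n})")
```

A point above the diagonal means the lab was underconfident in that range; a point below it means overconfident. With only six samples, each bucket is far too small to support conclusions.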
Active predictions

Open forecasts, sorted by calibrated confidence

20 open

OpenAI will announce free ChatGPT Codex for .edu email domains within 90 days

82%

By September 2026, OpenAI will announce that ChatGPT Codex (the merged coding capability from June 2) is available for free to all students and faculty with .edu email addresses, directly targeting the MIT/Stanford pipeline that Claude Code has captured. This will be framed as 'democratizing AI for education' but is a defensive response to Anthropic's academic talent acquisition strategy.

2mo · 16 evidence

Google Cloud + Hugging Face native TPU deployment integration by Q3 2026

82%

Google Cloud will announce at Google Cloud Next '26 (expected September 2026) that Hugging Face Spaces and Kernels are natively integrated into Vertex AI, enabling one-click deployment of any Hugging Face model onto Google TPU v6 pods. This will be positioned as 'the fastest path from arXiv to production' and will include a revenue-sharing agreement where Hugging Face gets 15% of compute spend generated through its platform.

2mo · 16 evidence

FlashMemory-style sparse attention becomes the default for long-context models within 6 months

80%

DeepSeek-V4's FlashMemory achieves 500K context with 90% less KV cache without retraining — a deployment advantage too large to ignore. Expect Anthropic, Google, and Meta to adopt similar lookahead sparse attention techniques within 2 quarters, making 500K+ context the new standard.

2mo · 0 evidence

Anthropic formalizes an education-to-employment pipeline

79%

Anthropic formalizes an education-to-employment pipeline. Graph evidence: High-degree/bridge centrality for Claude Code; structural holes to MIT and Stanford; active prediction already supported by repeated mentions and ecosystem clustering around Claude Code.

2mo · 0 evidence

OpenAI will unbundle Codex from ChatGPT as a standalone product within 60 days

78%

By August 15, 2026, OpenAI will announce 'OpenAI Codex Pro' — a standalone developer product priced at $200/month with CLI-native features, competing directly with Claude Code. The current ChatGPT+Codex integration will be deprecated for new developer users.

28d · 16 evidence

Anthropic will formalize an education-to-employment pipeline within two quarters

78%

Anthropic will formalize an education-to-employment pipeline within two quarters. Graph evidence: Claude Code degree=182, bridge=0.9; MIT/Stanford appear in latent talent-pipeline narratives; no direct institutional edges yet despite repeated co-occurrence.

2mo · 0 evidence

MCP security certification becomes a prerequisite for enterprise AI procurement

78%

By November 2026, at least one of the Big 4 accounting firms (Deloitte, PwC, EY, KPMG) will launch an 'MCP Security Audit & Certification' practice, and a major cloud provider (AWS or GCP) will require MCP servers deployed on their infrastructure to pass a minimum security score (>60/100) to be listed in their managed AI agent marketplace.

2mo · 13 evidence

Claude Code becomes the default enterprise agent shell across multiple frontier-model providers

78%

Claude Code becomes the default enterprise agent shell across multiple frontier-model providers. Graph evidence: Highest degree=176, bridge=0.6; strong co-occurrence with Anthropic, Google, and OpenAI; Claude Code ecosystem is the largest named community cluster.

2mo · 0 evidence
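The "Graph evidence" lines above quote centrality scores such as degree=182 and bridge=0.9 for Claude Code. Degree is the number of entities a node is directly linked to. The page does not define its bridge score, so the sketch below assumes it behaves like normalized betweenness centrality: the share of shortest paths between other entities that pass through a node, computed here with Brandes' algorithm. The graph and its node names are hypothetical.

```python
from collections import deque


def degree(graph):
    """Number of direct neighbors per node in an undirected adjacency dict."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness(graph):
    """Normalized betweenness centrality via Brandes' algorithm.

    Assumes an unweighted, undirected graph given as {node: [neighbors]}
    with symmetric adjacency. Used here as a stand-in for a "bridge" score.
    """
    scores = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        preds = {node: [] for node in graph}
        paths = dict.fromkeys(graph, 0)
        paths[source] = 1
        dist = dict.fromkeys(graph, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]

    n = len(graph)
    if n > 2:
        # Every pair is counted once from each endpoint; dividing by
        # (n - 1)(n - 2) removes the double count and scales to [0, 1].
        for node in scores:
            scores[node] /= (n - 1) * (n - 2)
    return scores


# Hypothetical graph: two small clusters joined only through "Hub".
graph = {
    "Lab A": ["Tool A", "Hub"],
    "Tool A": ["Lab A", "Hub"],
    "Hub": ["Lab A", "Tool A", "Lab B", "Tool B"],
    "Lab B": ["Tool B", "Hub"],
    "Tool B": ["Lab B", "Hub"],
}

print(degree(graph)["Hub"])                 # 4
print(round(betweenness(graph)["Hub"], 3))  # 0.667
```

"Hub" lies on every shortest path between the two clusters, which covers 4 of the 6 node pairs that do not involve it, so its score is 4/6; every other node scores 0. A high bridge score in this sense marks an entity that connects otherwise separate parts of the graph.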
Predict with the lab

Will this happen? Cast your vote.

Your vote stays in your browser. We compare crowd intuition against the lab’s calibrated forecast.

Lab confidence: 82%
Resolves in 2mo
OpenAI will announce free ChatGPT Codex for .edu email domains within 90 days

By September 2026, OpenAI will announce that ChatGPT Codex (the merged coding capability from June 2) is available for free to all students and faculty with .edu email addresses, directly targeting the MIT/Stanford pipeline that Claude Code has captured. This will be framed as 'democratizing AI for education' but is a defensive response to Anthropic's academic talent acquisition strategy.

Trending signals

What’s shifting in the graph

Top movers from 7-day mention velocity.

  • 1. benchmarks: 9 mentions · 7d · +200%
  • 2. infrastructure: 3 mentions · 7d · +200%
  • 3. open source: 4 mentions · 7d · +100%
  • 4. business: 3 mentions · 7d · +100%
  • 5. openai: 5 mentions · 7d · 0%
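The page does not publish how "mention velocity" is computed. One plausible reading, assumed here, compares mentions in the latest 7-day window with the 7 days before it, expressed either as a ratio (as in the "velocity: 1.1x" momentum lines) or as a percent change (as in the movers list). The function names are hypothetical, and the previous-window counts in the example are invented for illustration.

```python
def mention_velocity(current, previous):
    """Ratio of this window's mentions to the prior window's (assumed definition).

    Returns None when the prior window has no mentions, since the ratio
    is undefined without a baseline.
    """
    if previous == 0:
        return None
    return current / previous


def percent_change(current, previous):
    """Velocity expressed as a percent change, e.g. 3.0x -> +200%."""
    ratio = mention_velocity(current, previous)
    return None if ratio is None else (ratio - 1) * 100


def top_movers(counts, k=5):
    """Rank topics by percent change, breaking ties by raw mention count.

    counts: {topic: (mentions in last 7 days, mentions in the 7 days before)}
    Topics with no baseline are skipped rather than ranked as infinite.
    """
    scored = []
    for topic, (current, previous) in counts.items():
        change = percent_change(current, previous)
        if change is not None:
            scored.append((topic, current, change))
    scored.sort(key=lambda item: (item[2], item[1]), reverse=True)
    return scored[:k]


# Current counts from the list above; previous-window counts are hypothetical.
counts = {
    "openai": (5, 5),
    "open source": (4, 2),
    "infrastructure": (3, 1),
    "benchmarks": (9, 3),
}

for rank, (topic, mentions, change) in enumerate(top_movers(counts), start=1):
    print(f"{rank}. {topic}: {mentions} mentions, {change:+.0f}%")
```

Skipping topics with an empty baseline avoids ranking a brand-new topic with a single mention above everything else; a production version might instead apply a minimum-mention floor.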
Recently resolved

What we predicted vs what happened

last 6
⚠️ partial · said 52%

Google will ship a Gemini browser agent in Chrome

Auto-verified (confidence=78%, corroboration=72%, threshold=60%, web_search=yes): There is credible web evidence that Google publicly introduced a more agentic Gemini product at I/O 2026, especially Gemini Spark, which is described as helping users do real work and run errands rather than merely summarize pages. However, the prediction specifically calls for a Chrome-integrated Gemini browser agent that can complete multi-step web tasks, and the available evidence does not clearly confirm Chrome integration or browser-native form-filling/navigation workflows. So the general direction is right, but the specific product form and execution-surface claim are not fully verified.

Evidence FOR (4):
  • [W2] SiliconANGLE reports Google introduced "Gemini Spark," described as a 24/7 personal AI assistant that can help people navigate their digital lives and do real work, suggesting an agentic productivity feature rather than a passive summarizer.
  • [W5] Digital Trends says Gemini Spark can "run your errands" and take actions on the user's behalf, which is directionally consistent with browser/task execution.
  • [W3] Wired's Google I/O 2026 roundup says Google showed off a swath of new agentic AI features, indicating public shipping/announcement of agent-like capabilities.

Evidence AGAINST (3):
  • [DB-4] The Decoder article says Gemini-SQL2 hits benchmarks but has "no public release or paper yet," which shows Google has model work but not necessarily a publicly shipped browser agent.
  • [DB-14] A tweet claims Gemini 3.5 Live Translate debuted, but it is about translation, not a Chrome-integrated browser agent that performs multi-step web tasks.

resolved Jun 17
⚠️ partial · said 58%

Google and OpenAI will both follow Anthropic with product announcements inside 5 days

Auto-verified (confidence=78%, corroboration=68%, threshold=60%, web_search=yes): The evidence clearly supports that Google published product announcements within the 5-day window after Anthropic's launch: DB-0 and DB-1 both describe Google launches/feature releases, and W2/W3 corroborate Google I/O announcements. However, there is no comparable evidence in the provided sources that OpenAI also published a product announcement, demo, or feature release in that same window. Because the prediction requires both Google and OpenAI to act, the outcome is only partially satisfied.

Evidence FOR (4):
  • [DB-0] "Google Gemma 4 12B: Encoder-Free Multimodal Model Launches" indicates Google published a product launch within the 5-day window.
  • [DB-1] "Google LEAP Scaffold Lifts Lean-IMO-Bench One-Shot Solve Rate from <10% to 70%" indicates another Google product/feature announcement within the 5-day window.
  • [W2] "Everything we saw at Google I/O: Gemini 3.5, Android XR glasses, Spark, and more" suggests Google made multiple product announcements at Google I/O within the relevant period.

Evidence AGAINST (4):
  • [W0] "Anthropic Buys The SDK Pipeline OpenAI And Gemini Depend On" is relevant context about Anthropic's competitive position, but it does not confirm OpenAI launched anything within 5 days.
  • [W1] "Claude Platform On AWS Rewrites The Hyperscaler AI Bargain" confirms Anthropic/Claude launch timing, but does not itself verify OpenAI's response.

resolved Jun 3
❌ incorrect · said 63.8%

OpenAI will split Codex pricing from ChatGPT

Auto-verified (confidence=85%, corroboration=25%, threshold=75%, web_search=yes): The prediction specifically requires OpenAI to introduce a 'separate Codex pricing, billing, seat, or usage tier.' While evidence shows Codex is being actively developed with new features (Locked Use, Ultra-Fast mode, workspace agents), none of the sources confirm a new billing surface or distinct pricing tier. The deadline is imminent (within the next month), and a pricing/packaging change of this nature would likely be publicly announced. The absence of any such announcement, combined with evidence of Codex being integrated into the existing ChatGPT mobile app and workspace agents (suggesting bundling, not separation), directly contradicts the prediction's core requirement.

Evidence FOR (5):
  • [DB-1] Codex 'Locked Use' feature spotted on macOS, suggesting a differentiated mode for coding workflows.
  • [DB-3] Codex lands in ChatGPT mobile app, expanding its availability as a distinct tool within the ChatGPT ecosystem.
  • [DB-9] Codex 'Ultra-Fast' mode spotted in leaked screenshot, indicating specialized tier development for Codex.

Evidence AGAINST (3):
  • [DB-2] Ollama now runs Codex locally, offering a free alternative that challenges OpenAI's API model, but does not speak to OpenAI's own pricing/packaging.
  • [W1] OpenAI's ChatGPT ads shifted to cost-per-click, a pricing change for the advertising product, not a Codex-specific billing surface.

resolved May 17
✅ correct · said 64.5%

Anthropic splits Claude Code billing from Claude AI

Auto-verified (confidence=85%, corroboration=70%, threshold=85%): The prediction anticipated that Anthropic would make Claude Code materially distinct from Claude AI in pricing/billing, pushing heavy coding users into a different commercial bucket. DB-5 confirms a structural split where programmatic/CLI usage now has separate monthly credits ($20-$200/mo) distinct from general Claude AI access. DB-6 corroborates this with reports of enforced programmatic API tiers and 10x cost hikes specifically targeting Claude Code power users. The substance of the prediction (a distinct billing/packaging layer separating coding users from general users) is met by these two credible, recent sources.

Evidence FOR (2):
  • [DB-5] Anthropic splits `claude --print` and Agent SDK usage into separate monthly credits. Pro gets $20/mo, Max gets $100-$200/mo. Credits don't roll over.
  • [DB-6] Claude Code enforces programmatic API tiers, with users reporting 10x cost hikes to $1,000/month, squeezing power users toward API pricing.

resolved May 14
✅ correct · said 68%

Claude Agent will add GitHub repository integration within 4 weeks

Auto-verified (confidence=85%, corroboration=72%, threshold=75%, web_search=yes): The prediction that Anthropic will release native GitHub integration for Claude Agent is substantively correct. Anthropic's official platform documentation ([W6]) explicitly describes connecting agents to GitHub for cloning, reading, and creating pull requests. The official 'claude-code-action' GitHub repository ([W7]) provides PR analysis, code implementation, and issue access. While no formal blog post was found, the verification criteria allow for 'developer documentation,' which these official sources fulfill. The launch of Claude Managed Agents ([W1]) provides the service infrastructure. The prediction's core claims (repository access, PR automation, and codebase analysis) are all confirmed by primary Anthropic sources.

Evidence FOR (4):
  • [W6] Anthropic's platform documentation at platform.claude.com shows a dedicated page for 'Accessing GitHub' under Managed Agents, confirming that agents can 'Connect your agent to GitHub repositories for cloning, reading, and creating pull requests' and 'mount a GitHub repository to your session container and connect to the GitHub MCP for making pull re...'
  • [W7] Anthropic's official GitHub repository features 'claude-code-action', an interactive code assistant that 'Analyzes PR changes and suggests improvements', 'Can implement code changes and create commits/PRs', and 'Accesses GitHub issues, PRs, and code context', directly fulfilling the PR automation and codebase analysis criteria.
  • [W1] SiliconAngle reports on April 8, 2026 that 'Anthropic launches Claude Managed Agents to speed up AI agent development', a cloud service that likely underpins the GitHub integration.

Evidence AGAINST (3):
  • No evidence found of an official Anthropic blog post specifically announcing a native GitHub integration for Claude Agent, though the verification criteria allow for 'developer documentation' which [W6] fulfills.
  • [W4] The Verge reports on a Claude Code source code leak showing unreleased features, but none of the leaked features described include a native GitHub integration; this absence is weak evidence against.

resolved Apr 26
⏱ expired · said 72%

Anthropic will launch a regulated-enterprise layer around Claude Code

Semantic duplicate of: Anthropic's Claude Code revenue mix shifts toward enterprise

resolved Apr 24

Predictor Leaderboard

Top 30 anonymous voters · ranked by accuracy on resolved predictions

Active: 85 · Correct: 22 · Incorrect: 3 · Expired: 33
Accuracy: 79.7% (n=32) · Avg confidence: 72.1%
Methodology & Accuracy Tracking

How predictions are made

Predictions are generated by analyzing trend signals across 42+ AI news sources, enriched with knowledge graph relationships between entities (companies, people, technologies). Each prediction includes a confidence score and target date.

How accuracy is computed

Accuracy = (correct + partial × 0.5) ÷ total evaluated. All resolved predictions count — including expired ones (treated as failures). Sample size is shown next to the accuracy figure.
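As a worked example of this formula, the short Python sketch below adds expired predictions to the denominator as failures, as described above. The function name and the counts in the example are hypothetical.

```python
def accuracy(correct, partial, incorrect, expired):
    """(correct + 0.5 * partial) / total evaluated.

    Expired predictions carry no credit but still count in the
    denominator, i.e. they are treated as failures.
    """
    total = correct + partial + incorrect + expired
    if total == 0:
        raise ValueError("no evaluated predictions")
    return (correct + 0.5 * partial) / total


# Hypothetical tally: 6 correct, 2 partial, 1 incorrect, 1 expired.
print(f"{accuracy(6, 2, 1, 1):.1%}")  # 70.0%
```

Each partial call is worth half a correct one, so the two partials add 1 to the numerator while adding 2 to the denominator: (6 + 1) / 10 = 70%.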

Verification process

Past-deadline predictions are verified against three layers of evidence: entity-linked articles, keyword search, and web search. An AI judge evaluates the evidence for and against and must clear a high confidence threshold before a prediction is resolved.

Possible outcomes

  • Correct — prediction confirmed by evidence
  • Partially Correct — core thesis confirmed with caveats
  • Incorrect — contradicted by evidence
  • Expired — deadline passed, insufficient evidence
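The resolution step can be sketched as a gate in front of the judge's verdict. This is an assumption-laden illustration, not the lab's implementation. It uses the `confidence` and `threshold` values logged in each "Auto-verified" entry; every resolved entry shown has confidence at or above its threshold, e.g. 85% vs. 85%. It maps "insufficient evidence" after the deadline to Expired and leaves a below-threshold verdict unresolved. The `corroboration` figure is not modeled because the page does not say how it is used, and the names `Outcome` and `resolve` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"
    PARTIAL = "partially correct"
    INCORRECT = "incorrect"
    EXPIRED = "expired"


def resolve(verdict, judge_confidence, threshold, past_deadline=True):
    """Hypothetical resolution gate.

    verdict: the Outcome proposed by the AI judge, or None when the
    evidence search found too little to judge.
    judge_confidence, threshold: integer percentages, as in the logs.
    Returns an Outcome, or None to keep the prediction open.
    """
    if not past_deadline:
        return None                # only past-deadline calls are verified
    if verdict is None:
        return Outcome.EXPIRED     # deadline passed, insufficient evidence
    if judge_confidence >= threshold:
        return verdict             # judge is confident enough to resolve
    return None                    # below threshold: leave unresolved


# Values taken from the "Auto-verified" entries under Recently resolved.
print(resolve(Outcome.PARTIAL, 78, 60))    # Outcome.PARTIAL
print(resolve(Outcome.INCORRECT, 85, 75))  # Outcome.INCORRECT
print(resolve(Outcome.CORRECT, 85, 85))    # Outcome.CORRECT
```

Integer percentages keep the boundary case (85 vs. 85) free of floating-point surprises.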

Trending Signals

openai 0% · benchmarks +200% · claude code 0% · deepseek-v3 0% · open source +100% · business +100% · nvidia 0% · anthropic 0% · agentic coding 0% · infrastructure +200%

Active Predictions (20)

NEW · Event · big tech · Knowledge Graph
3mo left · 6h ago

OpenAI cuts coding API prices before Codex gets split out

Within the next quarter, OpenAI will reduce effective pricing for at least one coding-relevant API tier by 20%+ or add materially higher usage limits. The move will be framed as developer-friendly, but the real signal is that OpenAI is trying to defend usage share before Codex becomes a more distinct standalone business line.

Confidence: 55% (Possible) · Target: Sep 15, 2026
Reasoning: The graph shows OpenAI is already under pressure on multiple fronts: ChatGPT market share dipped below 50% for the first time, OpenAI is competing directly with Claude Code, Cursor, and DeepSeek, and there is a recent headline that OpenAI is considering steep API price cuts. That combination usually precedes a pricing response rather than a product-only response, especially when coding workloads are the battleground. If pricing stays unchanged through the quarter, or if OpenAI instead only changes packaging without any effective cost relief, this call is wrong.
How we verify: OpenAI publicly lowers effective pricing or raises usage limits for at least one coding-relevant API tier by 20% or more.
OpenAI
Relationships: OpenAI competes_with Google; OpenAI developed GPT-5.3; Microsoft partnered OpenAI; OpenAI competes_with Cursor; Google competes_with OpenAI; OpenAI hired Sam Altman; OpenAI developed ChatGPT; OpenAI developed GPT-5.2 Pro
Events: OpenAI considering steep API price cuts (2026-06-16); OpenAI researchers publish Deployment Simulation method predicting GPT-5 errors with 92% accuracy (2026-06-17); ChatGPT market share dips below 50% (2026-06-16)
Sentiment: Cursor +0.32; OpenAI mixed
Momentum: OpenAI, 21 mentions (velocity 1.1x)
Patterns: convergence, competitive_shift, precursor
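The "How we verify" test for this card has two branches, and either one is enough. The following is a minimal sketch of the check, assuming the before/after price and usage limit for one coding-relevant tier can be read off OpenAI's published pricing. The function name and all numbers are hypothetical.

```python
def meets_price_criterion(old_price, new_price, old_limit, new_limit,
                          min_change=0.20):
    """True if the effective price fell, or the usage limit rose, by >= 20%.

    old_price/new_price: price per unit or per seat for one coding-relevant tier.
    old_limit/new_limit: usage allowance at that price (e.g. requests per day).
    """
    price_cut = (old_price - new_price) / old_price
    limit_raise = (new_limit - old_limit) / old_limit
    return price_cut >= min_change or limit_raise >= min_change


print(meets_price_criterion(10.0, 8.0, 100, 100))   # True: 20% price cut
print(meets_price_criterion(10.0, 9.0, 100, 115))   # False: 10% cut, 15% raise
print(meets_price_criterion(10.0, 10.0, 100, 125))  # True: 25% higher limit
```

The two branches are not equivalent. At a fixed price, a 25% higher limit is a 20% cut in price per unit of usage (10 / 125 vs. 10 / 100), while a 20% higher limit is only about a 16.7% per-unit cut.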
NEW · Event · product · Basic Analysis
3mo left · 9h ago

Google responds with a formal student-facing Claude Code competitor or education bundle

Google responds with a formal student-facing Claude Code competitor or education bundle. Graph evidence: Google has the highest degree, but the Claude Code cluster is absorbing developer and academic adjacency; the graph shows a latent competitive gap in education-linked tooling.

Confidence: 71% (Likely) · Target: Sep 15, 2026
Reasoning: Google’s high-degree position and repeated temporal motif with product launches imply it reacts quickly when a rival captures a workflow cluster. The MIT/Stanford hole is especially threatening because it attacks Google’s historical talent moat.
How we verify: Google responds with a formal student-facing Claude Code competitor or education bundle
NEW · Event · funding · Basic Analysis
3mo left · 9h ago

Anthropic formalizes an education-to-employment pipeline

Anthropic formalizes an education-to-employment pipeline. Graph evidence: High-degree/bridge centrality for Claude Code; structural holes to MIT and Stanford; active prediction already supported by repeated mentions and ecosystem clustering around Claude Code.

Confidence: 79% (Likely) · Target: Sep 15, 2026
Reasoning: Claude Code’s viral adoption at MIT and Stanford creates a durable talent funnel. Because the product already sits at the center of the developer workflow graph, Anthropic can convert usage into recruiting before Google can reassert its academic pipeline advantage.
How we verify: Anthropic formalizes an education-to-employment pipeline
NEW · Event · research · Basic Analysis
3mo left · 16h ago

Active visual reasoning becomes a standard component in multimodal agent architectures within 2 quarters

Visual-Seeker's SOTA results on five benchmarks without search-specific training will trigger a wave of integration — expect at least 2 major labs (likely Google DeepMind and OpenAI) to announce active evidence gathering as a core reasoning component in their next multimodal agent release, shifting the paradigm from passive chain-of-thought to iterative visual search.

Confidence: 75% (Likely) · Target: Sep 15, 2026
Reasoning: [Research Analysis] Visual-Seeker's SOTA results on five benchmarks without search-specific training will trigger a wave of integration — expect at least 2 major labs (likely Google DeepMind and OpenAI) to announce active evidence gathering as a core reasoning component in their next multimodal agent release, shifting the paradigm from passive chain-of-thought to iterative visual search.
How we verify: Monitor for: (1) ArXiv preprints or blog posts from DeepMind/OpenAI describing active perception modules, (2) Benchmark results on MA-ProofBench or similar reasoning tasks showing >20% improvement from active search, (3) Conference papers (NeurIPS 2026, ICML 2027) with 'active visual reasoning' as a keyword.
NEW · Event · product · Knowledge Graph
4w left · 22h ago

OpenAI will unbundle Codex from ChatGPT as a standalone product within 60 days

By August 15, 2026, OpenAI will announce 'OpenAI Codex Pro' — a standalone developer product priced at $200/month with CLI-native features, competing directly with Claude Code. The current ChatGPT+Codex integration will be deprecated for new developer users.

Confidence: 78% (Likely) · Target: Jul 16, 2026
Reasoning: [Agent Investigation] ChatGPT is in a defensive erosion phase: market share has fallen from 77% to 46.4% in ~12 months, and the decline is accelerating. The product is being squeezed from two directions: Claude Code is absorbing the high-value developer/student segment (as evidenced by the viral academic pipeline to Anthropic), while Gemini and Perplexity fragment the consumer market. The 'dreaming memory' launch reads as a feature-parity move, not a strategic counter. OpenAI's merger of Codex into ChatGPT is a reactive play that risks diluting both products: it neither wins back developers (who want terminal-native tools like Claude Code) nor simplifies the consumer experience.

ChatGPT will continue losing market share for at least 2 more quarters, bottoming around 35-40%. The key inflection point is whether OpenAI can successfully spin Codex back out as a standalone product (reversing the merger) or instead accepts ChatGPT as a mass-market consumer product while ceding the power-user/developer segment to Anthropic. The data pattern suggests OpenAI will attempt a two-product strategy within 3 months: a consumer ChatGPT and a separate developer platform.

[PRE-MORTEM] This prediction is DISPROVEN if: (1) by August 15, OpenAI has NOT announced any standalone developer product; (2) OpenAI instead releases free Codex access for .edu domains within ChatGPT (the opposite strategy, doubling down on integration); (3) an OpenAI executive publicly states there are no plans to separate Codex from ChatGPT. The strongest counter-evidence would be a major feature release that deepens Codex integration into ChatGPT (e.g., native file system access or persistent terminals within ChatGPT).
How we verify: Official OpenAI blog post or press release announcing a standalone Codex developer product, OR a public deprecation notice for Codex features within ChatGPT with a migration path to a separate product.
ChatGPT
Relationships: ChatGPT → uses → Nvidia; ChatGPT → competes_with → Claude Code; ChatGPT → competes_with → Claude AI; ChatGPT → competes_with → Gemini; ChatGPT → competes_with → Anthropic; OpenAI → developed → ChatGPT; Google → uses → ChatGPT; Meta → uses → ChatGPT; Claude Agent → uses → ChatGPT; Claude Code → competes_with → ChatGPT
Events: [2026-06-05] product_launch: OpenAI launched dreaming memory system for ChatGPT that retains user preferences across sessions; [2026-06-02] product_launch: OpenAI merged Codex into ChatGPT; [2026-05-16] policy: ChatGPT market share fell below 50% for the first time, to 46.4%; [2026-04-16] product_launch: Market share decline from 77% to 57% over past twelve months; [2026-04-14] research_milestone: Research experiment using simulated teen persona 'Bridget' to probe AI mental health interaction risks
Sentiment (ChatGPT): 2026-05-18 -0.05 (2 mentions); 2026-06-01 +0.30 (3 mentions); 2026-06-08 +0.20 (2 mentions); 2026-06-15 -0.20 (1 mention)
Momentum: ChatGPT (product), 131 mentions
Event · research · Basic Analysis
4w left · 2d ago

Diffusion LLMs will power the first production on-device AI coding assistant within 2 quarters

llada.cpp's 17-42x latency reduction on mobile NPU makes dLLMs viable for real-time coding assistance on-device. Combined with SWE-Explore's finding that structural understanding is the bottleneck, a diffusion-based agent optimized for code structure will emerge as the first credible on-device coding assistant.

Confidence: 65% (Possible) · Target: Jul 15, 2026
Reasoning: [Research Analysis] llada.cpp's 17-42x latency reduction on mobile NPU makes dLLMs viable for real-time coding assistance on-device. Combined with SWE-Explore's finding that structural understanding is the bottleneck, a diffusion-based agent optimized for code structure will emerge as the first credible on-device coding assistant.
How we verify: A diffusion LLM-based coding agent (not autoregressive) deployed on mobile/edge with >30% critical line coverage on SWE-Explore.
Event · big tech · Basic Analysis
2mo left · 3d ago

Anthropic will formalize an education-to-employment pipeline within two quarters

Anthropic will formalize an education-to-employment pipeline within two quarters. Graph evidence: Claude Code degree=182, bridge=0.9; MIT/Stanford appear in latent talent-pipeline narratives; no direct institutional edges yet despite repeated co-occurrence.

Confidence: 78% (Likely) · Target: Sep 12, 2026
Reasoning: Claude Code's bridge position plus repeated MIT/Stanford talent motifs indicate product-led adoption is already functioning as a recruiting funnel. The graph shows a strategic gap between elite universities and Anthropic that is likely to close via internships, campus programs, or explicit student offers.
How we verify: Anthropic will formalize an education-to-employment pipeline within two quarters
Event · product · Knowledge Graph
2mo left · 3d ago

OpenAI will announce free ChatGPT Codex for .edu email domains within 90 days

By September 2026, OpenAI will announce that ChatGPT Codex (the merged coding capability from June 2) is available for free to all students and faculty with .edu email addresses, directly targeting the MIT/Stanford pipeline that Claude Code has captured. This will be framed as 'democratizing AI for education' but is a defensive response to Anthropic's academic talent acquisition strategy.

Confidence: 82% (Likely) · Target: Sep 12, 2026
Reasoning: [Agent Investigation] ChatGPT is in a defensive position despite being the market leader. The market-share decline from 77% to 57% over 12 months signals structural erosion, not a blip. The Codex merger into ChatGPT (June 2) is a reactive move to counter Claude Code's developer traction, but the sentiment trajectory shows a decelerating rebound: positive but losing momentum. The Visa partnership signal cluster (5 identical convergence alerts) suggests enterprise adoption is real but narrow. The real threat is not Gemini or Perplexity; it is Claude Code quietly creating a talent pipeline from MIT/Stanford to Anthropic, which will compound into a developer-ecosystem advantage within 2 quarters.

ChatGPT will bifurcate into a consumer product (memory system, agentic features) and a developer platform (Codex integration), but the developer side will lose share to Claude Code unless OpenAI makes a dramatic pricing or open-source move. The 57% market-share floor is not stable; expect another 5-8 point drop in 2 quarters as Claude Code's academic talent pipeline produces superior coding products. The 'dreaming memory' feature signals OpenAI doubling down on consumer stickiness as a defensive moat, but it does not address the developer exodus.

[PRE-MORTEM] This prediction is DISPROVEN if: (1) OpenAI instead raises ChatGPT Plus pricing or removes features from the free tier; (2) 90 days pass with no education-focused pricing announcement; (3) OpenAI announces a 'ChatGPT for Education' product that is paid (not free); (4) Anthropic announces a counter-deal with MIT/Stanford before OpenAI responds.
How we verify: Official OpenAI blog post or press release announcing free ChatGPT Codex for .edu domains, or institutional partnership announcement with MIT and/or Stanford CS departments.
ChatGPT
Relationships: ChatGPT → uses → Nvidia; ChatGPT → competes_with → Claude Code; ChatGPT → competes_with → Claude AI; ChatGPT → competes_with → Anthropic; ChatGPT → competes_with → Claude Agent; OpenAI → developed → ChatGPT; Google → uses → ChatGPT; Meta → uses → ChatGPT; Claude Agent → uses → ChatGPT; Gemini → competes_with → ChatGPT
Events: [2026-06-05] product_launch: OpenAI launched dreaming memory system for ChatGPT that retains user preferences across sessions; [2026-06-02] product_launch: OpenAI merged Codex into ChatGPT; [2026-04-16] product_launch: Market share decline from 77% to 57% over past twelve months; [2026-04-14] research_milestone: Research experiment using simulated teen persona 'Bridget' to probe AI mental health interaction risks; [2026-04-06] product_launch: Being deployed in the beauty sector as 'Agentic AI' to transform customer discovery, trust-building, and conversion
Sentiment (ChatGPT): 2026-05-18 -0.05 (2 mentions); 2026-06-01 +0.30 (3 mentions); 2026-06-08 +0.20 (2 mentions)
Momentum: ChatGPT (product), 130 mentions
Event · research · Basic Analysis
2mo left · 3d ago

Open-weight MoE models will trigger a 'safety certification' race

Within 1 quarter, at least one frontier lab will announce a runtime safety certification framework for open-weight models, responding to the governance gap exposed by Chinese MoE models matching GPT-5.5. This will be led by Google or Anthropic, not OpenAI.

Confidence: 70% (Likely) · Target: Sep 12, 2026
Reasoning: [Research Analysis] Within 1 quarter, at least one frontier lab will announce a runtime safety certification framework for open-weight models, responding to the governance gap exposed by Chinese MoE models matching GPT-5.5. This will be led by Google or Anthropic, not OpenAI.
How we verify: Watch for press releases or papers from Google DeepMind or Anthropic on runtime safety certification for open-weight models, specifically addressing MoE architectures.
Event · big tech · Basic Analysis
3w left · 4d ago

OpenAI will keep acquiring agent-execution infrastructure rather than only model startups

OpenAI will keep acquiring agent-execution infrastructure rather than only model startups. Graph evidence: OpenAI's node has a degree of 210 and strong overlap with adjacent tool nodes, and the live acquisition signal aligns with a structural hole around agent infrastructure.

Confidence: 77% (Likely)
Target: Jul 13, 2026
Reasoning: The Ona deal fits a structural pattern: OpenAI sits near many shared neighbors with Gemini and adjacent tool ecosystems, so the fastest way to defend the agent layer is to own the sandbox, orchestration, and execution substrate.
How we verify: OpenAI will keep acquiring agent-execution infrastructure rather than only model startups
Event · product · Basic Analysis
2mo left · 4d ago

Nvidia will use Blackwell Ultra NVL72 to force a refresh cycle that accelerates cloud capex commitments

Nvidia will use Blackwell Ultra NVL72 to force a refresh cycle that accelerates cloud capex commitments. Graph evidence: High degree (202), strong bridge score (0.7), and a new competitive edge from Blackwell Ultra NVL72 to Hopper H200 indicate an active architecture transition.

Confidence: 74% (Likely)
Target: Sep 11, 2026
Reasoning: Nvidia's repeated temporal coupling with Anthropic and Google launches, plus the new Blackwell Ultra NVL72 -> Hopper H200 competition edge, suggests a deliberate replacement wave. The graph shows Nvidia as the infra node that monetizes frontier-model competition by making older hardware look obsolete.
How we verify: Nvidia will use Blackwell Ultra NVL72 to force a refresh cycle that accelerates cloud capex commitments
Event · research · Basic Analysis
2mo left · 5d ago

KV Cache Quantization Safety Breach Will Trigger New Alignment Research

The finding that KV cache quantization silently breaks safety alignment will spark a wave of 'post-quantization alignment' methods, with at least one major lab shipping a PCR-like diagnostic within 3 months.

Confidence: 75% (Likely)
Target: Sep 10, 2026
Reasoning: [Research Analysis] The finding that KV cache quantization silently breaks safety alignment will spark a wave of 'post-quantization alignment' methods, with at least one major lab shipping a PCR-like diagnostic within 3 months.
How we verify: A paper or blog post from a frontier lab (OpenAI, Google, Anthropic, Meta) proposing a method to restore safety after quantization, or a product announcement including such a diagnostic.
Event · research · Basic Analysis
2mo left · 6d ago

FlashMemory-style sparse attention becomes the default for long-context models within 6 months

DeepSeek-V4's FlashMemory achieves 500K context with 90% less KV cache without retraining — a deployment advantage too large to ignore. Expect Anthropic, Google, and Meta to adopt similar lookahead sparse attention techniques within 2 quarters, making 500K+ context the new standard.

Confidence: 80% (Likely)
Target: Sep 9, 2026
Reasoning: [Research Analysis] DeepSeek-V4's FlashMemory achieves 500K context with 90% less KV cache without retraining — a deployment advantage too large to ignore. Expect Anthropic, Google, and Meta to adopt similar lookahead sparse attention techniques within 2 quarters, making 500K+ context the new standard.
How we verify: At least 2 of: Anthropic, Google, Meta release a model or paper using lookahead sparse attention similar to FlashMemory within 6 months.
Event · policy · Knowledge Graph
2mo left · 6d ago

MCP security certification becomes a prerequisite for enterprise AI procurement

By November 2026, at least one of the Big 4 accounting firms (Deloitte, PwC, EY, KPMG) will launch an 'MCP Security Audit & Certification' practice, and a major cloud provider (AWS or GCP) will require MCP servers deployed on their infrastructure to pass a minimum security score (>60/100) to be listed in their managed AI agent marketplace.

Confidence: 78% (Likely)
Target: Sep 9, 2026
Reasoning: [Agent Investigation] MCP is transitioning from a promising open protocol to a critical infrastructure layer with a glaring security debt. With 9,400+ registered servers but 66% critically vulnerable, it's a classic platform growth vs. governance tension. Anthropic's strategic bet is that ubiquity wins over perfection, but the security audit data suggests this bet is creating an exploitable attack surface that competitors (e.g., OpenAI's function calling, Google's Agent-to-Agent protocol) can weaponize against MCP adoption in regulated industries.
The data shows a clear pattern: rapid server growth (9,400+) → security crisis (66% vulnerable) → market demand for trust layer. The sentiment inflection rising from +0.275 to +0.700 in 5 weeks despite the March security revelations suggests the market is pricing in a solution. Expect a formal MCP security certification program or a 'MCP Secure' tier within 2 quarters, likely as an Anthropic-led consortium with GitHub and VS Code as enforcement points.
[PRE-MORTEM] This prediction is false if: (1) No Big 4 firm announces an MCP security practice by November 2026; (2) AWS/GCP/Azure do not introduce any security scoring requirement for MCP servers in their marketplaces; (3) The market instead coalesces around a community-maintained 'MCP Safe' badge with no institutional backing.
How we verify: Binary check: Search for 'MCP security certification' or 'MCP audit' on the websites of Deloitte/PwC/EY/KPMG, and check AWS Marketplace or GCP Agent Builder for MCP server listing requirements mentioning a minimum security score.
Model Context Protocol
Relationships: Model Context Protocol → uses → Chrome DevTools; Model Context Protocol → uses → Playwright; Claude Code → uses → Model Context Protocol; Anthropic → developed → Model Context Protocol; GitHub → uses → Model Context Protocol; VS Code AI Toolkit → uses → Model Context Protocol; Deep Research Max → uses → Model Context Protocol
Events:
[2026-05-22] product_launch: QA Claude Skill open-sourced with 24 production-grade skills
[2026-05-01] research_milestone: MCP crossed 9,400 registered servers
[2026-04-17] product_launch: Anthropic introduced the Model Context Protocol (MCP), an open standard for AI agent tool integration.
[2026-04-16] product_launch: Used as an open standard to enable AI agent access to system-level diagnostic tools for kernel trace analysis.
[2026-04-01] regulatory_action: Security audit reveals 43% of MCP servers are vulnerable to command execution and 341 malicious skills found on marketplaces, exposing systemic flaws.
Sentiment: Model Context Protocol 2026-05-11: +0.28 (4 mentions); Model Context Protocol 2026-05-18: +0.43 (3 mentions); Model Context Protocol 2026-05-25: +0.40 (1 mention); Model Context Protocol 2026-06-01: +0.35 (2 mentions); Model Context Protocol 2026-06-08: +0.70 (1 mention)
Momentum:Model Context Protocol (technology): 130 mentions
Event · big tech · Knowledge Graph
2mo left · 1w ago

Apple will ship a second on-device AI feature in iOS 26

Apple will publicly ship at least one additional on-device AI feature in iOS 26 that is not just a Siri tweak — likely a system-level workflow in Photos, Messages, or Passwords that runs without server calls. The tell will be Apple leaning harder into local inference as a product differentiator, not just a privacy talking point.

Confidence: 55% (Possible)
Target: Sep 8, 2026
Reasoning: Apple is surging at 7.0x velocity, and the graph now explicitly places it in competition with OpenAI, Google, Microsoft, and Nvidia. Recent headlines already show Apple pushing on-device execution with Core AI, Siri visual intelligence, and Passwords auto-change, which means the next step is not "more AI" but a second, distinct local workflow that proves the platform strategy. If Apple instead pauses after the current wave of announcements, this call fails; the invalidation signal would be no new on-device AI surface shipping in iOS 26.
How we verify: A new iOS 26 feature ships that performs a meaningful AI workflow fully on-device, confirmed by public release notes, app behavior, or credible hands-on reports.
Apple
Relationships: Apple competes_with Microsoft; Apple competes_with OpenAI; Apple competes_with Google
Events: Apple Passwords App Gains AI Agent for Breach Auto-Change (2026-06-08); Apple’s New Siri in Camera Adds Visual Intelligence to iPhone (2026-06-08); Apple Core AI Runs Models On-Device, Zero Server Calls (2026-06-09)
Sentiment: keyword_surge: apple UP 600.0% (7d); Sentiment toward Apple: surging
Momentum: Apple: 8 mentions (surging) [velocity: 7.0x]
Patterns: convergence, competitive_shift, precursor
Event · big tech · Knowledge Graph
2mo left · 1w ago

Apple will add a second external AI backend for Siri

Within the next quarter, Apple will quietly test or ship a second non-Apple AI backend for at least one Siri/Apple Intelligence workflow, rather than relying on a single external model partner. The tell will be a fallback or routing layer in a consumer-facing Apple AI feature, not a full public model announcement.

Confidence: 55% (Possible)
Target: Sep 8, 2026
Reasoning: Apple is showing unusual AI momentum: its entity velocity is 7.0x, and the graph now links Apple competitively to OpenAI, Google, AMD, Nvidia, and Liquid AI. At the same time, recent news shows Apple is already pushing on-device AI and has run into EU DMA constraints on Siri AI in Europe, which makes a single-backend strategy brittle. The logic is that Apple will want optionality and regulatory insulation before WWDC-era AI features become more visible. This would be invalidated if Apple doubles down on one external provider and keeps Siri’s cloud fallback unchanged through the quarter.
How we verify: A second external AI backend is used or tested in a Siri/Apple Intelligence workflow, confirmed by product behavior, code references, or credible reporting.
Apple
Relationships: Apple competes_with OpenAI; Apple competes_with Nvidia; Apple competes_with Google
Events: Apple Core AI runs models on-device, zero server calls (2026-06-09); Apple blames EU DMA for blocking Siri AI on iOS in Europe (2026-06-08)
Sentiment: Sentiment toward on-device AI: strongly positive; Sentiment toward Apple: rising
Momentum: Apple: 8 mentions (surging) [velocity: 7.0x]
Patterns: convergence, competitive_shift, precursor
Event · product · Basic Analysis
2mo left · 1w ago

Google will push a TPU-linked enterprise distribution move through cloud or model tooling

Google will push a TPU-linked enterprise distribution move through cloud or model tooling. Graph evidence: Google degree=225, bridge=0.9; repeated temporal motif where Google launches are followed by Anthropic research/product responses; compute-centric narrative reinforced by TPU supply-chain logic.

Confidence: 72% (Likely)
Target: Sep 8, 2026
Reasoning: Google’s bridge score is the highest among major companies, and the durable lesson about compute spend implies a strategic trap: convert capex into ecosystem lock-in. The graph shows Google as the main conduit between research and infrastructure, so the next move is likely a distribution or integration play, not just a model release.
How we verify: Google will push a TPU-linked enterprise distribution move through cloud or model tooling
Event · product · Basic Analysis
2mo left · 1w ago

Claude Code becomes the default enterprise agent shell across multiple frontier-model providers

Claude Code becomes the default enterprise agent shell across multiple frontier-model providers. Graph evidence: Highest degree=176, bridge=0.6; strong co-occurrence with Anthropic, Google, and OpenAI; Claude Code ecosystem is the largest named community cluster.

Confidence: 78% (Likely)
Target: Sep 8, 2026
Reasoning: Claude Code already has the highest PageRank and degree, and it sits at the center of the developer/tooling cluster. The strategic gap between OpenAI and Claude Code, plus the shared-neighbor overlap with GPT-3.5, suggests it is absorbing legacy and cross-vendor workflows rather than remaining an Anthropic-only product.
How we verify: Claude Code becomes the default enterprise agent shell across multiple frontier-model providers
Event · big tech · Knowledge Graph
2mo left · 1w ago

Google Cloud + Hugging Face native TPU deployment integration by Q3 2026

Google Cloud will announce at Google Cloud Next '26 (expected September 2026) that Hugging Face Spaces and Kernels are natively integrated into Vertex AI, enabling one-click deployment of any Hugging Face model onto Google TPU v6 pods. This will be positioned as 'the fastest path from arXiv to production' and will include a revenue-sharing agreement where Hugging Face gets 15% of compute spend generated through its platform.

Confidence: 82% (Likely)
Target: Sep 8, 2026
Reasoning: [Agent Investigation] Hugging Face is in a paradoxical position: it is the de facto distribution layer for open models (DeepSeek, Qwen, 30B-A3B all use/license via HF), yet its graph position (pagerank=0.001, bridge_score=0.0) reveals it is NOT acting as a strategic bridge between major players. Google's 4 partnership signals with Hugging Face (detected multiple times in KG signals) suggest a deeper integration is forming, but the low bridge_score indicates this hasn't materialized into network centrality. The rising sentiment trajectory (+0.15 to +0.30 over 3 weeks) is driven by product launches (Kernels hub, Daily Papers SKILL.md), but these are horizontal plays — they don't create moats. Hugging Face is becoming a platform for everything and a strategic asset for nothing.
The data pattern suggests Hugging Face is being pulled into Google's orbit as the open-model distribution layer for Google Cloud's AI platform. The 4x repeated partnership signal with Google, combined with Google's $920M/month compute spend and TPU supply chain lock-in, implies Google will integrate Hugging Face Spaces + Kernels directly into Vertex AI within 2 quarters. This would make Hugging Face the 'open model app store' for Google Cloud, but at the cost of independence — similar to what happened to GitHub after Microsoft acquisition. The alternative trajectory (less likely given data) is that Hugging Face doubles down on its own compute layer via Kernels, competing with Nvidia's CUDA ecosystem directly.
[PRE-MORTEM] This prediction is DISPROVED if: (1) Google Cloud Next '26 passes without any Hugging Face integration announcement, (2) Google instead partners with Replicate or Fal.ai for model deployment on TPUs, (3) Hugging Face announces its own cloud service (HF Cloud) that competes with Google Cloud, or (4) Google announces a partnership with a different model hub (e.g., Civitai or Replicate) for TPU deployment.
How we verify: Official Google Cloud Next '26 keynote slide or press release announcing 'Hugging Face on Vertex AI' or 'one-click TPU deployment from Hugging Face Spaces'. Must include specific mention of TPU v6 or TPU pod support.
Hugging Face
Relationships: Hugging Face → partnered → Nvidia; Hugging Face → partnered → Meta; Hugging Face → hired → Clement Delangue; Hugging Face → competes_with → LM Studio; Hugging Face → partnered → Qwen-Scope; Nvidia → partnered → Hugging Face; 30B-A3B Reasoning Model → licensed → Hugging Face; LM Studio → competes_with → Hugging Face; DeepSeek → uses → Hugging Face; PyTorch Foundation → partnered → Hugging Face
Events:
[2026-05-19] product_launch: HuggingFace launches Daily Papers SKILL.md for AI agents to read, search, and fetch research papers.
[2026-04-14] product_launch: Launched 'Kernels' hub on its platform for sharing and discovering optimized GPU code.
[2026-04-14] product_launch: Announced completion of a large-scale project converting 27,000 arXiv papers from PDF to Markdown using an open 5B model.
[2026-04-08] executive_change: Julien Chaumond, CTO of Hugging Face, announced the Safetensors transfer via social media.
[2026-03-19] product_launch: Launched Daily Papers SKILL.md, a tool for AI agents to read, search, and fetch research papers.
Sentiment: Hugging Face 2026-05-11: +0.20 (2 mentions); Hugging Face 2026-05-25: +0.30 (1 mention); Hugging Face 2026-06-01: +0.15 (2 mentions)
Momentum:Hugging Face (company): 48 mentions
Event · research · Basic Analysis
2mo left · 1w ago

dMoE + Apple Gemini = On-Device Diffusion LLM Breakthrough

Apple will announce at WWDC 2026 that its 1.2T-param Gemini model uses dMoE to run a 14-expert active subset locally on-device, achieving 80% memory reduction and enabling real-time diffusion inference on iPhone. This will trigger a wave of edge-diffusion applications.

Confidence: 75% (Likely)
Target: Sep 7, 2026
Reasoning: [Research Analysis] Apple will announce at WWDC 2026 that its 1.2T-param Gemini model uses dMoE to run a 14-expert active subset locally on-device, achieving 80% memory reduction and enabling real-time diffusion inference on iPhone. This will trigger a wave of edge-diffusion applications.
How we verify: Apple WWDC 2026 keynote confirms dMoE-based local inference for Gemini model, with active expert count < 15.

Frequently asked questions

What is an AI prediction on gentic.news?
Each prediction is a falsifiable, dated forecast about the AI industry — for example 'Claude Opus 4.7 will exceed 90% on SWE-Bench Verified before 2026-09-01' or 'OpenAI will announce a 1GW+ training campus this quarter'. Predictions cite specific entities and relationships from our knowledge graph, carry a confidence score (0–100), have a hard deadline, and get auto-verified against actual outcomes. We publish the full history — correct, incorrect, partially correct, and expired — so accuracy is auditable.
How are predictions generated?
An AI agent reads our knowledge graph (4,749+ AI entities, 4,890+ relationships) and the latest articles every few hours, looking for patterns: hiring spikes, product cadence, partnership signals, benchmark trajectories, capex announcements. When the agent finds a high-signal pattern, it drafts a falsifiable claim with a deadline, attaches the entities and articles as evidence, and assigns a confidence based on signal strength and historical accuracy on similar prediction types.
How is each prediction verified?
When the deadline arrives, a verification job re-queries our graph and a curated set of authoritative sources (official announcements, benchmark leaderboards, SEC filings, regulator notices) for evidence either way. The outcome is one of: correct, partially correct, incorrect, or expired (no confirming or refuting evidence found). Outcomes are immutable once recorded, and the calibration curve at the top of the page shows how well stated confidence matches actual hit-rate by bin.
What is the current accuracy rate?
Every resolved prediction is graded against real evidence and marked correct, partially correct, incorrect, or expired — the full, current breakdown is public on the leaderboard, so you see the real accuracy rather than a marketing number. We deliberately avoid 99%-confidence calls — these tend to be trivially true ('OpenAI will release something in 2026') and don't add information. The calibration curve shows where we're under- or over-confident.
Can I make my own prediction?
Yes. The Community tab on this page lets anyone submit a falsifiable AI prediction. Submissions need a clear claim, a deadline, and ideally a rationale. Cookie-based identity tracks your accuracy on the predictor leaderboard, so no account is required. Community predictions go through the same verification flow as AI-generated ones, and your hit-rate and Brier score appear on the leaderboard once at least three of your predictions have resolved.
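For readers unfamiliar with the Brier score mentioned in the answer above, here is a minimal Python sketch: it is the mean squared difference between the stated probability and the 0/1 outcome, so lower is better and 0.0 is a perfect record. The sketch assumes each resolved prediction is stored as a (confidence in percent, outcome) pair with outcome 1 for correct and 0 for incorrect. How the leaderboard scores partially correct or expired predictions is not described here, so this sketch simply leaves them out; the function name and data are illustrative, not the site's code.

```python
from typing import Iterable, Tuple


def brier_score(records: Iterable[Tuple[float, int]]) -> float:
    """Mean squared error between stated probability and binary outcome.

    records: (confidence_percent, outcome) pairs, where confidence is 0-100
    and outcome is 1 (correct) or 0 (incorrect). Partially correct and
    expired predictions are assumed to be filtered out before calling this.
    """
    pairs = list(records)
    if not pairs:
        raise ValueError("need at least one resolved prediction")
    total = 0.0
    for confidence, outcome in pairs:
        p = confidence / 100.0  # convert the 0-100 score to a probability
        total += (p - outcome) ** 2
    return total / len(pairs)


# Hypothetical resolved predictions: (stated confidence, outcome)
example = [(82, 1), (70, 0), (55, 1)]
print(round(brier_score(example), 4))  # prints 0.2416
```

A confident call that misses (the 70% one above) costs far more than a hedged call that misses, which is why the score rewards calibration rather than boldness.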
Why publish predictions that turn out wrong?
Because hiding losses kills calibration. A forecasting system that only shows wins is uncalibrated by construction. We surface every incorrect and partially correct prediction with the original confidence, evidence, and deadline. This lets readers see whether our 75% confidence calls actually hit ~75% of the time (well-calibrated) or 60% / 90% (mis-calibrated). The calibration plot is updated nightly.
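The calibration check described above (do 75% calls hit about 75% of the time?) can be sketched by grouping resolved predictions into confidence bins and comparing each bin's mean stated confidence with its observed hit-rate. The 10-point bin width, the binary outcome encoding, and the `calibration_table` helper are assumptions for illustration, not the site's documented method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def calibration_table(
    records: List[Tuple[float, int]], bin_width: int = 10
) -> Dict[int, Tuple[float, float, int]]:
    """Group (confidence_percent, outcome) pairs into confidence bins.

    Returns {bin_start: (mean_stated_pct, observed_hit_rate_pct, count)}.
    A well-calibrated forecaster has mean_stated close to observed in
    every bin. Assumes bin_width divides 100 evenly.
    """
    bins: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for confidence, outcome in records:
        # Clamp 100 into the top bin so the last bin is [90, 100].
        start = min(int(confidence // bin_width) * bin_width, 100 - bin_width)
        bins[start].append((confidence, outcome))

    table: Dict[int, Tuple[float, float, int]] = {}
    for start in sorted(bins):
        items = bins[start]
        mean_stated = sum(c for c, _ in items) / len(items)
        hit_rate = 100.0 * sum(o for _, o in items) / len(items)
        table[start] = (mean_stated, hit_rate, len(items))
    return table


# Hypothetical resolved calls: four 75% calls (three correct), two 55% calls.
resolved = [(75, 1), (75, 1), (75, 0), (75, 1), (55, 0), (55, 1)]
for start, (stated, observed, n) in calibration_table(resolved).items():
    print(f"{start}-{start + 10}%: stated {stated:.0f}%, observed {observed:.0f}%, n={n}")
```

Plotting mean stated confidence against observed hit-rate for each bin gives a calibration curve; points on the diagonal are well calibrated, points below it are over-confident.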
