gentic.news — AI News Intelligence Platform
26 answers · all sourced · Updated 2026-05-18

AI answers: direct, sourced, current.

The questions people actually ask about AI in 2026: current SOTA scores, compute deals, frameworks, papers, and the people behind them. Each answer is short enough to quote, and every claim links to its source.

Current SOTA

4 answers

Where the frontier sits in May 2026, by benchmark.

What is the current SOTA on OSWorld-Verified?

Holo3-35B-A3B holds the top OSWorld-Verified score at 80.4%, the first model to surpass the 72.4% human baseline. GPT-5.5 trails at 78.7%, and Claude Opus 4.7 at 78.0%. OSWorld-Verified measures real desktop computer use across multi-step GUI tasks.

What is the current SOTA on BrowseComp?

Claude Mythos Preview leads BrowseComp at 86.9%. BrowseComp measures multi-hop web research and synthesis. Mythos remains release-limited: Anthropic ships the less powerful Opus 4.7 publicly while it tests safeguards on Mythos.

AI infrastructure

5 answers

Compute deals, gigawatt-scale data centers, and the bottlenecks behind them.

What is the Anthropic-SpaceX compute deal?

Announced May 6, 2026: Anthropic leases all of Colossus 1, the Memphis, Tennessee data center that SpaceX absorbed when it took over xAI. The site has ~220,000 NVIDIA GPUs (H100/H200/GB200) and 300 MW of capacity. xAI training has migrated to Colossus 2, and Anthropic now uses the freed-up site.

What is OpenAI's Stargate project?

Stargate is the 10 GW, $500B AI infrastructure plan from OpenAI, Oracle, and SoftBank. Five new US sites have been announced; the flagship site in Abilene, Texas is targeting 1.2 GW operational by mid-2026 with 450,000+ GPUs. The combined commitment is ~$400B over three years.

What is xAI's Colossus 2?

A Memphis facility targeting 1.6 GW of power draw and 550,000 to 1 million NVIDIA GPUs by end-2026. It was built in roughly 12 months, among the fastest gigawatt-scale data center builds ever attempted. Colossus 1 (the original site) is now leased entirely to Anthropic.

What is the typical capex per gigawatt of AI data center?

About $29 billion per gigawatt of total facility power (2026 baseline). Microsoft Fairwater in Wisconsin is projected to exceed $100 billion in total capex. Stranded capital and grid-interconnect wait times, not chips or money, are now the binding constraint on buildouts.
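A back-of-envelope version of the baseline is simply facility power times ~$29B/GW. This is a minimal sketch assuming capex scales linearly with power; the site sizes below are power figures quoted elsewhere on this page, and the dollar outputs are estimates, not reported costs.

```python
# Linear capex estimate from the ~$29B-per-GW baseline quoted above.
# Assumption: capex scales linearly with total facility power. Real projects vary
# widely by site, cooling and GPU generation, so this is a first-order check only.

CAPEX_PER_GW_USD = 29e9  # 2026 baseline from the answer above


def estimate_capex_usd(facility_gw: float, per_gw_usd: float = CAPEX_PER_GW_USD) -> float:
    """Return an estimated capex in US dollars for a facility of `facility_gw` gigawatts."""
    if facility_gw < 0:
        raise ValueError("facility_gw must be non-negative")
    return facility_gw * per_gw_usd


if __name__ == "__main__":
    # Power figures quoted elsewhere on this page; the dollar outputs are estimates.
    sites = [("Colossus 1", 0.3), ("Abilene flagship", 1.2), ("Colossus 2", 1.6)]
    for name, gw in sites:
        print(f"{name} ({gw} GW): ~${estimate_capex_usd(gw) / 1e9:.1f}B")
```

Inverting the rate is just as simple: $100B at $29B/GW corresponds to roughly 3.4 GW of facility power.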

Frameworks & papers

5 answers

Original research and field frameworks from Gentic Lab and the broader community.

What is MNEMA?

MNEMA is a witness-lattice architecture for multi-agent AI memory. Each memory unit becomes an autonomous cryptographic witness with a hash-chained signed journal and the structural right to refuse. Submitted to EUMAS 2026 (under single-blind review). Closed-form bound: P_undetected = α + (1−α)·β^(1+q).
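Two ideas in this answer can be made concrete: a hash-chained, signed journal, and evaluating the closed-form bound. The sketch below is hypothetical and is not MNEMA's implementation. HMAC stands in for whatever signature scheme MNEMA actually uses, and because this page does not define α, β, or q, the code simply treats α and β as probabilities in [0, 1] and q as a non-negative count.

```python
import hashlib
import hmac
import json

# Hypothetical illustration only: MNEMA's journal format, key scheme, and the
# meaning of alpha, beta and q are not specified on this page.


class HashChainedJournal:
    """Append-only log where each entry commits to the hash of the previous entry."""

    GENESIS = "0" * 64

    def __init__(self, key: bytes):
        self._key = key  # shared secret; HMAC stands in for a real signature scheme
        self.entries = []

    def _digest(self, prev: str, payload: dict) -> str:
        body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

    def _sign(self, digest: str) -> str:
        return hmac.new(self._key, digest.encode(), hashlib.sha256).hexdigest()

    def append(self, payload: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = self._digest(prev, payload)
        entry = {"prev": prev, "payload": payload, "hash": digest, "sig": self._sign(digest)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link; any edited payload breaks its hash and every later link."""
        prev = self.GENESIS
        for e in self.entries:
            digest = self._digest(prev, e["payload"])
            if e["prev"] != prev or e["hash"] != digest:
                return False
            if not hmac.compare_digest(e["sig"], self._sign(digest)):
                return False
            prev = digest
        return True


def p_undetected(alpha: float, beta: float, q: int) -> float:
    """Evaluate P_undetected = alpha + (1 - alpha) * beta ** (1 + q)."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1) or q < 0:
        raise ValueError("alpha and beta must lie in [0, 1], and q must be >= 0")
    return alpha + (1 - alpha) * beta ** (1 + q)
```

For example, `p_undetected(0.01, 0.5, 3)` is 0.01 + 0.99 · 0.5⁴ = 0.071875. Whenever β < 1, the second term shrinks as q grows, so the bound falls toward α.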

What is Epistemic Infrastructure?

A framework for governing organisational knowledge as a living system: 12 pillars (truth dimensions, temporal governance, claim-level units, etc.), an 11-stage knowledge metabolism, and 13 named pathologies (zombie knowledge, memory scar tissue, knowledge nepotism, etc.). It is the discipline AI memory needs to grow into.

What is PageIndex?

Vectorless, reasoning-based RAG (VectifyAI, Sept 2025). It builds a hierarchical tree index from long documents and lets an LLM reason over that index instead of computing vector similarity. It reaches 98.7% on FinanceBench vs ~50% for traditional vector RAG, and integrates via an MCP server.
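The core loop (descend a tree of sections, choosing a child at each level) can be sketched in a few lines. This is a toy illustration of the idea, not PageIndex's code or API. In PageIndex an LLM makes each choice by reasoning over titles and summaries; here a hypothetical keyword-overlap `keyword_chooser` stands in so the sketch runs offline, and any function with the same signature, including an LLM call, could replace it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    title: str
    summary: str
    text: str = ""  # leaf content handed to the LLM once navigation stops
    children: List["Node"] = field(default_factory=list)


# choose(query, children) -> index of the child to descend into.
Chooser = Callable[[str, List[Node]], int]


def keyword_chooser(query: str, children: List[Node]) -> int:
    """Hypothetical stand-in for the LLM: pick the child sharing the most query words."""
    q = set(query.lower().split())
    scores = [len(q & set(f"{c.title} {c.summary}".lower().split())) for c in children]
    return max(range(len(children)), key=lambda i: scores[i])


def navigate(root: Node, query: str, choose: Chooser = keyword_chooser) -> List[str]:
    """Descend from the root to a leaf and return the path of section titles."""
    path, node = [root.title], root
    while node.children:
        node = node.children[choose(query, node.children)]
        path.append(node.title)
    return path
```

Because the index is a tree, each query costs one choice per level rather than a similarity scan over every chunk; with a real LLM chooser, that is also where the extra latency and calls per query come from.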

What is A-MEM?

Zettelkasten-style agentic memory for LLM agents (NeurIPS 2025). Each new memory generates a structured note with attributes, and existing notes are updated as new memories are integrated. It is the closest existing relative to MNEMA's witness concept, but without decision rights, signed journals, or a formal audit framework.
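A minimal sketch of the note-and-link pattern follows, assuming notes carry keywords, tags, and links. It is not A-MEM's implementation: simple keyword overlap (with a hypothetical `min_shared` threshold) replaces the LLM-driven linking and note-evolution steps, and "evolution" is reduced to related notes gaining a back-link and the newcomer's tags.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Note:
    note_id: str
    content: str
    keywords: Set[str]
    tags: Set[str] = field(default_factory=set)
    links: Set[str] = field(default_factory=set)


class NoteMemory:
    """Toy Zettelkasten memory: a new note links to related notes, and those notes update."""

    def __init__(self, min_shared: int = 1):
        self.notes: Dict[str, Note] = {}
        self.min_shared = min_shared  # hypothetical relatedness threshold (shared keywords)

    def add(self, note: Note) -> List[str]:
        """Insert a note, link it to related notes, and return the related note IDs."""
        related = [n for n in self.notes.values()
                   if len(n.keywords & note.keywords) >= self.min_shared]
        for other in related:
            note.links.add(other.note_id)
            other.links.add(note.note_id)  # existing note evolves: gains a back-link
            other.tags |= note.tags        # ...and absorbs the newcomer's tags
        self.notes[note.note_id] = note
        return [n.note_id for n in related]
```

The key property mirrored here is that adding a memory can change notes that already exist, not just append a new one.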

What is the model collapse problem?

Generative models trained on content produced by earlier models progressively lose information from the tails of the original distribution (Shumailov et al., Nature 2024). By April 2025, 74% of new webpages contained AI-generated text, so the contamination is structural. Mitigations include accumulating data rather than replacing it, plus watermarking and provenance tracking.
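The replacement-versus-accumulation contrast shows up even in a toy single-Gaussian simulation: fit a Gaussian, sample from the fit, refit, and repeat. This is an illustration under simple assumptions (1-D normal data, maximum-likelihood fits, 20 samples per generation), not a reproduction of the paper's experiments.

```python
import math
import random


def fit_gaussian(xs):
    """Maximum-likelihood mean and variance (divides by n, not n - 1)."""
    mean = math.fsum(xs) / len(xs)
    var = math.fsum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var


def simulate(generations: int = 300, n: int = 20, accumulate: bool = False, seed: int = 0) -> float:
    """Fit, sample n points from the fit, refit, repeat. Returns the final fitted variance.

    accumulate=False: each generation trains only on the previous generation's samples.
    accumulate=True: synthetic samples are appended to all earlier data, real data included.
    """
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # generation 0: "real" data from N(0, 1)
    mean, var = fit_gaussian(data)
    for _ in range(generations):
        synthetic = [rng.gauss(mean, math.sqrt(var)) for _ in range(n)]
        if accumulate:
            data.extend(synthetic)
        else:
            data = synthetic
        mean, var = fit_gaussian(data)
    return var
```

With replacement, sampling noise and the downward bias of each refit compound, so the fitted variance typically shrinks by many orders of magnitude (the tails vanish). With accumulation, the original data keeps anchoring the fit and the variance stays on the order of the original.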

Concepts & terms

5 answers

Plain-English answers to terms that show up in every AI conversation.

What is a knowledge half-life?

The time after which a claim's confidence has decayed to half its initial value. It is domain-specific: pricing and org charts decay in days, legal claims in years, and foundational technical knowledge in years to decades. The single most common defect in production RAG is giving all documents the same temporal weight.
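The definition implies exponential decay, c(t) = c₀ · 0.5^(t / h), where h is the domain's half-life. The sketch below applies it at retrieval time. The per-domain half-lives are hypothetical values chosen to match the orders of magnitude above, and multiplying similarity by the decay factor is one simple weighting choice, not a standard.

```python
from typing import Dict, List, Tuple

# Hypothetical half-lives in days. The answer above gives only orders of magnitude
# ("days" for pricing and org charts, "years" for legal, "years to decades" for foundations).
HALF_LIFE_DAYS: Dict[str, float] = {
    "pricing": 7,
    "org_chart": 14,
    "legal": 3 * 365,
    "foundational": 10 * 365,
}


def decayed_confidence(initial: float, age_days: float, half_life_days: float) -> float:
    """Exponential decay: confidence halves every `half_life_days`."""
    return initial * 0.5 ** (age_days / half_life_days)


def rank(docs: List[Tuple[str, float, str, float]]) -> List[Tuple[str, float]]:
    """Rank (doc_id, similarity, domain, age_days) tuples by similarity times temporal weight."""
    scored = [
        (doc_id, sim * decayed_confidence(1.0, age, HALF_LIFE_DAYS[domain]))
        for doc_id, sim, domain, age in docs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a 70-day-old legal document keeps most of its weight, while a 70-day-old price list drops to about 0.1% of its similarity score, even though both are the same age.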

What is zombie knowledge?

Deprecated knowledge that still retrieves cleanly because copies persist in dashboards, prompts, embeddings, onboarding decks, code, and Slack. The textbook signature is high momentum (loud propagation) plus low validity (no longer true).
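The "high momentum plus low validity" test translates directly into a check. This sketch is hypothetical throughout: the momentum score (copy count times recent retrievals), the boolean validity flag, and the threshold of 50 are illustrative assumptions, not a published metric.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeItem:
    claim: str
    copies: int          # places the claim still appears (dashboards, prompts, decks, ...)
    retrievals_30d: int  # how often retrieval surfaced it in the last 30 days
    valid: bool          # still true according to the current source of truth


def momentum(item: KnowledgeItem) -> float:
    """Hypothetical momentum score: how widely the claim is spread times how often it is used."""
    return item.copies * item.retrievals_30d


def is_zombie(item: KnowledgeItem, momentum_threshold: float = 50.0) -> bool:
    """Flag claims that are loud (high momentum) but no longer true (low validity)."""
    return (not item.valid) and momentum(item) >= momentum_threshold
```

Stale claims with low momentum fall outside this check; they are dead rather than zombie, and the usual fix is simply deletion.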

What is memory scar tissue?

An emergency workaround from a past incident that becomes the canonical retrieved answer long after the emergency ended. Common in systems with no expiry policy: temporary code paths and crisis docs survive years past their relevance window.
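An expiry policy can be as simple as giving incident documents a default time-to-live and filtering expired documents out before retrieval. A minimal sketch follows; the `Doc` shape, the `incident_doc` helper, and the 30-day default TTL are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Doc:
    doc_id: str
    text: str
    expires_at: Optional[datetime] = None  # None = evergreen, no expiry set


def incident_doc(doc_id: str, text: str, created: datetime, ttl_days: int = 30) -> Doc:
    """Create a workaround or crisis doc that expires by default (hypothetical 30-day TTL)."""
    return Doc(doc_id, text, expires_at=created + timedelta(days=ttl_days))


def retrievable(docs: List[Doc], now: Optional[datetime] = None) -> List[Doc]:
    """Keep only documents with no expiry or an expiry still in the future."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [d for d in docs if d.expires_at is None or d.expires_at > now]
```

Making expiry the default for emergency material means an incident workaround must be deliberately renewed or promoted to survive, rather than deliberately removed to die.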

What is RAG (retrieval-augmented generation)?

An architecture where an LLM retrieves relevant context from a corpus before generating an answer. The traditional pattern uses vector similarity over chunked documents. Newer approaches, such as PageIndex (vectorless reasoning) and claim-graph RAG (MNEMA), argue that similarity is not relevance.
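The traditional vector pattern reduces to three steps: score chunks by cosine similarity to the query embedding, keep the top k, and prepend them to the prompt. In this minimal sketch the embeddings are hand-written toy vectors (a real system would use an embedding model), and the prompt template is a hypothetical example.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float], chunks: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, chunk[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Prepend the retrieved chunks to the question before it is sent to the LLM."""
    bullets = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{bullets}\n\nQuestion: {question}"
```

Nothing in `top_k` checks whether a chunk is true, current, or actually answers the question. It only measures geometric closeness, which is the gap the "similarity is not relevance" critique points at.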

What is MCP (Model Context Protocol)?

Anthropic's open protocol for giving LLM agents tool access through standardised servers. It has been adopted across Claude Code, Cursor, Continue, and most agent runtimes. Recent security research found exploitable issues in 43% of public MCP servers, so a security audit before deployment is now standard practice.
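MCP messages are JSON-RPC 2.0; clients discover tools with `tools/list` and invoke them with `tools/call`, passing the tool `name` and its `arguments`. This sketch only builds the request payloads (transport and the initial `initialize` handshake are omitted). The `search_docs` tool name and the audited-tool allowlist are hypothetical, and the allowlist shows one way to enforce a pre-deployment audit on the client side.

```python
import json
from typing import Any, Dict

# Hypothetical allowlist produced by a security review of the connected MCP servers.
AUDITED_TOOLS = {"search_docs"}


def tools_list_request(request_id: int) -> str:
    """JSON-RPC request asking an MCP server which tools it exposes."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tools_call_request(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """JSON-RPC request invoking one tool with its arguments."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def guarded_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Refuse to build calls to tools that have not passed review."""
    if name not in AUDITED_TOOLS:
        raise PermissionError(f"tool {name!r} has not been security-audited")
    return tools_call_request(request_id, name, arguments)
```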

Comparisons

3 answers

Head-to-head, evidence-backed.

Claude Code vs Cursor — which is better?

Different shapes. Cursor is an IDE with agent integration; Claude Code is a CLI-first agent. On SWE-Bench Pro, Claude Opus 4.7 leads at 64.3%, ahead of Composer 2 in Cursor. Cursor wins on real-time IDE collaboration; Claude Code wins on autonomous long-running tasks and a native Unix-shell workflow.

Anthropic vs OpenAI — who is winning?

Different markets. OpenAI dominates consumer (ChatGPT) and developer mindshare; Anthropic leads in enterprise and on agentic-coding benchmarks. Anthropic now has five stacked compute commitments (AWS, Google/Broadcom, Microsoft/NVIDIA, Fluidstack, SpaceX/Colossus 1) and a reported $900B valuation. OpenAI's Stargate targets 10 GW by 2027.

Vector RAG vs PageIndex — which to use?

Use vector RAG for broad, fast search across many documents. Use PageIndex when accuracy and document structure matter (legal, financial, technical), at the cost of higher latency and more LLM calls per query. Most production systems benefit from a hybrid router that picks a retriever per query.
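A hybrid router can start as a simple rule: send structure-sensitive queries to PageIndex when the latency budget allows, and send everything else to vector RAG. In this minimal sketch the cue words and the 1,000 ms floor are hypothetical; a production router might instead use a trained classifier or an LLM call to decide.

```python
import re

# Hypothetical cue words suggesting that document structure matters for the answer.
STRUCTURE_CUES = {"contract", "clause", "section", "10-k", "filing", "regulation", "spec", "appendix"}

# Hypothetical latency floor below which the slower, multi-call PageIndex path is skipped.
MIN_PAGEINDEX_BUDGET_MS = 1000.0


def route(query: str, latency_budget_ms: float = 5000.0) -> str:
    """Return 'pageindex' for structure-sensitive queries with enough budget, else 'vector'."""
    if latency_budget_ms < MIN_PAGEINDEX_BUDGET_MS:
        return "vector"
    words = set(re.findall(r"[a-z0-9-]+", query.lower()))
    return "pageindex" if words & STRUCTURE_CUES else "vector"
```

Keeping the router as a separate function makes it easy to log its decisions and compare answer quality per route before replacing the rules with something learned.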

Tools & guides

4 answers

Direct links to the buyers' guides and live tools.

What are the best AI coding assistants in 2026?

Top contenders ranked by real-workload performance: Claude Code (best agentic), Cursor (best IDE-integrated), Codex / GPT-5.4 (strong general-purpose), Devin (autonomous), GitHub Copilot Workspace (enterprise), OpenHands (open-source). See the live ranking with current benchmarks.

What are the best LLMs in 2026?

Frontier closed: Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, Claude Mythos Preview (limited release). Frontier open: Llama 4, DeepSeek V4, Qwen 3.5-Omni, MiniMax-M2.7. Best practical pick depends on coding vs agentic-computer-use vs reasoning workload.

Where can I see the live AI knowledge graph?

gentic.news/graph is an interactive visualisation of the autonomous knowledge graph: 5,100+ entities and 2,500+ relationships across companies, models, papers, benchmarks, people, and technologies. It is updated every two hours by the Brain.

What is gentic.news The Brain?

An autonomous reasoning engine that runs every 90 minutes, 24/7. It scans, hypothesises, investigates, verifies, writes findings, and reflects; every claim is graph-grounded and every prediction falsifiable. RSS feeds are available at /api/v1/feeds/rss/cycles and /api/v1/feeds/rss/findings.
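The feeds can be consumed with the Python standard library alone. This minimal sketch assumes the feeds are served over HTTPS from the gentic.news root and use standard RSS 2.0 `item`/`title`/`link` elements; neither the base URL nor the feed schema is confirmed on this page.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Assumption: feeds are served from the site root over HTTPS.
BASE_URL = "https://gentic.news"


def parse_rss_items(xml_data: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def fetch_feed(path: str = "/api/v1/feeds/rss/findings", timeout: float = 10.0) -> List[Tuple[str, str]]:
    """Download one feed (e.g. the findings or cycles path) and parse its items."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return parse_rss_items(resp.read())
```

Passing the raw bytes to `ET.fromstring` lets the parser honour the feed's own encoding declaration.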

Asked something we haven’t answered?

The Brain runs every 90 minutes, and every cycle adds new findings, entities, and answers. Browse the full findings library, ask directly via Ask the Brain, or explore the live knowledge graph.

All answers on this page are graph-grounded and cite the source article. License: CC-BY-4.0. Questions evolve as the field does — bookmark and check back.