AI Research

Breaking AI research news: latest papers from arXiv, NeurIPS, ICML, and top labs. Track transformer architecture advances, reasoning breakthroughs, and scientific discoveries in machine learning and artificial intelligence.

NVIDIA GB300 NVL72 server rack with blue LED lights, labeled as MLPerf 6.0 benchmark winner, surrounded by data…

MLPerf 6.0: NVIDIA Sweeps New Benchmarks, AMD MI355X Within 30% on Select Tests

MLPerf 6.0 results show NVIDIA winning every new benchmark, with its GB300 NVL72 system achieving nearly 3x more throughput than six months ago. AMD's MI355X showed progress, coming within 10-30% on select single-node tests but skipping most new benchmarks.

x.com/Apr 7, 2026/3 min read

ai infrastructure, hardware, benchmarks

A person sits at a desk, typing on a laptop with the ChatGPT interface open on the screen, surrounded by books and…

Study of 1,222 Users Claims ChatGPT Use Reduces Cognitive Effort

A viral social media post references a study of 1,222 people, claiming it proves ChatGPT use reduces cognitive effort. The claim lacks published methodology or data, highlighting the ongoing debate over AI's impact on human cognition.

x.com/Apr 7, 2026/3 min read

human-computer interaction, ai ethics, misinformation

A large digital storage drive labeled 1.51TB connected to a server rack, with a glowing AI model icon on a monitor…

754B-Parameter AI Model Hits Hugging Face, Weighs 1.51TB

An unidentified 754-billion-parameter AI model has been uploaded to Hugging Face, occupying 1.51TB of storage. That makes it one of the largest publicly accessible model repositories by size.

x.com/Apr 7, 2026/3 min read

hugging face, model release, llm

A sleek computer monitor displays a complex neural network diagram with glowing blue and orange nodes, while a…

GLM-5.1 Claims Autonomous Self-Improvement Without Human Metrics

Zhipu AI's GLM-5.1 model can reportedly evaluate and improve its own outputs over long periods without explicit human-provided metrics, shifting from single-turn tasks to sustained problem-solving.

x.com/Apr 7, 2026/3 min read

zhipu ai, llms, research

AI Overviews search result with Wikipedia text, showing factual error and obscured source attribution

AI Overviews' Accuracy Mirrors Wikipedia, Complicating Performance Metrics

A case study highlights that AI Overviews' factual errors often originate from Wikipedia, but the AI's presentation obscures sources. This complicates standard accuracy benchmarks for LLMs.

x.com/Apr 7, 2026/3 min read

evaluation, google, large language models

A digital network diagram overlays biological, mechanical, and musical resonator symbols; a glowing node highlights…

AI Agents Map Resonators Across Domains, Design Bio-Inspired Structure

AI agents have mapped resonators from biology, engineering, and music into a shared latent space, discovered an unexplored design region, and autonomously generated and validated a novel bio-inspired resonator structure.

x.com/Apr 7, 2026/3 min read

ai for science, design, generative ai

AI-generated illustration of a DNA double helix with glowing CRISPR scissors targeting a specific gene sequence…

DeepMind's AlphaGenome AI Decodes Non-Coding DNA for CRISPR Targeting

Demis Hassabis states that while CRISPR can edit DNA, finding the right target is hard. DeepMind's AlphaGenome AI is analyzing the non-coding genome to predict mutation effects and guide precise CRISPR interventions.

x.com/Apr 7, 2026/3 min read

therapeutics, deep learning, genomics

A CPU chip on a circuit board with glowing blue lines, representing efficient AI processing without GPUs

Microsoft's BitNet Enables 100B-Parameter LLMs on CPU, Cuts Energy 82%

Microsoft Research's BitNet project demonstrates 1-bit LLMs with 100B parameters that run efficiently on CPUs, using 82% less energy while maintaining performance, challenging the need for GPUs in local deployment.

x.com/Apr 7, 2026/3 min read

hardware, research, model efficiency

A mysterious AI model ranking chart shows Seedance 2.0 surpassed by an unnamed competitor on the Artificial Analysis…

Unidentified AI Model Tops Seedance 2.0 on Artificial Analysis

An unidentified AI model has outperformed the well-regarded Seedance 2.0 on the Artificial Analysis benchmark. The developer remains unknown, sparking speculation about a new entrant in the crowded model landscape.

x.com/Apr 7, 2026/3 min read

model performance, research, benchmarks

A person sits at a desk using a laptop with ChatGPT interface visible, surrounded by study notes and a clock, while…

Study: 10 Minutes with ChatGPT Cuts Problem-Solving Rate from 73% to 57%

Researchers from Carnegie Mellon, Oxford, MIT, and UCLA found that just 10 minutes of ChatGPT use reduced participants' independent problem-solving success from 73% to 57%. The effect was strongest in users who sought direct answers, whose performance fell below their original baseline.

x.com/Apr 7, 2026/3 min read

research, ethics, cognitive science

A laptop screen displays a Colab notebook with Python code for training a small transformer model, next to a coffee…

Tiny 9M Parameter LLM Tutorial Runs on Colab, Demystifies Transformer Training

A developer shared a complete tutorial for training a ~9M parameter transformer language model from scratch, including tokenizer, training, and inference, all runnable on Google Colab in minutes.

x.com/Apr 7, 2026/3 min read

open-source, tutorial, education

Researchers from Stanford and MIT examine a complex diagram of AI model layers, tools, and prompts, representing the…

Stanford/MIT Paper: AI Performance Depends on 'Model Harnesses'

A new paper from Stanford and MIT introduces the concept of 'Model Harnesses,' arguing that the wrapper of prompts, tools, and infrastructure around a base model is a primary determinant of real-world AI performance.

x.com/Apr 7, 2026/3 min read

ai-engineering, prompting, research

A satellite image of a dense urban area with roads, buildings, and green spaces, likely illustrating a benchmark…

AlphaEarth Embeddings Outperform Prithvi, Clay in Urban Signal Benchmark

Researchers benchmarked three geospatial foundation models—AlphaEarth, Prithvi, and Clay—on predicting 14 neighborhood-level urban indicators from satellite imagery. AlphaEarth's compact 64-dimensional embeddings proved most informative, achieving the highest predictive skill for built-environment-linked outcomes like chronic health burdens.

arxiv.org/Apr 7, 2026/3 min read

research, computer-vision, applications

A software developer at a workstation with multiple monitors displaying code and AI pipeline diagrams, illustrating…

Gemma4 + Falcon Perception Enables Vision-Action Agent Pipeline

A developer shared a pipeline where Gemma4 interprets images, Falcon Perception segments objects with metadata, and Gemma4 reasons to call tools. This demonstrates a modular approach to vision-language-action agents.

x.com/Apr 6, 2026/3 min read

open-source, llms, agents

Researchers at Carnegie Mellon University present a study showing leading LLMs failing simple contradiction tests…

CMU Study: Top LLMs Fail Simple Contradiction Tests, Lack True Reasoning

Carnegie Mellon researchers tested 14 leading LLMs on simple contradiction tasks; all failed consistently, revealing fundamental reasoning gaps despite advanced benchmarks.

x.com/Apr 6, 2026/3 min read

ai safety, research, evaluation

A glowing microchip on a dark circuit board, surrounded by faint light pulses, symbolizing energy-efficient AI…

AI System Claims 100x Energy Efficiency Gain with Higher Accuracy

A new AI system reportedly uses 100 times less energy than current models while achieving higher accuracy. If validated, this could significantly reduce the operational costs and environmental impact of large-scale AI deployment.

x.com/Apr 6, 2026/3 min read

efficiency, hardware, research

Efficient Universal Perception Encoder (EUPE) Family Challenges DINOv2

Researchers introduced the Efficient Universal Perception Encoder (EUPE), a family of compact vision models that achieve performance rivaling the larger DINOv2. This could enable high-quality visual understanding on resource-constrained devices.

x.com/Apr 6, 2026/3 min read

foundation models, computer vision, research

A diagram showing RLSD architecture with two model branches merging self-distillation and verifiable reward signals…

RLSD Unifies Self-Distillation & Verifiable Rewards to Fix RL Leakage

Researchers propose RLSD, a method merging on-policy self-distillation with verifiable rewards to fix information leakage and training instability in language model reinforcement learning.

x.com/Apr 6, 2026/3 min read

large-language-models, research, fine-tuning

A McKinsey report graphic shows a bar chart comparing AI infrastructure value creation growth to slower business…

McKinsey: AI Infrastructure Value Creation Outpaces Business Capture

McKinsey's latest analysis indicates the pace of value creation from AI infrastructure is exceeding the rate at which most businesses are capturing it, highlighting a growing implementation deficit.

x.com/Apr 6, 2026/3 min read

enterprise, strategy, analysis

A hand holds a printed cheatsheet showing transformer architecture diagrams with LoRA, RAG, and MoE annotations, set…

Stanford Releases Free LLM & Transformer Cheatsheets Covering LoRA, RAG, MoE

Stanford University has released a free, open-source collection of cheatsheets covering core LLM concepts from self-attention to RAG and LoRA. This provides a consolidated technical reference for engineers and researchers.

x.com/Apr 6, 2026/3 min read

llms, research, education

Researchers from Stanford and MIT present Meta-Harness, a framework that automatically optimizes system code to…

Meta-Harness from Stanford/MIT Shows System Code Creates 6x AI Performance Gap

Stanford and MIT researchers show AI performance depends as much on the surrounding system code (the 'harness') as the model itself. Their Meta-Harness framework automatically improves this code, yielding significant gains in reasoning and classification tasks.

x.com/Apr 6, 2026/3 min read

deployment, agents, research

A laptop displaying a web browser with a coding interface, surrounded by abstract digital security icons and network…

Google DeepMind: Web Environment, Not Model Weights, Is Key AI Agent Attack Surface

Google DeepMind researchers present a systematic framework showing that the web environment itself—not just the model—is a primary attack surface for AI agents. In benchmarks, hidden prompt injections hijacked agents in up to 86% of scenarios, with memory poisoning attacks exceeding 80% success.

x.com/Apr 6, 2026/3 min read

agents, security, research

Researchers present a diagram of token warping for multi-view reasoning in MLLMs, outperforming pixel methods in…

Token Warping for MLLMs Outperforms Pixel Methods in View Synthesis

Researchers propose warping image tokens instead of pixels for multi-view reasoning in MLLMs. The zero-shot method is robust to depth noise and outperforms established baselines.

x.com/Apr 6, 2026/3 min read

3d-vision, multimodal-ai, transformer

A developer's computer screen displays a list of 10 advanced prompting techniques for Claude, with code snippets and…

Anthropic's 'Claude Secret Codes' Revealed: 10 Advanced Prompting Techniques

A developer has compiled 10 advanced prompting techniques, dubbed 'Claude secret codes,' reportedly used by Anthropic engineers and power users. The list aims to bridge the gap between basic and expert-level AI interaction.

x.com/Apr 6, 2026/3 min read

anthropic, llms, ai skills

A large language model interface displays molecular structures and drug-protein interaction data, with researchers…

DrugPlayGround Benchmark Tests LLMs on Drug Discovery Tasks

A new framework called DrugPlayGround provides the first standardized benchmark for evaluating large language models on key drug discovery tasks, including predicting drug-protein interactions and chemical properties. This addresses a critical gap in objectively assessing LLMs' potential to accelerate pharmaceutical research.

arxiv.org/Apr 6, 2026/3 min read/Widely Reported

research, machine learning, benchmark

Bar chart comparing LLM benchmark scores, with top models reaching about 66% success rate, highlighting the expert…

XpertBench Benchmark Reveals LLM 'Expert Gap', Top Models Score ~66%

Researchers introduced XpertBench, a benchmark of 1,346 tasks curated by domain experts. Leading LLMs achieve a peak success rate of only ~66%, revealing a pronounced 'expert-gap' in complex professional reasoning.

arxiv.org/Apr 6, 2026/3 min read

large-language-models, research, benchmarks

A diagram showing an image with embedded text labels next to a graph of cost savings, illustrating the Image Prompt…

Image Prompt Packaging Cuts Multimodal Inference Costs Up to 91%

A new method called Image Prompt Packaging (IPPg) embeds structured text directly into images, reducing token-based inference costs by 35.8–91% across GPT-4.1, GPT-4o, and Claude 3.5 Sonnet. Performance outcomes are highly model-dependent, with GPT-4.1 showing simultaneous accuracy and cost gains on some tasks.

arxiv.org/Apr 6, 2026/3 min read/Multi-Source

large-language-models, api, research

A line graph showing a steep upward trend labeled 'reasoning performance' on the y-axis versus 'tokens' on the…

Scaling Law Plateau Not Universal: More Tokens Boost Reasoning AI Performance

Empirical evidence indicates the 'second scaling law'—performance gains from increased computation—does not fully plateau for many reasoning tasks. Benchmark results may be artificially limited by token budgets, not model capability.

x.com/Apr 5, 2026/3 min read

reasoning, research, scaling

ASI-Evolve: This AI Designs Better AI Than Humans Can — 105 New Architectures, Zero Human Guidance

Researchers built an AI that runs the entire research cycle on its own — reading papers, designing experiments, running them, and learning from results. It discovered 105 architectures that beat human-designed models, and invented new learning algorithms. Open-sourced.

arxiv.org/Apr 5, 2026/3 min read

Bar chart titled 'Adversarial Cost to Exploit (ACE)' comparing AI models, with Claude Haiku 4.5's bar at $10.21…

Claude Haiku 4.5 Costs $10.21 to Breach, 10x Harder Than Rivals in ACE Benchmark

Fabraix's ACE benchmark measures the dollar cost to break AI agents. Claude Haiku 4.5 required a mean adversarial cost of $10.21, about 9x the cost for the next most resistant model, GPT-5.4 Nano ($1.15).

fabraix.com/Apr 5, 2026/3 min read

anthropic, ai security, ai agents