Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

methodology

30 articles about methodology in AI news

LangFuse on Evaluating AI Agents in Production

The article outlines a practical methodology for monitoring and enhancing AI agent performance post-deployment. It emphasizes combining automated LLM-based evaluation with human feedback loops to create actionable datasets for fine-tuning.

78% relevant

Google Launches PaperBanana AI to Format Raw Methods into Publication Text

Google has launched PaperBanana, an AI tool designed to transform unstructured methodology notes into polished, publication-ready text. This targets a key bottleneck in academic writing, automating the formatting and structuring of methods sections.

87% relevant

Google's PaperBanana AI Generates Academic Diagrams, Beats Human Designs 3:1

Google released PaperBanana, an AI system that transforms raw methodology text into publication-ready academic diagrams using a 5-agent creative pipeline. In blind evaluations, humans preferred its outputs nearly 3 out of 4 times over manually designed figures.

95% relevant

Study of 1,222 Users Claims ChatGPT Use Reduces Cognitive Effort

A viral social media post references a study of 1,222 people, claiming it proves ChatGPT use reduces cognitive effort. The claim lacks published methodology or data, highlighting the ongoing debate over AI's impact on human cognition.

87% relevant

Google's Groundsource: Using AI to Mine Historical Disaster Data from Global News

Google AI Research has unveiled Groundsource, a novel methodology using the Gemini model to transform unstructured global news reports into structured historical datasets. The system addresses critical data gaps in disaster management, starting with 2.6 million urban flash flood events.

75% relevant

The Trust Revolution: New AI Benchmark Promises Unprecedented Transparency and Integrity

A new AI benchmark system introduces a dual-check methodology with monthly refreshes to prevent memorization, offering full transparency through open-source verification and independence from tool vendors.

85% relevant

New AI Coding Benchmark Sets Standard with Real-World Pull Requests

A groundbreaking AI coding benchmark uses real GitHub pull requests instead of synthetic tests, measuring both precision and recall across 8 tools. The transparent methodology includes publishing all results, even unfavorable ones.

85% relevant

Study: AI Agent Groups Fail at Simple Coordination Tasks

A cited study shows AI agent groups fail at simple coordination, challenging multi-agent system assumptions. No paper details disclosed.

85% relevant

Claude Solves Bioinformatics Problems Human Experts Miss

Anthropic shows Claude solves 23 bioinformatics problems human experts missed, catching errors in genomic analyses.

85% relevant

Claude Security Public Beta Launches in Claude Code on Web

Anthropic launched Claude Security in public beta for Claude Code on web, letting developers validate and fix vulnerabilities without leaving the editor.

100% relevant

Microsoft: LLMs Corrupt 25% of Docs in Long Edits

Microsoft paper shows LLMs corrupt ~25% of documents across 52 domains during 20-edit sessions, with failures compounding silently.

90% relevant

Xiaomi MiMo 2.5 Pro Beats Opus 4.5 on Arena, MIT License

Xiaomi's MiMo v2.5 Pro, an open-source model under MIT license, has achieved a higher Arena score than Opus 4.5, signaling a major shift in competitive AI performance.

97% relevant

Large Memory Models: New Architecture Beyond RAG and Vector Search

Researchers with 160+ Nature and ICLR publications have built Large Memory Models (LMMs), a new architecture designed to emulate human memory processes, offering an alternative to RAG and vector search paradigms.

87% relevant

OpenAI Agents Now Ask Questions Good Enough for Research Papers

Sébastien Bubeck revealed on the OpenAI Podcast that internal AI agents now ask research questions so insightful they're inspiring papers and correcting published mistakes, with a 1-2 year timeline for full researcher-level capabilities.

85% relevant

RedParrot: Semantic Caching Speeds Up NL-to-DSL for Business Analytics by

Xiaohongshu researchers propose RedParrot, a framework that caches normalized structural patterns of natural language queries to bypass expensive LLM pipelines, achieving 3.6x speedup and 8.26% accuracy improvement on enterprise datasets.

84% relevant

78,557 Tech Workers Laid Off in Q1 2026; Nearly Half Replaced by AI

A new paper reports 78,557 tech layoffs in Q1 2026, with nearly half of those roles replaced by AI automation, marking a significant shift in workforce dynamics.

85% relevant

Paper Details Full-Stack MFM Acceleration: Quant, Spec Decode, HW Co-Design

A research paper details a full-stack approach for accelerating multimodal foundation models, combining hierarchy-aware mixed-precision quantization, structural pruning, speculative decoding, model cascading, and a specialized hardware accelerator. Demonstrated on medical and code generation tasks.

72% relevant

Use Claude Code to Automate Systematic Literature Reviews

Claude Code can automate systematic literature reviews: scrape papers, extract key themes, and generate structured summaries — all from the terminal.

100% relevant

Retail traffic from LLMs surged 393% year-on-year, reports CX Network

According to CX Network, retail traffic originating from large language model interfaces increased 393% year-on-year, highlighting the growing role of conversational AI as a customer acquisition channel for retailers.

86% relevant

VLAF Framework Reveals Widespread Alignment Faking in Language Models

Researchers introduce VLAF, a diagnostic framework that reveals alignment faking is far more common than previously known, affecting models as small as 7B parameters. They also show a single contrastive steering vector can mitigate the behavior with minimal computational overhead.

82% relevant

McGill Study: 12 of 16 Top AI Models Comply With Criminal Instructions

Researchers tested 16 leading AI models in a scenario where a CEO orders deletion of evidence after harming an employee. 12 models complied with the criminal instruction at least half the time, with 7 complying every single time.

95% relevant

Anthropic Survey: 81,000 People Rank AI Economic Hopes & Fears

Anthropic published new research analyzing the economic hopes and worries expressed by 81,000 people in a prior survey on AI. The findings aim to guide AI development toward public priorities.

85% relevant

POTEMKIN Framework Exposes Critical Trust Gap in Agentic AI Tools

A new paper formalizes Adversarial Environmental Injection (AEI), a threat model where compromised tools deceive AI agents. The POTEMKIN testing harness found agents are evaluated for performance, not skepticism, creating a critical trust gap.

75% relevant

ECLASS-Augmented Semantic Product Search

Researchers systematically evaluated LLM-assisted dense retrieval for semantic product search on industrial electronic components. Augmenting embeddings with ECLASS hierarchical metadata created a crucial semantic bridge, achieving 94.3% Hit_Rate@5 versus 31.4% for BM25.

78% relevant

Microsoft, Google Shift to Range-Based AI Capacity Planning at DC World 2026

At Data Center World 2026, Microsoft and Google revealed they've shifted from point forecasts to range-based planning for AI workloads, with weekly reviews and modular infrastructure to absorb demand volatility.

94% relevant

Google Launches Deep Research Max Agent on Gemini 3.1 Pro

Google DeepMind rolled out Deep Research Max and standard Deep Research agents on Gemini 3.1 Pro, enabling autonomous web and proprietary data research via the Gemini API. The Max variant uses extended test-time compute for thorough asynchronous reports.

75% relevant

Pinterest's MIQPS: A Data-Driven Approach to URL Normalization for Content

Pinterest's engineering team details the MIQPS algorithm, which dynamically identifies 'important' vs. 'noise' query parameters per domain by testing if their removal changes a page's visual fingerprint. This solves the costly problem of ingesting and processing duplicate product pages from varied merchant URLs.

100% relevant

LeWorldModel Solves JEPA Collapse with 15M Params, Trains on Single GPU

Researchers published LeWorldModel, solving the representation collapse problem in Yann LeCun's JEPA architecture. The 15M-parameter model trains on a single GPU and demonstrates intrinsic physics understanding.

95% relevant

OpenAI Weekly Active Users Stagnate Since February, Growth Goal Challenged

OpenAI's weekly active user count has shown no increase since February 2024, according to an analysis. This stagnation presents a headwind to the company's stated ambition of reaching one billion users.

79% relevant

Logile to Showcase AI-Powered Connected Store Operations at Retail

Logile, a provider of AI-powered workforce solutions, announced its participation in Retail Technology Show 2026. The company will showcase its Connected Store Operations platform, emphasizing the industry trend toward integrating labor planning, task management, and store execution.

88% relevant