How does ReMMD-Agent reduce cost so dramatically?

It uses a persistent memory bank to reuse evidence across atomic claims, avoiding repeated API calls for redundant evidence retrieval.

What languages does ReMMDBench cover?

Five monolingual languages plus two cross-lingual settings, though the paper does not list them explicitly in the abstract.



ReMMD-Agent Hits 41.8% Accuracy on Multilingual Misinformation, Cuts Cost 79.9%

ReMMD-Agent achieves 41.8% accuracy on multilingual misinformation detection with 79.9% cost reduction, using a persistent memory approach.

Ala SMITH & AI Research Desk · 2d ago · 3 min read · AI-Generated

Source: arxiv.org via arxiv_ai · Corroborated

What is ReMMD and how does it detect multimodal misinformation?

ReMMD-Agent achieved 41.80% accuracy and 39.12% macro-F1 on the ReMMDBench multilingual misinformation dataset using GPT-5.2, reducing verification cost by 79.9% relative to T2-Agent.

TL;DR

ReMMDBench: 500 samples, 2,756 images, 5 languages. · ReMMD-Agent achieves 41.8% accuracy with GPT-5.2. · Cost reduction of 79.9% vs T2-Agent.

ReMMD-Agent scored 41.80% accuracy on the new ReMMDBench, a 500-sample multilingual misinformation benchmark with 2,756 images across five languages. The agentic system, using GPT-5.2, cut verification cost by 79.9% relative to the prior T2-Agent baseline.

Key facts

ReMMDBench: 500 samples, 2,756 images, 5 languages.
ReMMD-Agent accuracy: 41.80% using GPT-5.2.
Cost reduction: 79.9% vs T2-Agent.
Five-way veracity labels including 'manipulated'.
Eight distortion labels for fine-grained analysis.

Most multimodal misinformation benchmarks test single-image, short-caption, binary-label scenarios that don't reflect how misinformation actually spreads online. According to the ReMMD paper, viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text–image framing errors. Existing methods remain poorly matched to this setting.

ReMMD addresses this gap with two components. ReMMDBench includes 500 real-world samples, 2,756 images, five monolingual languages (with two cross-lingual settings), three text-length tiers, multi-image posts, five-way veracity labels, eight distortion labels, evidence provenance, and rationales. The benchmark is designed to be refreshed yearly to reduce contamination.
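The benchmark fields listed above can be pictured as one record per post. The following is a minimal sketch, not the dataset's actual schema: every field name, the length-tier names, and the `validate` helper are assumptions made for illustration. Only the five veracity labels are taken from this article. With 2,756 images over 500 samples, a record would hold about 5.5 images on average.

```python
from dataclasses import dataclass, field
from enum import Enum


class Veracity(Enum):
    # Five-way veracity labels as listed in this article.
    TRUE = "true"
    FALSE = "false"
    MISLEADING = "misleading"
    UNVERIFIABLE = "unverifiable"
    MANIPULATED = "manipulated"


@dataclass
class BenchSample:
    # All field names are hypothetical; the released schema may differ.
    sample_id: str
    language: str            # language code; the five languages are not named here
    cross_lingual: bool      # True for the cross-lingual settings
    length_tier: str         # one of three tiers; the tier names are assumed
    text: str
    image_paths: list[str]   # multi-image posts carry several entries
    veracity: Veracity
    distortions: list[str] = field(default_factory=list)  # subset of the eight labels
    evidence: list[dict] = field(default_factory=list)    # provenance items
    rationale: str = ""


def validate(sample: BenchSample) -> list[str]:
    """Return human-readable problems; an empty list means the record looks sane."""
    problems = []
    if not sample.text.strip():
        problems.append("empty text")
    if not sample.image_paths:
        problems.append("no images")
    if sample.length_tier not in {"short", "medium", "long"}:
        problems.append(f"unknown length tier: {sample.length_tier}")
    if sample.veracity is not Veracity.TRUE and not sample.rationale:
        problems.append("non-true label without rationale")
    return problems
```

Keeping veracity as a closed enum while leaving distortions as free strings reflects what this article states: the five veracity labels are named, the eight distortion labels are only partly exemplified.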

ReMMD-Agent is a persistent-memory verifier that decomposes posts into atomic claims, builds a reusable evidence set, and predicts structured L1/L2/L3 outputs. Compared with proprietary systems, open LVLMs, MMD-Agent, and T2-Agent, ReMMD-Agent obtains the best five-way veracity performance, with 41.80% accuracy and 39.12% macro-F1 using GPT-5.2, while reducing cost by 17.5% relative to MMD-Agent and 79.9% relative to T2-Agent.

41.8% accuracy sounds low, but uniform random guessing over five labels would score only 20%, and the benchmark's five-way veracity labels (true, false, misleading, unverifiable, manipulated) and cross-lingual complexity make it significantly harder than binary classification tasks. Prior benchmarks like Fakeddit or Twitter-2016 are considerably easier by comparison. The cost reduction is the real signal: agentic verification under realistic evidence search has been prohibitively expensive, and ReMMD-Agent's persistent memory approach directly attacks that bottleneck.
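To make the evidence-reuse idea concrete, here is a minimal sketch of a memory that survives across posts, so a claim seen before does not trigger a second retrieval. This is not the paper's implementation: the `EvidenceMemory` class, the exact-match normalization, and the `fake_search` stand-in are all assumptions. A real system would more plausibly match claims semantically.

```python
from typing import Callable


class EvidenceMemory:
    """Persistent store mapping a normalized claim to its retrieved evidence."""

    def __init__(self, retrieve: Callable[[str], list[str]]):
        self._retrieve = retrieve  # the expensive call (search API, tool call, ...)
        self._store: dict[str, list[str]] = {}
        self.api_calls = 0         # counts only real retrievals, not cache hits

    @staticmethod
    def _key(claim: str) -> str:
        # Naive normalization (an assumption): lowercase and collapse whitespace.
        return " ".join(claim.lower().split())

    def lookup(self, claim: str) -> list[str]:
        key = self._key(claim)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self._retrieve(claim)
        return self._store[key]


def verify_post(claims: list[str], memory: EvidenceMemory) -> dict[str, list[str]]:
    """Gather evidence for each atomic claim, reusing stored results when possible."""
    return {claim: memory.lookup(claim) for claim in claims}


def fake_search(claim: str) -> list[str]:
    # Hypothetical stand-in for a paid retrieval API.
    return [f"evidence for: {claim}"]


memory = EvidenceMemory(fake_search)
post_a = ["The photo shows a flood in 2024", "The mayor resigned"]
post_b = ["the photo shows a flood in 2024", "The bridge collapsed"]
verify_post(post_a, memory)
verify_post(post_b, memory)
print(memory.api_calls)  # 3 retrievals for 4 claims
```

The saving grows with how often claims repeat across posts, which is why the memory has to outlive a single verification run.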

ReMMD-Agent uses GPT-5.2 (developed by OpenAI) as its backbone. The project is open-source, hosted on GitHub, and builds on prior agentic verification frameworks like MMD-Agent and T2-Agent. The paper includes ablation studies on memory persistence, evidence reuse, and structured output formats.

Limitations and Open Questions

The benchmark's 500 samples, while carefully curated, may not capture the full distribution of multilingual misinformation. The authors note that yearly refresh is planned to mitigate contamination. The cost comparison assumes fixed API pricing — enterprise deployments with volume discounts may see different savings. The system's performance on low-resource languages beyond the five tested remains unknown.
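The pricing caveat can be checked with a short calculation. The sketch below assumes both reported reductions are measured on the same cost metric, which this article does not confirm, and it uses a made-up list price of $1.00 per sample for T2-Agent. It shows that a discount applied equally to both systems leaves the percentage reduction unchanged and shrinks only the absolute saving.

```python
def remaining_fraction(pct: float) -> float:
    """Fraction of a baseline's cost that is left after a percentage reduction."""
    return 1.0 - pct / 100.0


def reduction_pct(new_cost: float, baseline_cost: float) -> float:
    """Percentage saved when moving from baseline_cost to new_cost."""
    return 100.0 * (1.0 - new_cost / baseline_cost)


# Reductions reported for ReMMD-Agent.
remmd_over_t2 = remaining_fraction(79.9)   # about 0.201 of T2-Agent's cost
remmd_over_mmd = remaining_fraction(17.5)  # 0.825 of MMD-Agent's cost

# Implied MMD-Agent cost relative to T2-Agent, assuming one shared cost metric.
mmd_over_t2 = remmd_over_t2 / remmd_over_mmd
print(f"MMD-Agent / T2-Agent cost ratio: {mmd_over_t2:.3f}")

# Hypothetical list price per sample for T2-Agent, in dollars (not from the paper).
T2_LIST_PRICE = 1.00

for discount in (0.0, 0.3):  # no discount, then 30% off for both systems
    t2_cost = T2_LIST_PRICE * (1.0 - discount)
    remmd_cost = t2_cost * remmd_over_t2
    print(
        f"discount={discount:.0%} "
        f"relative reduction={reduction_pct(remmd_cost, t2_cost):.1f}% "
        f"absolute saving=${t2_cost - remmd_cost:.4f}"
    )
```

Under these assumptions the relative figure would move only if the systems were priced differently from each other, for example if they relied on different models or retrieval services.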

What to watch

Watch for the yearly refresh of ReMMDBench in mid-2027, which will reveal whether contamination affects the 41.8% baseline. Also track adoption of persistent-memory verification in other agentic benchmarks like OSWorld or WebArena.

Figure 3: ReMMD-Agent verifies a multimodal post by first decomposing text and images into atomic claims, observations,

Source: arxiv.org


AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

The ReMMD paper addresses a genuine gap: most misinformation benchmarks are too simple. The 41.8% accuracy on five-way classification with cross-lingual complexity is a solid baseline, not a failure. The cost reduction is the more important contribution: agentic verification has been stuck in a cost-vs-coverage tradeoff, and ReMMD-Agent's persistent memory bank is a pragmatic solution.

Compared to prior work like MMD-Agent (which also uses agentic decomposition but without persistent memory), ReMMD-Agent's 17.5% cost reduction is modest, but the 79.9% reduction vs T2-Agent is striking. T2-Agent likely uses a more expensive retrieval strategy, possibly involving multiple API calls per claim without caching. The paper doesn't disclose the exact cost per sample, which would be useful for reproducibility.

The benchmark design is thoughtful: yearly refresh to prevent contamination, multi-image posts, cross-lingual settings. But 500 samples is small for a benchmark that aims to be "real-world." The authors should release a larger v2. The eight distortion labels (e.g., out-of-context, manipulated, AI-generated) offer a granularity that allows for more nuanced evaluation than binary true/false.

One contrarian note: the paper claims "real-world" but the samples are constructed from real misinformation topics. This is a reasonable compromise for controlled evaluation, but it means the benchmark may not capture the full adversarial distribution of in-the-wild misinformation. The yearly refresh helps, but the initial dataset's construction methodology needs scrutiny.
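To put the accuracy and macro-F1 figures in context, the sketch below computes both metrics on a toy five-way label set. The ten gold labels and the always-"true" predictor are invented for illustration and are not benchmark data. The point is that macro-F1 averages per-class F1 with equal weight, so a system that ignores minority classes scores far lower on it than on accuracy.

```python
LABELS = ["true", "false", "misleading", "unverifiable", "manipulated"]


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Share of samples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str], labels: list[str] = LABELS) -> float:
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data: six "true" posts and one post for each of the other four labels.
gold = ["true"] * 6 + ["false", "misleading", "unverifiable", "manipulated"]
pred = ["true"] * 10  # a degenerate system that always answers "true"

print(f"uniform-guess accuracy: {1 / len(LABELS):.2f}")
print(f"accuracy: {accuracy(gold, pred):.2f}")
print(f"macro-F1: {macro_f1(gold, pred):.2f}")
```

On this toy set the degenerate predictor reaches 0.60 accuracy but only 0.15 macro-F1, which is why a macro-F1 close to the accuracy, as reported for ReMMD-Agent, suggests the performance is not concentrated in a single class.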





