gentic.news — AI News Intelligence Platform
Lung-R1-14B Tops EMR Diagnosis with Knowledge Graph-Guided RL
AI Research · Score: 82

Lung-R1-14B scored 4.3583 on EMR diagnosis, the top result in a 20-system evaluation, using a 59K-node knowledge graph and KG-guided reinforcement learning.

9h ago · 3 min read · 20 views · AI-Generated
Source: arxiv.org via arxiv_ai · Multi-Source
How does Lung-R1 use a knowledge graph to improve pulmonary diagnostic reasoning?

Lung-R1-14B achieved a state-of-the-art EMR Diagnosis score of 4.3583, outperforming the strongest non-Lung-R1 baseline by 0.1476 points. It uses a 59,038-node knowledge graph, LungKG, to constrain its reasoning chains and to guide reinforcement learning.

TL;DR

LungKG: 59K nodes, 164K edges for pulmonary diagnosis. · Lung-R1-14B scores 4.3583 on EMR diagnosis task. · KG-constrained RL beats strongest baseline by 0.1476 points.

A new 14B-parameter LLM, Lung-R1-14B, scored 4.3583 on an EMR diagnosis benchmark, the best result in a 20-system evaluation. The model, described in a June 2026 arXiv paper, uses a 59,038-node knowledge graph called LungKG to constrain its reasoning chains.

Key facts

  • LungKG: 59,038 nodes, 164,308 edges.
  • 15 entity types, 112 relation types.
  • Lung-R1-14B EMR Diagnosis score: 4.3583.
  • Beats strongest baseline by 0.1476 points.
  • Evaluated across 20 systems on 3 tasks.

Pulmonary diagnosis remains a hard problem for LLMs because it requires integrating heterogeneous evidence from electronic medical records (EMRs), not just recalling textbook knowledge. The authors of Lung-R1: A Knowledge Graph-Guided LLM for Pulmonary Diagnostic Reasoning formalize this as the "Pulmonary Knowledge-to-Diagnosis Gap."

To bridge it, they built LungKG, the first structured pulmonary knowledge graph for diagnostic knowledge organization. LungKG contains 59,038 nodes and 164,308 edges across 15 entity types and 112 relation types, serving as both a reusable resource and the foundation for model adaptation.
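
To make the node, edge, entity-type, and relation-type counts concrete, here is a minimal sketch of a typed knowledge graph in Python. It is not LungKG's actual schema, which the article does not describe: the class names, the entity types ("Disease", "Symptom", "Examination"), and the relation names are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    entity_type: str  # hypothetical label, e.g. "Disease" or "Symptom"


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: set = field(default_factory=set)    # (head, relation, tail) triples
    out_edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, entity_type):
        self.nodes[node_id] = Node(node_id, entity_type)

    def add_edge(self, head, relation, tail):
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        if (head, relation, tail) not in self.edges:
            self.edges.add((head, relation, tail))
            self.out_edges[head].append((relation, tail))

    def stats(self):
        # The same four counts the article reports for LungKG.
        return {
            "nodes": len(self.nodes),
            "edges": len(self.edges),
            "entity_types": len({n.entity_type for n in self.nodes.values()}),
            "relation_types": len({r for _, r, _ in self.edges}),
        }


# Toy data, invented for illustration only.
kg = KnowledgeGraph()
kg.add_node("pneumonia", "Disease")
kg.add_node("fever", "Symptom")
kg.add_node("chest_xray", "Examination")
kg.add_edge("pneumonia", "has_symptom", "fever")
kg.add_edge("pneumonia", "diagnosed_by", "chest_xray")
print(kg.stats())
```

On this toy graph, `stats()` reports 3 nodes, 2 edges, 3 entity types, and 2 relation types. LungKG's reported figures (59,038 nodes, 164,308 edges, 15 entity types, 112 relation types) are the same kind of counts at a much larger scale.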

How Lung-R1 works

The training pipeline has three stages: KG-constrained reasoning-chain construction, supervised fine-tuning (SFT), and KG-guided reinforcement learning (RL). The RL stage rewards reasoning paths that stay within the graph's relational structure, penalizing jumps that lack edge support.
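
The article does not give the reward formula, so the following is only a sketch of what "rewarding paths that stay within the graph" could look like. It assumes a reasoning chain can be reduced to an ordered list of entity ids; the function names, the `step_bonus` and `jump_penalty` parameters, and the toy edges are hypothetical and not taken from the paper.

```python
def edge_supported(kg_edges, head, tail):
    """True if any relation links head -> tail in the graph."""
    return any(h == head and t == tail for h, _, t in kg_edges)


def path_reward(kg_edges, chain, step_bonus=1.0, jump_penalty=1.0):
    """Score a reasoning chain given as an ordered list of entity ids.

    Each consecutive pair earns step_bonus if the graph has an edge for it
    and loses jump_penalty otherwise; the total is averaged over the steps.
    """
    if len(chain) < 2:
        return 0.0
    total = 0.0
    for head, tail in zip(chain, chain[1:]):
        if edge_supported(kg_edges, head, tail):
            total += step_bonus
        else:
            total -= jump_penalty
    return total / (len(chain) - 1)


# Toy edges, invented for illustration only.
toy_edges = {
    ("cough", "symptom_of", "pneumonia"),
    ("pneumonia", "diagnosed_by", "chest_xray"),
}

grounded = path_reward(toy_edges, ["cough", "pneumonia", "chest_xray"])
unsupported = path_reward(toy_edges, ["cough", "chest_xray"])
mixed = path_reward(toy_edges, ["cough", "pneumonia", "lung_cancer"])
print(grounded, unsupported, mixed)
```

Here the fully edge-supported chain scores 1.0, the chain that jumps straight from symptom to examination scores -1.0, and the chain with one supported and one unsupported step scores 0.0. A real RL setup would likely combine a term like this with an answer-correctness reward, but the article does not say how.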

In a 20-system evaluation, Lung-R1-14B achieved state-of-the-art performance across all three tasks: Choice (multiple-choice knowledge), Pulmonary-QA (open-ended questions), and EMR Diagnosis (patient-specific record reasoning). The EMR Diagnosis score of 4.3583 surpassed the strongest non-Lung-R1 baseline by 0.1476 points. The authors did not disclose the exact baseline model, and no significance test is cited here, so the size of the comparison alone does not establish that the margin is statistically significant.
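
The reported score and margin imply a baseline score, which the article does not state directly. The short calculation below derives it, along with the relative gain. The variable names are illustrative, and the only inputs are the two figures quoted above.

```python
lung_r1_score = 4.3583  # reported EMR Diagnosis score
margin = 0.1476         # reported lead over the strongest non-Lung-R1 baseline

# Implied baseline score, rounded to the precision of the reported figures.
baseline_score = round(lung_r1_score - margin, 4)

# Gain expressed relative to the implied baseline, in percent.
relative_gain_pct = round(100 * margin / baseline_score, 2)

print(baseline_score, relative_gain_pct)
```

This puts the strongest baseline at 4.2107 and the relative improvement at roughly 3.5 percent.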

Why the graph matters

The improvement is modest, 0.1476 points on a 5-point scale, but the approach signals a shift away from pure retrieval-augmented generation (RAG) for clinical reasoning. RAG retrieves text chunks; LungKG supplies structured relations. The graph constrains the LLM to reason over explicit disease-symptom-treatment edges rather than freeform text associations. This could reduce hallucination in high-stakes diagnostic settings, though the paper does not report hallucination rates.
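
The difference between the two retrieval styles can be shown with a toy comparison. This is a deliberately simplified sketch, not how Lung-R1 or any production RAG system works: the word-overlap ranking stands in for embedding search, and the sentences, triples, and function names are invented for illustration.

```python
def retrieve_chunks(chunks, query, k=2):
    """Toy text retrieval: rank chunks by how many query words they share."""
    q = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def retrieve_relations(triples, entity):
    """Structured retrieval: every (head, relation, tail) touching the entity."""
    return [t for t in triples if entity in (t[0], t[2])]


# Toy corpus and toy triples, invented for illustration only.
chunks = [
    "Pneumonia often presents with fever and cough.",
    "Asthma is managed with inhaled corticosteroids.",
    "Fever can accompany many infections.",
]
triples = [
    ("pneumonia", "has_symptom", "fever"),
    ("pneumonia", "treated_with", "antibiotics"),
    ("asthma", "treated_with", "inhaled_corticosteroids"),
]

text_hits = retrieve_chunks(chunks, "pneumonia fever")
graph_hits = retrieve_relations(triples, "pneumonia")
print(text_hits)
print(graph_hits)
```

The text retriever returns sentences the model must still interpret, including a loosely related one about fever in general. The graph lookup returns only typed relations for the queried entity, which is the property the article credits for tighter reasoning.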

Figure 2: Overview of the LungKG-guided Lung-R1 pipeline: (a) LungKG construction from validated pulmonary sources; (b)

What to watch

Watch for whether the authors release LungKG as a reusable resource and whether follow-up work reports hallucination rates or ablation of the KG-constrained RL stage. A clinical deployment study at a partner hospital would be the strongest signal of real-world viability.

Figure 1: EMR diagnosis performance on the EMR Diagnosis task. Lung-R1 achieves state-of-the-art performance at 7B/14B s



AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

The 0.1476-point gain over the best non-Lung-R1 baseline is measurable but not transformative, and no significance test is cited for it. The real contribution is the architecture: using a structured knowledge graph to constrain RL training, rather than relying on text retrieval. This mirrors a broader trend in clinical AI away from RAG and toward graph-grounded reasoning, seen in recent work on drug interaction prediction and genomic variant interpretation. The modest margin suggests that pure scale (14B parameters) still matters, but the graph provides a more principled inductive bias than larger models alone. The paper's failure to report hallucination rates is a notable omission, especially given that the EMR Diagnosis task is safety-critical. If LungKG is released openly, it could become a standard resource for pulmonary NLP research, similar to how UMLS serves general biomedical NLP.