March 2026 — A new paper on arXiv introduces DISCO-TAB (DIScriminator-guided COntrol for TABular synthesis), a hierarchical reinforcement learning (RL) framework designed to solve a critical bottleneck in medical AI: generating high-fidelity, privacy-preserving synthetic Electronic Health Records (EHRs). The method orchestrates a fine-tuned large language model (LLM) with a multi-objective discriminator, evaluating synthetic data at four distinct granularities. Benchmarks on datasets like Heart Failure and Parkinson's show the framework delivers up to a 38.2% improvement in downstream clinical classifier utility compared to Generative Adversarial Network (GAN) and diffusion model baselines, while maintaining exceptional statistical fidelity with a Jensen-Shannon Divergence (JSD) score below 0.01.
The Core Problem: Statistically Plausible but Clinically Invalid Data
Generative models, including LLMs, have been applied to synthetic tabular data generation to circumvent privacy restrictions and data scarcity in healthcare. However, they often fail to capture the complex, non-linear dependencies and severe class imbalances inherent in real EHRs. This results in records that look statistically reasonable on aggregate metrics but contain clinically impossible combinations (e.g., a newborn with a diagnosis of coronary artery disease). Prior methods typically provide a single, scalar reward signal during training, which is insufficient to enforce the intricate medical logic required for valid synthetic patients.
What the Researchers Built: A Four-Layer Feedback System
DISCO-TAB addresses this by replacing scalar feedback with a hierarchical discriminator system that provides RL-guided rewards at four levels:
- Token-level: Ensures basic syntactic and semantic validity of individual data points.
- Sentence-level: Evaluates the coherence of generated feature-value pairs within a record.
- Feature-level: Assesses the plausibility of relationships between specific clinical variables (e.g., blood pressure and heart rate).
- Row-level: Judges the overall validity and realism of a complete synthetic patient record.
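The four signals above can be folded into one shaped reward. The dataclass, weights, and combination rule below are illustrative assumptions; the article does not give the paper's exact formulation:

```python
from dataclasses import dataclass

@dataclass
class HierarchicalReward:
    token: float     # validity of individual data points
    sentence: float  # coherence of feature-value pairs in a record
    feature: float   # plausibility of inter-variable relationships
    row: float       # realism of the complete patient record

def combine(r: HierarchicalReward,
            weights=(0.1, 0.2, 0.3, 0.4)) -> float:
    # Weighted sum over the four granularities; the weights here are
    # placeholders, not values from the paper.
    parts = (r.token, r.sentence, r.feature, r.row)
    return sum(w * p for w, p in zip(weights, parts))
```

A record that scores well at the row level but fails token-level checks still receives a partial reward, which is exactly the gradation a single scalar discriminator cannot express.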

This multi-granular feedback is used to train a policy LLM that has been initially fine-tuned on the target tabular dataset. The framework integrates two key technical innovations:
- Automated Constraint Discovery: The system autonomously identifies latent medical constraints (e.g., "age < 18 excludes diagnosis of chronic obstructive pulmonary disease") from the training data.
- Inverse-Frequency Reward Shaping: It dynamically adjusts rewards to prevent minority-class collapse, a common failure mode where rare but critical conditions (e.g., a specific cancer subtype) are underrepresented or absent in the synthetic data.
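Inverse-frequency shaping fits in a few lines. The normalization used here (mean weight of 1) is an assumption, since the article does not give the paper's exact formula:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    # Weight each class by the inverse of its empirical frequency,
    # normalized so the mean weight is 1. One common formulation;
    # the paper's exact shaping rule is not specified in the article.
    counts = Counter(labels)
    n = len(labels)
    raw = {c: n / k for c, k in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

# A rare class receives a proportionally larger reward multiplier:
weights = inverse_frequency_weights(["common"] * 90 + ["rare"] * 10)
```

With a 90/10 split, the rare class's reward is multiplied by 1.8 and the common class's by 0.2, counteracting the generator's incentive to emit only majority-class records.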
Key Results: State-of-the-Art Performance on Medical Benchmarks
The team validated DISCO-TAB on several high-dimensional, small-sample medical datasets, including Heart Failure and Parkinson's disease records. Performance was measured across three axes: utility (how well the synthetic data trains a downstream classifier), statistical fidelity (how closely the synthetic data distribution matches the real data), and privacy (resistance to membership inference attacks).
| Metric | DISCO-TAB | GAN baseline | Diffusion baseline |
|---|---|---|---|
| Downstream Classifier F1-Score | 0.892 | 0.645 | 0.701 |
| Improvement in Utility | +38.2% | Baseline | +8.7% |
| Statistical Fidelity (JSD ↓) | < 0.01 | ~0.05 | ~0.03 |
| Privacy (MIA Resistance) | Robust | Moderate | Moderate |

Table: DISCO-TAB outperforms established baselines on key metrics for synthetic clinical data. JSD = Jensen-Shannon Divergence (lower is better). MIA = Membership Inference Attack.
The 38.2% utility gain is the standout result. Relative to the GAN baseline, a machine learning model trained on DISCO-TAB's synthetic data comes far closer to the performance of one trained on real, sensitive patient data for tasks like predicting disease progression or treatment outcome. The near-zero JSD score indicates that the synthetic data's joint probability distribution is virtually indistinguishable from the real data's.
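The JSD fidelity metric can be reproduced per feature by histogramming real and synthetic values over a shared set of bins and comparing the resulting distributions; a minimal base-2 implementation:

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions,
    using base-2 logs so the result lies in [0, 1]."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions score 0 and disjoint ones score 1, so the reported < 0.01 means the synthetic histograms are almost exact matches.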
How It Works: RL Fine-Tuning an LLM for Tabular Generation
The process begins by serializing tabular EHR data into a textual sequence suitable for an LLM (e.g., "Age: 45, Sex: M, Diagnosis: HFrEF, BP: 120/80..."). A base LLM (architecture not specified in the preprint) is fine-tuned on this serialized data.
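Serialization is the simplest piece to make concrete. A minimal sketch matching the article's example format (field names and ordering are from the illustrative example, not a specification):

```python
def serialize_record(record: dict) -> str:
    # Turn one tabular EHR row into a "key: value" text sequence an
    # LLM can be fine-tuned on. Dicts preserve insertion order in
    # Python 3.7+, so column order is stable.
    return ", ".join(f"{k}: {v}" for k, v in record.items())

text = serialize_record(
    {"Age": 45, "Sex": "M", "Diagnosis": "HFrEF", "BP": "120/80"})
# → "Age: 45, Sex: M, Diagnosis: HFrEF, BP: 120/80"
```

Generation then runs in reverse: the LLM emits such a string, which is parsed back into a tabular row before evaluation.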
The core innovation is the training loop. The fine-tuned LLM acts as a policy network in an RL setup. It generates a synthetic record, which is then evaluated by the hierarchical discriminator. The discriminator produces four separate reward signals corresponding to its four evaluation layers. These rewards are combined, shaped by the inverse-frequency weighting, and used to compute a policy gradient update for the LLM.
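Since the preprint's exact RL algorithm is not specified here, the simplest update consistent with this description is a REINFORCE-style loss, in which the record's combined, shaped reward scales the sequence log-likelihood:

```python
def policy_gradient_loss(token_log_probs, shaped_reward, baseline=0.0):
    # Schematic REINFORCE objective for one generated record: the
    # summed token log-likelihood scaled by the baseline-subtracted
    # shaped reward. Minimizing this loss pushes the policy toward
    # higher likelihood for high-reward records.
    advantage = shaped_reward - baseline
    return -advantage * sum(token_log_probs)

# A record with a high shaped reward exerts a stronger pull on the
# policy than a low-reward one:
loss_hi = policy_gradient_loss([-0.1, -0.3, -0.2], shaped_reward=0.9)
loss_lo = policy_gradient_loss([-0.1, -0.3, -0.2], shaped_reward=0.1)
```

In practice an RLHF-style method such as PPO would likely replace this vanilla form, but the reward plumbing (four signals in, one advantage out) is the same.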
The Automated Constraint Discovery module likely uses rule-mining or causal discovery techniques on the training data to identify hard dependencies. These constraints are then integrated into the reward function, penalizing the LLM for generating records that violate them. Over many iterations, the LLM learns to generate data that is not only statistically accurate but also clinically coherent.
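A naive stand-in for that module checks whether a candidate exclusion rule ever fails in the training data; the paper's actual mining procedure is presumably more sophisticated:

```python
def rule_holds(records, antecedent, excluded):
    # Check a candidate constraint "antecedent excludes excluded"
    # against every training record; a single co-occurrence falsifies
    # the rule. Toy stand-in for Automated Constraint Discovery.
    return not any(antecedent(r) and excluded(r) for r in records)

records = [
    {"age": 8,  "diagnosis": "asthma"},
    {"age": 67, "diagnosis": "COPD"},
]
# Candidate constraint from the article: age < 18 excludes COPD.
ok = rule_holds(records,
                lambda r: r["age"] < 18,
                lambda r: r["diagnosis"] == "COPD")
```

Rules that survive such checks (ideally with support thresholds to avoid spurious ones) become penalty terms in the reward function.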
Why It Matters: A New Standard for Trustworthy Synthetic Data
For AI practitioners in healthcare and other sensitive domains, DISCO-TAB represents a significant methodological advance. The hierarchical feedback mechanism provides a more targeted and effective training signal for generative models than prior single-score approaches. The demonstrated resistance to membership inference attacks is critical for real-world deployment, where synthetic data must not leak information about individuals in the original training set.
The work directly tackles the utility-privacy trade-off that has plagued synthetic data generation. Previous methods often sacrificed one for the other: perfectly private data might be useless for training, and highly useful data might risk privacy. DISCO-TAB's results suggest it can push the Pareto frontier, offering high utility without compromising privacy or statistical fidelity.
gentic.news Analysis
This paper arrives amid a significant weekly trend of LLM-related research on arXiv, with LLMs appearing in 16 articles this week alone. It represents a sophisticated fusion of two dominant technical threads we track: the application of large language models to structured data problems and the use of reinforcement learning for fine-grained control. The hierarchical reward structure is reminiscent of techniques used in training AI agents for complex tasks, now applied to the "agent" of a data-generating LLM.
The focus on small-sample, high-dimensional data is particularly relevant for the medical field, where collecting large datasets is often ethically and practically impossible. This work complements other recent arXiv studies we've covered that challenge assumptions in AI evaluation, such as the vulnerability of RAG systems to gaming. DISCO-TAB's rigorous benchmarking against utility, fidelity, and privacy sets a high standard for future work in synthetic data generation.
Notably, the paper's approach of using an LLM as a foundational generator aligns with the broader industry shift towards using foundation models for diverse downstream tasks. However, it adds a crucial layer of discriminator-guided refinement, moving beyond simple fine-tuning. This hybrid approach—leveraging the generative capacity of LLMs with the precise, objective-driven optimization of RL—is likely to be replicated in other domains where data generation must adhere to strict logical or regulatory constraints, such as finance or legal document synthesis.
Frequently Asked Questions
What is DISCO-TAB used for?
DISCO-TAB is a framework for generating synthetic tabular data, specifically designed for sensitive domains like healthcare. It creates artificial Electronic Health Records (EHRs) that mimic the statistical patterns and clinical logic of real patient data but contain no real patient information, enabling research and model development without privacy violations.
How does DISCO-TAB improve upon previous synthetic data methods?
Previous methods like GANs or diffusion models often provide only a single, overall score for generated data. DISCO-TAB introduces a four-level hierarchical discriminator (token, sentence, feature, row) that gives much more detailed feedback during training. It also automatically discovers medical constraints from the data and uses inverse-frequency rewards to prevent under-representation of rare conditions, leading to more clinically valid and useful synthetic datasets.
Is synthetic data from DISCO-TAB completely private?
The paper demonstrates that DISCO-TAB synthetic data shows "robust resistance to membership inference attacks," a common method for detecting if a specific individual's data was in the training set. While no synthetic data method can guarantee absolute privacy, this strong resistance makes it suitable for many real-world applications where data protection is paramount.
What kind of performance gain does DISCO-TAB offer?
In downstream tasks—like training a classifier to predict a medical outcome—models trained on DISCO-TAB synthetic data achieved performance up to 38.2% better (in F1-score) than models trained on data from previous state-of-the-art methods like GANs. The synthetic data also achieved near-perfect statistical fidelity, with a Jensen-Shannon Divergence score of less than 0.01 compared to the real data.