datasets
30 articles about datasets in AI news
Tencent Launches 2025 Ad Algorithm Challenge with Massive All-Modality Recommendation Datasets
Tencent has launched an open competition and released two industrial-scale datasets (TencentGR-1M and TencentGR-10M) to advance generative recommender systems. This has spurred related research into debiasing techniques and novel reranking frameworks, moving the field toward more holistic, multi-modal user modeling.
QUMPHY Project's D4 Report Establishes Six Benchmark Problems and Datasets for ML on PPG Signals
A new report from the EU-funded QUMPHY project establishes six benchmark problems and associated datasets for evaluating machine and deep learning methods on photoplethysmography (PPG) signals. This standardization effort is a foundational step for quantifying uncertainty in medical AI applications.
DIET: A New Framework for Continually Distilling Streaming Datasets in Recommender Systems
Researchers propose DIET, a framework for streaming dataset distillation in recommender systems. It maintains a compact, evolving dataset (1-2% of original size) that preserves training-critical signals, reducing model iteration costs by up to 60x while maintaining performance trends.
Federated Fine-Tuning Benchmark Shows QLoRA Nears Centralized Accuracy on
Sherpa.ai's arXiv benchmark shows federated fine-tuning with QLoRA matches centralized accuracy on four healthcare and finance datasets, outperforming isolated single-institution learning under non-IID conditions.
LASAR Cuts Latent Reasoning Steps in Half for GenRec at 20x Speedup Over CoT
LASAR nearly halves latent reasoning steps and achieves 20x speedup over explicit CoT in generative recommendation, outperforming baselines on three datasets.
Simple Graph Heuristic Beats Generative Recommenders on 10 of 14 Benchmarks
A no-training graph heuristic beats generative recommenders on 10 of 14 benchmarks, exposing shortcut-solvable datasets. Relative NDCG@10 gains hit 44% on Amazon CDs.
RedParrot: Semantic Caching Speeds Up NL-to-DSL for Business Analytics by
Xiaohongshu researchers propose RedParrot, a framework that caches normalized structural patterns of natural language queries to bypass expensive LLM pipelines, achieving 3.6x speedup and 8.26% accuracy improvement on enterprise datasets.
AFMRL: Using MLLMs to Generate Attributes for Better Product Retrieval in
AFMRL uses MLLMs to generate product attributes, then uses those attributes to train better multimodal representations for e-commerce retrieval. Achieves SOTA on large-scale datasets.
LangFuse on Evaluating AI Agents in Production
The article outlines a practical methodology for monitoring and enhancing AI agent performance post-deployment. It emphasizes combining automated LLM-based evaluation with human feedback loops to create actionable datasets for fine-tuning.
Apple Releases DFNDR-12M Dataset, Claims 5x CLIP Training Efficiency
Apple has open-sourced DFNDR-12M, a multimodal dataset of 12.8 million image-text pairs with synthetic captions and pre-computed embeddings. The company claims it enables up to 5x training efficiency over standard CLIP datasets.
OpenAI Engineer Processed 210B Tokens, Sparking AI Efficiency Debate
An OpenAI engineer processed 210 billion tokens in one week, equivalent to 33 Wikipedia-sized datasets. This extreme usage spotlights a growing trend where high AI consumption by engineers leads to a 10x cost increase and a high volume of discarded code.
OpenVoice v2: Complete Voice Cloning Directory Launches on GitHub
A developer has compiled and released a comprehensive directory of open-source voice cloning tools and resources on GitHub. This centralizes access to models, datasets, and training code, lowering the barrier to entry for AI audio development.
Kyutai Labs Releases OVIE: Single-Image Novel View Synthesis Model
French AI lab Kyutai Labs released OVIE, a novel view generation model trained only on single images, bypassing the need for costly multi-view datasets. This could democratize 3D content creation from 2D photos.
HARPO: A New Agentic Framework for Conversational Recommendation Aims to
A new research paper introduces HARPO, a hierarchical agentic reasoning framework for conversational recommender systems. It reframes recommendation as a structured decision-making process, directly optimizing for interpretable quality dimensions like relevance, diversity, and predicted satisfaction. The approach shows consistent improvements on recommendation-centric metrics across three datasets.
New Research Establishes State-of-the-Art for Virtual Try-Off with
A new arXiv paper introduces a systematic framework for Virtual Try-Off (VTOFF)—reconstructing a garment's canonical form from a worn image. The Dual-UNet Diffusion model achieves state-of-the-art results on standard datasets, providing foundational insights for this emerging computer vision task.
Ensembles at Any Cost? New Research Quantifies Accuracy-Energy Trade-offs
A comprehensive study of 93 experiments across four datasets reveals the severe energy inefficiency of ensemble methods in recommender systems. While accuracy improves slightly, energy consumption and CO2 emissions can increase by orders of magnitude, forcing a critical cost-benefit analysis for production systems.
Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation
A new arXiv paper introduces SSR, a framework that builds explicit sparsity into recommendation model architectures. It addresses the inefficiency of dense models (like MLPs) when processing high-dimensional, sparse user data, showing superior performance and scalability on datasets including AliExpress.
CoDiS: A Causal Framework for Cross-Domain Sequential Recommendation
A new arXiv paper introduces CoDiS, a framework for Cross-Domain Sequential Recommendation that uses causal inference to disentangle domain-shared and domain-specific user preferences while addressing context confounding and gradient conflicts. It outperforms state-of-the-art baselines on three real-world datasets.
New arXiv Study Finds No Saturation Point for Data in Traditional Recommender Systems
A new arXiv preprint systematically tests how recommendation model performance scales with training data size. Using 10 algorithm variants across 11 large datasets, the research finds that normalized performance (NDCG@10) generally keeps improving up to 100 million interactions, with no clear saturation point for typical models.
SLSREC: A New Self-Supervised Model for Disentangling Long- and Short-Term User Interests in Recommendations
A new arXiv preprint introduces SLSREC, a self-supervised model that disentangles long-term user preferences from short-term intentions using contrastive learning and adaptive fusion. It outperforms state-of-the-art models on three benchmark datasets, addressing a core challenge in dynamic user modeling.
AgenticGEO: Self-Evolving AI Framework for Generative Search Engine Optimization Outperforms 14 Baselines
Researchers propose AgenticGEO, an AI framework that evolves content strategies to maximize inclusion in generative search engine outputs. It uses MAP-Elites and a Co-Evolving Critic to reduce costly API calls, achieving state-of-the-art performance across 3 datasets.
MIPO: A Novel Self-Improvement Method for LLMs That Enhances Personalization Without New Data
Researchers propose Mutual Information Preference Optimization (MIPO), a contrastive data augmentation technique that improves LLM personalization by 3-40% on real-user datasets without requiring additional labeled data or human supervision.
Visual Product Search Benchmark: A Rigorous Evaluation of Embedding Models for Industrial and Retail Applications
A new benchmark evaluates modern visual embedding models for exact product identification from images. It tests models on realistic industrial and retail datasets, providing crucial insights for deploying reliable visual search systems where errors are costly.
HuggingFace Launches Daily Papers SKILL.md for AI Agents to Read, Search, and Fetch Research Papers
HuggingFace released Daily Papers SKILL.md, a tool enabling AI agents to read paper content as markdown, search papers, find linked models/datasets, and fetch papers via API.
ReFORM: A New LLM Framework for Multi-Factor Recommendation from User Reviews
Researchers propose ReFORM, a novel recommendation framework that uses LLMs to generate factor-specific user and item profiles from reviews, then applies multi-factor attention to personalize suggestions. It outperforms state-of-the-art baselines on restaurant datasets, offering a more nuanced approach to personalization.
Unsloth Studio: Open-Source Web App Cuts VRAM Usage for Local LLM Training and Dataset Creation
Unsloth has launched Unsloth Studio, an open-source web application that enables users to run, train, compare, and export hundreds of LLMs locally with significantly reduced VRAM consumption. It also converts files like PDFs, CSVs, and DOCXs into training datasets.
A Counterfactual Approach for Addressing Individual User Unfairness in Collaborative Recommender Systems
New arXiv paper proposes a dual-step method to identify and mitigate individual user unfairness in collaborative filtering systems. It uses counterfactual perturbations to improve embeddings for underserved users, validated on retail datasets like Amazon Beauty.
AI Learns Like Humans: New System Trains Language Models Through Everyday Conversations
Researchers have developed a breakthrough system that enables language models to learn continuously from everyday conversations rather than static datasets. This approach mimics human learning patterns and could revolutionize how AI systems acquire and update knowledge.
Anthropic's Pricing Revolution: Million-Token Context Now Standard for Claude AI
Anthropic has eliminated the 5x surcharge for million-token contexts in Claude 3 Opus and Claude 3.5 Sonnet, making long-context AI dramatically more affordable. This pricing overhaul removes barriers for developers analyzing large documents, codebases, and datasets.
Google's Groundsource: Using AI to Mine Historical Disaster Data from Global News
Google AI Research has unveiled Groundsource, a novel methodology using the Gemini model to transform unstructured global news reports into structured historical datasets. The system addresses critical data gaps in disaster management, starting with 2.6 million urban flash flood events.