curation

30 articles about curation in AI news

New Paper Coins 'Curation Debt' — Benchmarks Measure Data Leakage, Not Capability

The paper argues that benchmarks like MMLU measure data leakage rather than capability, and proposes adversarial dynamic benchmarks.

May 16, 2026 · 85% relevant

X Launches Custom Timelines, AI-Powered Feed Curation Tool

X has launched 'Custom Timelines,' a feature that uses AI to let users create and follow personalized feeds based on curated lists of accounts, moving beyond the main algorithmic 'For You' feed.

Apr 21, 2026 · 85% relevant

MeiGen Revolutionizes AI Art Creation with Automated Prompt Curation

MeiGen, a new open-source tool, automatically scrapes and curates trending AI image prompts from social media, solving the problem of prompt discovery and organization for digital artists. The free platform aggregates weekly collections without requiring manual bookmarking or searching.

Feb 27, 2026 · 85% relevant

Pioneer Agent: A Closed-Loop System for Automating Small Language Model

Researchers present Pioneer Agent, a system that automates the adaptation of small language models to specific tasks. It handles data curation, failure diagnosis, and iterative training, showing significant performance gains in benchmarks and production-style deployments. This addresses a major engineering bottleneck for deploying efficient, specialized AI.

Apr 14, 2026 · 74% relevant

AI emerges as a strategic priority for luxury as accelerating consumer use

A Bain & Company and Comité Colbert report declares AI a strategic priority for luxury brands, driven by accelerating consumer use that challenges the industry to reinvent customer discovery and experience. This matters as luxury houses face pressure to integrate AI without diluting brand exclusivity.

Jun 30, 2026 · 82% relevant

CLI-Universe: Qwen3-32B fine-tuned on 6K trajectories beats models 10x larger on Terminal-Bench 2.0

CLI-Universe synthesizes terminal-agent tasks; Qwen3-32B fine-tuned on 6K trajectories hits 33.4% on Terminal-Bench 2.0, beating models 10x larger.

Jun 27, 2026 · 87% relevant

Cursor Trains GPT-Size Model with 10-20x Compute

Cursor trained a GPT-size model from scratch with 10-20x more compute, announced at Compile. The move shifts from fine-tuning to pretraining for code generation.

Jun 21, 2026 · 91% relevant

Pareto LoRA Boosts Image Quality 44.9% vs Vanilla LoRA on Emu2

Pareto LoRA reformulates multimodal instruction tuning as bi-objective optimization, achieving up to 44.9% image quality gains on Emu2 while maintaining text performance.

Jun 17, 2026 · 90% relevant

Estonian Institute: Claude Tops Russian Propaganda Benchmark, Mistral Trails

The Estonian Language Institute's benchmark tests 60 AI models against Russian propaganda. Claude ranks first, while Mistral trails with a 36.67% misinformation rate.

Jun 16, 2026 · 72% relevant

MA-ProofBench: GPT-5.5 Hits 16% on Math Analysis, Most Models Near 0%

MA-ProofBench, a new theorem-proving benchmark for mathematical analysis, shows GPT-5.5 achieving 16% on undergraduate problems and 5% on PhD-level problems, with most models near 0% on the harder set.

Jun 15, 2026 · 82% relevant

UniSound U2 Cuts Token Use 25%, Joins Top Chinese LLM Tier

UniSound's U2 foundation model cuts token consumption by 25% while matching top Chinese LLM performance, entering the top tier with an efficiency-first design.

Jun 9, 2026 · 71% relevant

Meesho Integrates AI-Powered Product Recommendation System

Meesho integrates an AI-powered recommendation system to personalize shopping. This matters as it shows how value e-commerce platforms adopt AI to compete with giants like Amazon and Google.

Jun 5, 2026 · 100% relevant

NanoGPT-Bench: A New Eval for Coding Agents Doing AI Research

IntologyAI released NanoGPT-Bench, an internal eval for coding agents on an AI R&D problem. No results or task specifics have been disclosed.

May 19, 2026 · 85% relevant

Hermes Agent's Three-Tier Memory Cuts Context Bloat, Keeps 2,200-Char Core

Hermes agent's three-tier memory uses two tiny markdown files (2,200 chars), SQLite FTS5 search (10ms over 10K docs), and 8 pluggable providers. The composition solves the always-on vs. deep recall trade-off.

May 14, 2026 · 91% relevant

VAB Benchmark: Top MLLMs Judge Beauty Correctly Only 26.5% of Time

Frontier MLLMs achieve only 26.5% accuracy on VAB, far below the human score of 68.9%. Fine-tuning bridges the gap.

May 14, 2026 · 60% relevant

Almanac: Open-Source Wiki Auto-Updates From Claude Code Chats

Almanac auto-generates a markdown wiki from Claude Code chats and repo history, solving the agent context gap. Free open-source tool, macOS-only.

May 14, 2026 · 90% relevant

Anthropic Ships Claude Opus 4.7: 2.1-Point SWE-Bench Gain Over 4.6

Anthropic released Claude Opus 4.7 with a 2.1-point SWE-Bench gain to 82.9, the smallest jump between Opus versions yet, signaling diminishing returns.

May 9, 2026 · 90% relevant

Ctx2Skill: Self-Play Framework Lets LMs Discover Skills Without Labels

Ctx2Skill discovers skills from context via multi-agent self-play without labels. Outputs plug into any LM, targeting manual prompt engineering bottlenecks.

May 5, 2026 · 85% relevant

Matt Pocock Open-Sources Claude Code Skill Pack for AI Agents

Matt Pocock open-sourced a Claude Code skill pack to improve AI agent behavior. The pack provides curated prompts and configurations for Anthropic's terminal-based coding tool.

May 5, 2026 · 95% relevant

GPT-5.5 Pro Leapfrogs on Epoch Benchmark; Base Model Beats Prior Pro

A tweet from @kimmonismus reveals GPT-5.5 Pro shows significant Epoch benchmark gains, and the non-Pro GPT-5.5 surpasses GPT-5.4 Pro, suggesting major efficiency improvements at OpenAI.

Apr 29, 2026 · 99% relevant

K-CARE: A New Framework Grounds LLMs in External Knowledge to Fix

K-CARE combines Symmetrical Contextual Anchoring (behavior data) and Analogical Prototype Reasoning (expert examples) to resolve e-commerce search relevance issues that pure LLM reasoning can't fix. Proven in offline and online A/B tests on a leading platform.

Apr 29, 2026 · 94% relevant

Alec Radford's 'Talk to the Past' AI Lets You Chat with History

A new AI project by Alec Radford and David Duvenaud lets you chat with simulated historical figures.

Apr 28, 2026 · 75% relevant

Hinton Rebrands AI Hallucinations as 'Confabulations'

Geoffrey Hinton redefines AI hallucinations as 'confabulations,' arguing that intelligence reconstructs reality into plausible stories rather than storing facts like a database.

Apr 26, 2026 · 87% relevant

San Francisco Shop Runs Entirely by AI Agent

A shop in San Francisco is fully operated by an AI agent, replacing human cashiers and assistants. The concept points toward fully autonomous retail experiences, though details on the technology stack remain thin.

Apr 23, 2026 · 80% relevant

Meta's Sapiens2: 1B Human Image ViTs for Pose, Segmentation, Normals

Meta open-sourced Sapiens2 on Hugging Face, a family of vision transformers pretrained on 1 billion human images for pose estimation, segmentation, normal estimation, and point maps. The models target high-resolution human-centric perception.

Apr 23, 2026 · 92% relevant

ItemRAG: A New RAG Approach for LLM-Based Recommendation That Retrieves

ItemRAG shifts RAG for LLM-based recommenders from user-history retrieval to fine-grained item-level retrieval, using co-purchase and semantic data to prioritize informative items. Experiments show consistent outperformance over existing methods, especially for cold-start items.

Apr 23, 2026 · 86% relevant

From Checkout to Trust Layer: How Merchants Can Prepare for Agentic Commerce

The article discusses the evolution of e-commerce from simple checkout processes to a future where AI shopping agents act on behalf of consumers. It argues that success in this 'agentic commerce' era depends on merchants building a robust trust layer with data security, transparency, and reliability at its core.

Apr 22, 2026 · 96% relevant

CAST: A New Framework for Semantic-Level Complementary Recommendations

Researchers propose CAST, a sequential recommendation framework that models transitions between discrete item semantic codes (e.g., specifications) and injects LLM-verified complementary knowledge. It achieves significant performance gains by moving beyond simplistic co-purchase statistics to capture genuine complementarity.

Apr 22, 2026 · 78% relevant

VoteGCL: A Novel LLM-Augmented Framework to Combat Data Sparsity in

A new paper introduces VoteGCL, a framework that uses few-shot LLM prompting and majority voting to create high-confidence synthetic data for graph-based recommendation systems. It integrates this data via graph contrastive learning to improve accuracy and mitigate bias, outperforming existing baselines.

Apr 22, 2026 · 90% relevant

Layers on Layers — How You Can Improve Your Recommendation Systems

An IBM article critiques monolithic recommendation engines for trying to do too much with one score. It proposes a layered architecture—candidate generation, ranking, and business logic—to improve performance and adaptability. This is a direct, practical framework for engineering teams.

Apr 21, 2026 · 82% relevant
