e-commerce performance

30 articles about e-commerce performance in AI news

Commerce Media Leaders Are Building for an Agentic Future

eMarketer reports commerce media leaders are building AI agent infrastructure to automate ad buying and personalization. This shift could reduce manual campaign management by 40% and boost ROI by 25% for retail media networks.

Jul 6, 2026 · 84% relevant

Why Traditional Retail Metrics Break Down in Agentic Commerce

Valtech's 2026 research shows 96% of retailers face integration barriers, 48% are stuck in AI pilot purgatory, and nearly 75% can't link AI spend to metrics, as agentic commerce fragments customer journeys beyond traditional measurement frameworks.

Jun 23, 2026 · 100% relevant

Cascaded LLMs Lift E-Commerce Cart Adds 2.7% in Online Test

A cascaded LLM framework for e-commerce storefront generation lifted cart adds by 2.7% in online tests, using teacher-student fine-tuning to approach closed-weight LLM quality at production latency.

May 18, 2026 · 100% relevant

RiskWebWorld: A New Benchmark Exposes the Limits of AI for E-commerce Risk

Researchers introduced RiskWebWorld, a realistic benchmark for testing GUI agents on 1,513 authentic e-commerce risk management tasks. It reveals a major capability gap: even the best models fail more than 50% of the time, which highlights the immaturity of AI for high-stakes operational automation.

Apr 17, 2026 · 92% relevant

Google Ads Details Its Data Infrastructure for AI-Powered Commerce

Google Ads has detailed the critical role of its underlying product data infrastructure in enabling 'agentic commerce'—where AI agents assist shoppers. This foundation is key to making search more natural and understanding shopper intent.

Apr 7, 2026 · 89% relevant

MOON3.0: A New Reasoning-Aware MLLM for Fine-Grained E-commerce Product Understanding

A new arXiv paper introduces MOON3.0, a multimodal large language model (MLLM) specifically architected for e-commerce. It uses a novel joint contrastive and reinforcement learning framework to explicitly model fine-grained product details from images and text, outperforming other models on a new benchmark, MBE3.0.

Apr 2, 2026 · 94% relevant

UniScale: A Co-Design Framework for Data and Model Scaling in E-commerce Search Ranking

Researchers propose UniScale, a framework that jointly optimizes data collection and model architecture for search ranking, moving beyond just scaling model parameters. It addresses diminishing returns from parameter scaling alone by creating a synergistic system for high-quality data and specialized modeling. This approach, validated on a large-scale e-commerce platform, shows significant gains in key business metrics.

Mar 26, 2026 · 95% relevant

Accenture's DaVinci Investment Signals Growing Enterprise Bet on Agentic Commerce

Accenture's strategic investment in DaVinci Commerce highlights a major consulting firm's bet that autonomous AI agents will transform enterprise commerce platforms. This follows Google's recent launch of an Agentic Sizing Protocol for retail.

Mar 25, 2026 · 90% relevant

Graph-Enhanced LLMs for E-commerce Appeal Adjudication: A Framework for Hierarchical Review

Researchers propose a graph reasoning framework that models verification actions to improve LLM-based decision-making in hierarchical review workflows. It boosts alignment with human experts from 70.8% to 96.3% in e-commerce seller appeals by preventing hallucination and enabling targeted information requests.

Mar 23, 2026 · 76% relevant

Why Agentic AI is a Game-Changer for Ecommerce

A report from Retail TouchPoints and Digital Commerce 360 highlights the rise of 'agentic commerce,' where autonomous AI agents are poised to handle complex, multi-step customer journeys. This shift is driving increased AI investment as companies anticipate agents facilitating up to 50% of online transactions by 2027.

Mar 13, 2026 · 89% relevant

Entropy-Guided Branching Boosts Agent Success 15% on New SLATE E-commerce

A new paper introduces SLATE, a large-scale benchmark for evaluating tool-using AI agents, and Entropy-Guided Branching (EGB), an algorithm that improves task success rates by 15% by dynamically expanding search where the model is uncertain.

Apr 15, 2026 · 73% relevant
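A minimal sketch of the entropy-guided branching idea summarized above: measure the Shannon entropy of the agent's distribution over candidate next actions and widen the search only when that entropy is high. The threshold, the branch counts, and the dict-of-probabilities interface are hypothetical choices for illustration; the paper's actual EGB algorithm may differ.

```python
import math


def entropy(probs: dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate actions."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def select_branches(
    probs: dict[str, float],
    threshold: float = 0.5,  # hypothetical uncertainty threshold (nats)
    narrow: int = 1,         # branches kept when the model is confident
    wide: int = 3,           # branches kept when the model is uncertain
) -> list[str]:
    """Return the actions to expand, branching wider only under high entropy."""
    k = wide if entropy(probs) > threshold else narrow
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    confident = {"search": 0.9, "add_to_cart": 0.05, "checkout": 0.05}
    uncertain = {"search": 0.4, "add_to_cart": 0.35, "checkout": 0.25}
    print(select_branches(confident))  # ['search']
    print(select_branches(uncertain))  # ['search', 'add_to_cart', 'checkout']
```

The design choice is that compute is spent where the model is unsure: a peaked distribution (entropy about 0.39 nats here) keeps a single path, while a flat one (about 1.08 nats) triggers branching.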

FLAME: A Novel Framework for Efficient, High-Performance Sequential Recommendation

A new paper introduces FLAME, a training framework for sequential recommender systems. It uses a frozen 'anchor' network and a learnable network, combined via modular ensembles, to capture user behavior diversity efficiently. The result is a single model that performs like an ensemble while keeping single-model inference speed.

Apr 7, 2026 · 82% relevant

Study Reveals Which Chatbot Evaluation Metrics Actually Predict Sales in Conversational Commerce

A study on a major Chinese platform tested a 7-dimension rubric for evaluating conversational AI against real sales conversions. It found only two dimensions—Need Elicitation and Pacing Strategy—were significantly linked to sales, while others like Contextual Memory showed no association, revealing a 'composite dilution effect' in standard scoring.

Apr 2, 2026 · 100% relevant

Ego2Web Benchmark Bridges Egocentric Video and Web Agents, Exposing Major Performance Gaps

Researchers introduce Ego2Web, the first benchmark requiring AI agents to understand real-world first-person video and execute related web tasks. Their novel Ego2WebJudge evaluation method achieves 84% human agreement, while state-of-the-art agents perform poorly across all task categories.

Mar 25, 2026 · 95% relevant

AIGQ: Taobao's End-to-End Generative Architecture for E-commerce Query Recommendation

Alibaba researchers propose AIGQ, a hybrid generative framework for pre-search query recommendations. It uses list-level fine-tuning, a novel policy optimization algorithm, and a hybrid deployment architecture to overcome traditional limitations, showing substantial online improvements on Taobao.

Mar 23, 2026 · 100% relevant

POP.STORE Launches ECHO-ME: An Agentic AI Commerce Platform for Creators

POP.STORE announced ECHO-ME, an agentic AI platform designed to autonomously run a creator's business operations. It monitors social channels, detects brand deals, and converts fan interactions into revenue, launching with 15,000 creators. This represents a shift from task automation to full business operation for the solo creator economy.

Mar 18, 2026 · 82% relevant

Hybrid Self-evolving Structured Memory: A Breakthrough for GUI Agent Performance

Researchers propose HyMEM, a graph-based memory system for GUI agents that combines symbolic nodes with continuous embeddings. It enables multi-hop retrieval and self-evolution, boosting open-source VLMs to surpass closed-source models like GPT-4o on computer-use tasks.

Mar 12, 2026 · 72% relevant
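A toy version of the hybrid memory idea described above, where symbolic graph nodes also carry embeddings: retrieval is seeded by cosine similarity to the query, then expanded along graph edges for multi-hop context. Node names, vectors, and the hop limit are hypothetical, and HyMEM's actual retrieval scoring and self-evolution mechanism are not reproduced here.

```python
import math
from collections import deque


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query: list[float],
             embeddings: dict[str, list[float]],
             edges: dict[str, list[str]],
             top_k: int = 1,
             hops: int = 2) -> list[str]:
    """Seed with the top_k most similar nodes, then BFS up to `hops` edges away."""
    seeds = sorted(embeddings, key=lambda n: cosine(query, embeddings[n]),
                   reverse=True)[:top_k]
    seen = set(seeds)
    order = list(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                frontier.append((nxt, depth + 1))
    return order


if __name__ == "__main__":
    emb = {"open_cart": [1.0, 0.0], "checkout_page": [0.0, 1.0], "payment_form": [0.1, 0.9]}
    graph = {"open_cart": ["checkout_page"], "checkout_page": ["payment_form"]}
    print(retrieve([1.0, 0.0], emb, graph))  # ['open_cart', 'checkout_page', 'payment_form']
```

The embedding step finds a relevant entry point even when the query wording does not match any node name exactly, while the graph walk pulls in related states that embedding similarity alone would miss (here `payment_form` is dissimilar to the query but two hops away).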

SORT: The Transformer Breakthrough for Luxury E-commerce Ranking

SORT is an optimized Transformer architecture designed for industrial-scale product ranking. It overcomes data sparsity to deliver hyper-personalized recommendations, proven to increase orders by 6.35% and GMV by 5.47% while halving latency.

Mar 5, 2026 · 85% relevant

PayPal Cuts LLM Inference Cost 50% with EAGLE3 Speculative Decoding on H100

PayPal engineers applied EAGLE3 speculative decoding to their fine-tuned 8B-parameter commerce agent, achieving up to 49% higher throughput and 33% lower latency. This allowed a single H100 GPU to match the performance of two H100s running NVIDIA NIM, cutting inference hardware cost by 50%.

Apr 23, 2026 · 90% relevant
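For readers new to speculative decoding, here is a toy greedy variant: a cheap draft model proposes several tokens, the target model checks them, and the longest agreeing prefix is kept plus one token from the target. This is not EAGLE3, which drafts with a lightweight head on the target model's own hidden states; the two "models" below are hypothetical stand-in functions over integer tokens.

```python
from typing import Callable

# A toy "model" maps a token prefix to its greedy next token.
Model = Callable[[list[int]], int]


def speculative_decode(target: Model, draft: Model, prompt: list[int],
                       max_new: int, k: int = 4) -> list[int]:
    """Greedy speculative decoding; the result equals plain greedy decoding with `target`."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies them; a real system scores all k positions
        #    in one batched forward pass instead of this per-token loop.
        accepted: list[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # target's correction ends the round
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # all k matched: bonus token
        out.extend(accepted)
    return out[: len(prompt) + max_new]


if __name__ == "__main__":
    # Hypothetical toy models: the target counts up mod 10; the draft errs after a 5.
    def target(ctx: list[int]) -> int:
        return (ctx[-1] + 1) % 10

    def draft(ctx: list[int]) -> int:
        return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10

    print(speculative_decode(target, draft, prompt=[0], max_new=8))
    # -> [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The output is identical to what the target alone would produce; the speedup in real systems comes from verifying k drafted tokens in a single target pass, so throughput grows with how often the draft is right.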

New Research Proposes DITaR Method to Defend Sequential Recommenders

Researchers propose DITaR, a dual-view method to detect and rectify harmful fake orders embedded in user sequences. It aims to protect recommendation integrity while preserving useful data, showing superior performance in experiments. This addresses a critical vulnerability in e-commerce and retail AI systems.

Apr 13, 2026 · 86% relevant

AWS Unveils Production Blueprint for Evaluating AI Agents with Strands and AgentCore

AWS published a production blueprint for evaluating AI agents using Strands and AgentCore. It generates realistic test scenarios and tracks metrics such as completion rate and cost, addressing the gap between lab benchmarks and real-world performance, which is critical for retail AI deployments.

Jul 23, 2026 · 88% relevant
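As a hedged illustration of the two metrics named above, completion rate and cost, this sketch aggregates per-scenario results from simulated agent runs. The `ScenarioResult` schema and the per-token prices are hypothetical and are not the Strands or AgentCore API.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Outcome of one simulated agent run (hypothetical schema)."""
    scenario_id: str
    completed: bool
    input_tokens: int
    output_tokens: int


def summarize(results: list[ScenarioResult],
              usd_per_1k_input: float,
              usd_per_1k_output: float) -> dict[str, float]:
    """Compute completion rate, total cost, and cost per completed task."""
    if not results:
        raise ValueError("no results to summarize")
    completed = sum(r.completed for r in results)
    cost = sum(r.input_tokens / 1000 * usd_per_1k_input
               + r.output_tokens / 1000 * usd_per_1k_output
               for r in results)
    return {
        "completion_rate": completed / len(results),
        "total_cost_usd": cost,
        # Failed runs still cost money, so their cost is charged to the successes.
        "cost_per_completion_usd": cost / completed if completed else float("inf"),
    }
```

Reporting cost per completed task, rather than per run, keeps an agent that fails cheaply and often from looking more economical than one that succeeds reliably.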

Inside Shopify Hack Days: Building a prototype for music-playing pages (2026)

Shopify's 2026 Hack Days, which involved 150 participants over 48 hours, produced a prototype for music-playing product pages with a 200ms load time. The project explores audio commerce for merchants.

Jul 14, 2026 · 100% relevant

Feature Freshness: The Production Bug That Makes Good Recommenders Look Bad

Jie Li's article reveals that stale features—outdated user signals—can degrade recommender performance by 20-30% in offline metrics, often misdiagnosed as model problems. The piece urges teams to prioritize feature freshness monitoring alongside model tuning.

Jul 8, 2026 · 92% relevant

Zalando Introduces MLLM-Based Evaluation for Product Retrieval

Zalando presents a multimodal LLM-based evaluation method for product retrieval, aimed at improving search relevance in e-commerce. It could set a new standard for assessing AI in retail search.

Jun 21, 2026 · 92% relevant

Counterfactual Evaluation in Ads: IPS, SNIPS, and Doubly Robust Explained

A Towards AI article explains counterfactual evaluation methods for ad ranking models: inverse propensity scoring (IPS), self-normalized IPS (SNIPS), and doubly robust estimation. These techniques estimate a model's performance from logged data without running A/B tests, which makes them critical for recommendation systems in retail.

Jun 3, 2026 · 98% relevant
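The three estimators fit in a few lines. Each logged record holds the action the logging policy took, its propensity, the observed reward, and the new (target) policy's probabilities; doubly robust also needs a reward model's predictions for every candidate action. This is a sketch of the standard textbook formulas, not the article's code, and the `LoggedRecord` schema is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoggedRecord:
    action: str                     # action the logging policy actually took
    reward: float                   # observed reward (e.g. click = 1.0)
    logging_prob: float             # mu(action | context); must be > 0
    target_probs: dict[str, float]  # pi(a | context) for every candidate action
    reward_model: dict[str, float]  # q_hat(context, a) for every candidate action


def _weight(r: LoggedRecord) -> float:
    """Importance weight pi(a|x) / mu(a|x) for the logged action."""
    return r.target_probs.get(r.action, 0.0) / r.logging_prob


def ips(records: list[LoggedRecord]) -> float:
    """Inverse propensity scoring: mean of weight * reward."""
    return sum(_weight(r) * r.reward for r in records) / len(records)


def snips(records: list[LoggedRecord]) -> float:
    """Self-normalized IPS: divide by the sum of weights instead of n."""
    total_w = sum(_weight(r) for r in records)
    if total_w == 0:
        return float("nan")  # undefined when the target never picks logged actions
    return sum(_weight(r) * r.reward for r in records) / total_w


def doubly_robust(records: list[LoggedRecord]) -> float:
    """Reward-model estimate plus an importance-weighted correction of its error."""
    total = 0.0
    for r in records:
        direct = sum(p * r.reward_model[a] for a, p in r.target_probs.items())
        correction = _weight(r) * (r.reward - r.reward_model[r.action])
        total += direct + correction
    return total / len(records)
```

SNIPS accepts a small bias in exchange for much lower variance when a few weights are large; doubly robust remains unbiased if either the logged propensities or the reward model are correct, which is where its name comes from.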

Pretrained Audio Models Underperform in Music Recommendation, New Research Shows

A new study evaluates nine pretrained audio models for music recommendation and finds a significant performance gap between traditional music information retrieval (MIR) tasks and both hot- and cold-start recommendation scenarios.

Apr 28, 2026 · 80% relevant

AFMRL: Using MLLMs to Generate Attributes for Better Product Retrieval in

AFMRL uses MLLMs to generate product attributes, then uses those attributes to train better multimodal representations for e-commerce retrieval. It achieves state-of-the-art results on large-scale datasets.

Apr 23, 2026 · 84% relevant

ItemRAG: A New RAG Approach for LLM-Based Recommendation That Retrieves

ItemRAG shifts RAG for LLM-based recommenders from user-history retrieval to fine-grained item-level retrieval, using co-purchase and semantic data to prioritize informative items. Experiments show consistent outperformance over existing methods, especially for cold-start items.

Apr 23, 2026 · 86% relevant
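A rough sketch of item-level retrieval that blends co-purchase and semantic signals to choose which items to place in an LLM recommender's prompt. The linear blend, the weight `alpha`, the max-count normalization, and the toy catalog are all assumptions for illustration, not ItemRAG's published scoring.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_items(target: str,
                   co_purchase: dict[tuple[str, str], int],
                   embeddings: dict[str, list[float]],
                   k: int = 2,
                   alpha: float = 0.5) -> list[str]:
    """Rank items by alpha * normalized co-purchase + (1 - alpha) * cosine similarity."""
    candidates = [i for i in embeddings if i != target]
    # Co-purchase counts are treated as symmetric: (a, b) and (b, a) both count.
    counts = {i: co_purchase.get((target, i), 0) + co_purchase.get((i, target), 0)
              for i in candidates}
    max_count = max(counts.values(), default=0) or 1  # avoid division by zero

    def score(item: str) -> float:
        semantic = cosine(embeddings[target], embeddings[item])
        return alpha * counts[item] / max_count + (1 - alpha) * semantic

    return sorted(candidates, key=score, reverse=True)[:k]
```

For a cold-start item with no co-purchase history every count is zero, so the ranking falls back to semantic similarity alone; that fallback is one plausible reason item-level retrieval helps cold-start items.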

UniRec: A New Generative Recommendation Model Bridges the 'Expressive Gap'

A new paper introduces UniRec, a generative recommendation model that closes the performance gap with traditional discriminative models by prefixing item sequences with structured attributes like category and brand. It achieved a +22.6% improvement in offline metrics and significant online gains in CTR and GMV when deployed on Shopee.

Apr 22, 2026 · 94% relevant
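One way to picture "prefixing item sequences with structured attributes": each item in a user's history is serialized as category and brand tokens followed by its ID token, so a generative model sees shared, frequent attribute tokens before each rare item ID. The token format and field names are hypothetical; UniRec's actual tokenization may differ.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    category: str
    brand: str


def to_tokens(history: list[Item]) -> list[str]:
    """Serialize a history so that each item ID is preceded by its attribute tokens."""
    tokens: list[str] = []
    for item in history:
        tokens += [f"<cat:{item.category}>", f"<brand:{item.brand}>", f"<item:{item.item_id}>"]
    return tokens
```

For example, `to_tokens([Item("123", "shoes", "acme")])` yields `['<cat:shoes>', '<brand:acme>', '<item:123>']`.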

CAST: A New Framework for Semantic-Level Complementary Recommendations

Researchers propose CAST, a sequential recommendation framework that models transitions between discrete item semantic codes (e.g., specifications) and injects LLM-verified complementary knowledge. It achieves significant performance gains by moving beyond simplistic co-purchase statistics to capture genuine complementarity.

Apr 22, 2026 · 78% relevant
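The core idea of modeling transitions between semantic codes, rather than raw item IDs, can be illustrated with first-order transition counts over codes. This Markov-style counter is a simplified stand-in, not CAST itself (which is a learned sequential model that also injects LLM-verified complementary knowledge), and the code strings are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences: list[list[str]]) -> dict[str, Counter]:
    """Count how often semantic code b directly follows code a across user sequences."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def next_codes(counts: dict[str, Counter], code: str, k: int = 2) -> list[str]:
    """Most frequent follow-up codes for `code`, i.e. candidate complements."""
    return [c for c, _ in counts.get(code, Counter()).most_common(k)]


if __name__ == "__main__":
    seqs = [
        ["camera/mirrorless", "lens/50mm", "sd_card/128gb"],
        ["camera/mirrorless", "sd_card/128gb"],
        ["camera/mirrorless", "lens/50mm"],
    ]
    model = fit_transitions(seqs)
    print(next_codes(model, "camera/mirrorless"))  # ['lens/50mm', 'sd_card/128gb']
```

Because codes are shared across many items (every 50mm lens maps to the same code), transition statistics are far denser than item-to-item co-purchase counts, which is the sparsity problem semantic-level modeling targets.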
