llm infrastructure

30 articles about llm infrastructure in AI news

Fine-Tuning an LLM on a 4GB GPU: A Practical Guide for Resource-Constrained Engineers

A Medium article provides a practical, constraint-driven guide for fine-tuning LLMs on a 4GB GPU, covering model selection, quantization, and parameter-efficient methods. This makes bespoke AI model development more accessible without high-end cloud infrastructure.

Apr 2, 2026100% relevant

Meta's Adaptive Ranking Model: A Technical Breakthrough for Efficient LLM-Scale Inference

Meta has developed a novel Adaptive Ranking Model (ARM) architecture designed to drastically reduce the computational cost of serving large-scale ranking models for ads. This represents a core infrastructure breakthrough for deploying LLM-scale models in production at massive scale.

Mar 31, 202695% relevant

OpenAI Winds Down Sora App, Reallocates Compute to Next-Gen 'Spud' LLM Development

OpenAI has completed initial development of its next major AI model, codenamed 'Spud,' and is winding down the Sora video app, which was reportedly a compute resource drain. The move reallocates critical infrastructure toward core LLM competition with Anthropic and Google.

Mar 24, 202687% relevant

Fractal Analytics Launches LLM Studio for Enterprise Domain-Specific AI

Fractal Analytics has launched LLM Studio, an enterprise platform built on NVIDIA infrastructure to help organizations build, deploy, and manage custom, domain-specific language models. It emphasizes governance, control, and moving beyond generic AI APIs.

Mar 17, 202674% relevant

Algorithmic Bridging: How Multimodal LLMs Can Enhance Existing Recommendation Systems

A new approach called 'Algorithmic Bridging' proposes combining multimodal conversational LLMs with conventional recommendation systems to boost performance while reusing existing infrastructure. This hybrid method aims to leverage the natural language understanding of LLMs without requiring full system replacement.

Mar 15, 202695% relevant

Cascaded LLMs Lift E-Commerce Cart Adds 2.7% in Online Test

A cascaded LLM framework for e-commerce storefront generation lifted cart adds by +2.7% in online tests, using teacher-student fine-tuning to approach closed-weight LLM quality at production latency.

May 18, 2026100% relevant

K-CARE: A New Framework Grounds LLMs in External Knowledge to Fix

K-CARE combines Symmetrical Contextual Anchoring (behavior data) and Analogical Prototype Reasoning (expert examples) to resolve e-commerce search relevance issues that pure LLM reasoning can't fix. Proven in offline and online A/B tests on a leading platform.

Apr 29, 202694% relevant

LLM-Based Customer Digital Twins Predict Preferences with 87.7% Accuracy

A new arXiv paper proposes using LLM-based 'customer digital twins' (CDTs) — agents built from individual Reddit review histories via RAG — to perform conjoint analysis. The CDTs predict actual user preferences with 87.73% accuracy in a computer monitor case study, offering a scalable alternative to traditional market research.

Apr 28, 202680% relevant

LLM-as-a-Judge Framework Fixes Math Evaluation Failures

Researchers propose an LLM-as-a-judge framework for evaluating math reasoning that beats rule-based symbolic comparison, fixing failures in Lighteval and SimpleRL. This enables more accurate benchmarking of LLM math abilities.

Apr 27, 202682% relevant

The Developer's Guide to Finetuning LLMs

A developer-focused article outlines decision frameworks for LLM finetuning—covering when it's worth the cost, how to approach it, and key trade-offs. For retail leaders, this is a practical primer on customizing models for brand-specific tasks.

Apr 24, 202690% relevant

AFMRL: Using MLLMs to Generate Attributes for Better Product Retrieval in

AFMRL uses MLLMs to generate product attributes, then uses those attributes to train better multimodal representations for e-commerce retrieval. Achieves SOTA on large-scale datasets.

Apr 23, 202684% relevant

ItemRAG: A New RAG Approach for LLM-Based Recommendation That Retrieves

ItemRAG shifts RAG for LLM-based recommenders from user-history retrieval to fine-grained item-level retrieval, using co-purchase and semantic data to prioritize informative items. Experiments show consistent outperformance over existing methods, especially for cold-start items.

Apr 23, 202686% relevant

From DIY to MLflow: A Developer's Journey Building an LLM Tracing System

A technical blog details the experience of creating a custom tracing system for LLM applications using FastAPI and Ollama, then migrating to MLflow Tracing. The author discusses practical challenges with spans, traces, and debugging before concluding that established MLOps tools offer better production readiness.

Apr 23, 202684% relevant

Bull Delivers HPC Infrastructure to Power Mimer AI Factory

Bull, a subsidiary of Atos, has supplied the core HPC infrastructure for Mimer's new AI factory. This facility is dedicated to training and developing large language models for the European market.

Apr 21, 202682% relevant

Prefill-as-a-Service Paper Claims to Decouple LLM Inference Bottleneck

A research paper proposes a 'Prefill-as-a-Service' architecture to separate the heavy prefill computation from the lighter decoding phase in LLM inference. This could enable new deployment models where resource-constrained devices handle only the decoding step.

Apr 20, 202685% relevant

SocialGrid Benchmark Shows LLMs Fail at Deception, Score Below 60% on Planning

Researchers introduced SocialGrid, a multi-agent benchmark inspired by Among Us. It shows state-of-the-art LLMs fail at deception detection and task planning, scoring below 60% accuracy.

Apr 20, 2026100% relevant

BERT-as-a-Judge Matches LLM-as-a-Judge Performance at Fraction of Cost

Researchers propose 'BERT-as-a-Judge,' a lightweight evaluation method that matches the performance of costly LLM-as-a-Judge setups. This could drastically reduce the cost of automated LLM evaluation pipelines.

Apr 19, 202685% relevant

OpenAI Open-Sources Agents SDK, Supports 100+ LLMs

OpenAI has open-sourced its internal Agents SDK, a lightweight framework for building multi-agent systems. It features three core primitives, works with over 100 LLMs, and has gained 18.9k GitHub stars immediately.

Apr 18, 202695% relevant

GeoAgentBench: New Dynamic Benchmark Tests LLM Agents on 117 GIS Tools

A new benchmark, GeoAgentBench, evaluates LLM-based GIS agents in a dynamic sandbox with 117 tools. It introduces a novel Plan-and-React agent architecture that outperforms existing frameworks in multi-step spatial tasks.

Apr 17, 202694% relevant

Nvidia: Cost Per Token Is the Only AI Infrastructure Metric That Matters

Nvidia asserts that total cost of ownership for AI infrastructure must be measured in cost per delivered token, not raw compute metrics. This shift is critical for scaling profitable agentic AI applications.

Apr 15, 202680% relevant

LLM-HYPER: A Training-Free Framework for Cold-Start Ad CTR Prediction

A new arXiv paper introduces LLM-HYPER, a framework that treats large language models as hypernetworks to generate parameters for click-through rate estimators in a training-free manner. It uses multimodal ad content and few-shot prompting to infer feature weights, drastically reducing the cold-start period for new promotional ads and has been deployed on a major U.S. e-commerce platform.

Apr 15, 202696% relevant

Meta Expands Broadcom Partnership for Next-Gen AI Infrastructure

Meta is expanding its partnership with semiconductor giant Broadcom to co-develop its next-generation AI infrastructure. This move signals a continued, long-term commitment to custom silicon for AI training and inference.

Apr 14, 202685% relevant

Fine-Tuning vs RAG: Clarifying the Core Distinction in LLM Application Design

The source article aims to dispel confusion by explaining that fine-tuning modifies a model's knowledge and behavior, while RAG provides it with external, up-to-date information. Choosing the right approach is foundational for any production LLM application.

Apr 14, 202697% relevant

ReRec: A New Reinforcement Fine-Tuning Framework for Complex LLM-Based

A new paper introduces ReRec, a reinforcement fine-tuning framework designed to enhance LLMs' reasoning capabilities for complex recommendation tasks. It uses specialized reward shaping and curriculum learning to improve performance while preserving the model's general abilities. This addresses a key weakness in using off-the-shelf LLMs for sophisticated personalization.

Apr 10, 202680% relevant

Flipkart Appoints Hemant Badri to Lead AI Execution, Rebuilds Infrastructure

Flipkart is restructuring to prioritize AI execution, appointing Hemant Badri to lead operational AI and launching the OneTech project to rebuild core infrastructure. This move highlights a broader enterprise trend where competitive advantage now stems from integration, not just model access.

Apr 9, 202685% relevant

Visa Launches Global AI Agent Shopping Infrastructure

Visa is launching a global infrastructure to enable AI agents to shop and transact autonomously. This move, alongside reports of a 25% conversion uplift from Frasers Group's AI assistant, signals the acceleration of 'agentic commerce'.

Apr 8, 202680% relevant

Sipeed Launches PicoClaw, Open-Source Alternative to OpenClaw for LLM Orchestration

Sipeed, known for its AI hardware, has open-sourced PicoClaw, a framework for orchestrating multiple LLMs across different channels. This provides a direct, community-driven alternative to the popular OpenClaw project.

Apr 8, 202675% relevant

Agent Harness Engineering: The 'OS' That Makes LLMs Useful

A clear analogy frames raw LLMs as CPUs needing an operating system. The agent harness—managing tools, memory, and execution—is what creates useful applications, as proven by LangChain's benchmark jump.

Apr 7, 202685% relevant

Stanford Releases Free LLM & Transformer Cheatsheets Covering LoRA, RAG, MoE

Stanford University has released a free, open-source collection of cheatsheets covering core LLM concepts from self-attention to RAG and LoRA. This provides a consolidated technical reference for engineers and researchers.

Apr 6, 202691% relevant

A Practical Guide to Fine-Tuning Open-Source LLMs for AI Agents

This Portuguese-language Medium article is Part 2 of a series on LLM engineering for AI agents. It provides a hands-on guide to fine-tuning an open-source model, building on a foundation of clean data and established baselines from Part 1.

Apr 6, 202674% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety