Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

sre

30 articles about sre in AI news

Turn Claude Code Into an AI SRE

Five proven outer-loop workflows for using Claude Code as an AI SRE: incident triage, runbook execution, postmortem drafting, SLO investigation, and on-call handoffs. The bottleneck isn't the model — it's the MCP runtime.

100% relevant

MiniMax Launches MaxHermes, Cloud-Hosted Agent with NousResearch

MiniMax has launched MaxHermes, a cloud-hosted version of the Hermes agent framework, in partnership with NousResearch. This provides a managed service for users of MiniMax's M2.7 model, aiming to simplify agent deployment.

85% relevant

SLSREC: A New Self-Supervised Model for Disentangling Long- and Short-Term User Interests in Recommendations

A new arXiv preprint introduces SLSREC, a self-supervised model that disentangles long-term user preferences from short-term intentions using contrastive learning and adaptive fusion. It outperforms state-of-the-art models on three benchmark datasets, addressing a core challenge in dynamic user modeling.

88% relevant

New Research Reveals LLM-Based Recommender Agents Are Vulnerable to Contextual Bias

A new benchmark, BiasRecBench, demonstrates that LLMs used as recommendation agents in workflows like e-commerce are easily swayed by injected contextual biases, even when they can identify the correct choice. This exposes a critical reliability gap in high-stakes applications.

82% relevant

Viral AI Creativity Study Misinterpreted: Research Shows No Long-Term Decline in Creative Output

A viral social media post misrepresented findings from an AI creativity study, claiming ChatGPT use reduces creativity over time. The actual research found no significant drop after 30 days, with AI-assisted groups maintaining higher creative output than controls.

85% relevant

MCP Crosses 9,400 Servers; Build Your Own in TypeScript

MCP crossed 9,400 servers. Build a database introspection server in TypeScript. SDK handles protocol framing and capability negotiation.

90% relevant

VAB Benchmark: Top MLLMs Judge Beauty Correctly Only 26.5% of Time

Frontier MLLMs achieve only 26.5% accuracy on VAB, far below human 68.9%. Fine-tuning bridges the gap.

60% relevant

Claude Code Plugin Deploys 17-Agent SDLC Team With Orchestrator

Team-of-agents plugin adds 17 specialist AI agents with an orchestrator to Claude Code, using confidence signals to gate output quality.

92% relevant

Retail traffic from LLMs surged 393% year-on-year, reports CX Network

According to CX Network, retail traffic originating from large language model interfaces increased 393% year-on-year, highlighting the growing role of conversational AI as a customer acquisition channel for retailers.

86% relevant

New MoE Framework Tames User Interest Shifts in Long-Sequence Recommendations

Researchers propose MoS, a model-agnostic MoE approach that handles long user sequences by detecting session hopping – where user interests shift across sessions. The theme-aware routing mechanism filters irrelevant sessions, while multi-scale fusion captures global and local patterns. Results show SOTA on benchmarks with fewer FLOPs than alternatives.

94% relevant

TF-LLMER: A New Framework to Fix Optimization Problems in LLM-Enhanced

Researchers identify two key causes of poor training in LLM-enhanced recommenders: norm disparity and misaligned angular clustering. Their solution, TF-LLMER, uses embedding normalization and Rec-PCA to significantly outperform existing methods.

74% relevant

OpenCLAW-P2P v6.0 Cuts Paper Lookup Latency to <50ms

OpenCLAW-P2P v6.0 introduces a multi-layer persistence architecture and live reference verification, reducing paper retrieval latency from >3s to <50ms and operating with 14 autonomous agents that scored 50+ papers.

77% relevant

PerfectSquashBench Tests Image Model Anchoring Bias vs. Text Models

Wharton professor Ethan Mollick released PerfectSquashBench, a test showing image generation models exhibit stronger anchoring bias than text models, getting 'stuck' on initial directions and requiring context window clearing.

85% relevant

LLMAR: A Tuning-Free LLM Framework for Recommendation in Sparse

Researchers propose LLMAR, a tuning-free recommendation framework that uses LLM reasoning to infer user 'latent motives' from sparse text-rich data. It outperforms state-of-the-art models in sparse industrial scenarios while keeping inference costs low, offering a practical alternative to costly fine-tuning.

80% relevant

The Hidden Cost of AI Translation Layers in Global Customer Support

An article argues that using a basic translation layer for multilingual AI customer support is a costly mistake. It fails to convey cultural context and appropriate tone, leading to higher churn and lower satisfaction in non-English markets. The solution requires treating multilingual support as a core operational capability, not just a technical add-on.

94% relevant

Interluxe Group Launches Optima AI Index to Shape Luxury Discovery in

The Interluxe Group has introduced the Optima AI Index, a new data standard aimed at enhancing the accuracy and visibility of luxury brand information within generative AI platforms. This initiative seeks to address the challenge of inconsistent brand discovery in AI-driven search, providing a structured foundation for brand representation.

96% relevant

Claude MCP GPU Debugging: AI Agent Identifies PyTorch Bottleneck in Kernel

A developer used an AI agent powered by Claude Code and the Model Context Protocol (MCP) to diagnose a severe GPU performance bottleneck. The agent analyzed system kernel traces, pinpointing excessive CPU context switches as the culprit, demonstrating a practical application of agentic AI for complex technical debugging.

72% relevant

RoTE: A New Plug-and-Play Module to Sharpen Time-Aware Sequential

A new research paper introduces RoTE, a multi-level temporal embedding module for sequential recommenders. It explicitly models the time spans between user interactions, a factor often overlooked, leading to significant performance gains on standard benchmarks.

82% relevant

MVCrec: A New Multi-View Contrastive Learning Framework for Sequential

Researchers propose MVCrec, a framework that applies multi-view contrastive learning between sequential (ID-based) and graph-based views of user interaction data to improve recommendation accuracy. It outperforms 11 leading models, showing significant gains in key metrics.

84% relevant

New Research Proposes DITaR Method to Defend Sequential Recommenders

Researchers propose DITaR, a dual-view method to detect and rectify harmful fake orders embedded in user sequences. It aims to protect recommendation integrity while preserving useful data, showing superior performance in experiments. This addresses a critical vulnerability in e-commerce and retail AI systems.

86% relevant

Waymo Data Claims Autonomous Tech Prevents Injuries, Deaths

Waymo has released data indicating its autonomous vehicle technology is preventing injuries and deaths on public roads. If verified, this represents a critical, evidence-based argument for the safety of robotaxis.

75% relevant

Pika Labs Launches 'AI Self' Chatbot for Newsletter Creator Kimmonismus

Kimmonismus, who runs an AI newsletter with 225K+ readers, has launched a custom chatbot trained on his industry knowledge and opinions using Pika Labs' technology. The 'AI Self' is designed to handle reader inquiries at scale.

85% relevant

AI-Reprogrammed Immune Cells Cure 3 Autoimmune Diseases in First Human Case

For the first time, a patient with three autoimmune diseases is in complete remission after doctors used AI to reprogram her own immune cells. This follows over a decade of requiring daily blood transfusions.

95% relevant

CoDiS: A Causal Framework for Cross-Domain Sequential Recommendation

A new arXiv paper introduces CoDiS, a framework for Cross-Domain Sequential Recommendation that uses causal inference to disentangle domain-shared and domain-specific user preferences while addressing context confounding and gradient conflicts. It outperforms state-of-the-art baselines on three real-world datasets.

82% relevant

Developer Fired After Manager Discovers Claude Code, Prefers LLM Output

A developer was fired after his manager discovered he used Claude AI to build a project, then had the AI 'vibe code' a replacement in days. The manager dismissed the developer's warnings about AI hallucinations on complex requirements.

85% relevant

New arXiv Study Finds No Saturation Point for Data in Traditional Recommender Systems

A new arXiv preprint systematically tests how recommendation model performance scales with training data size. Using 10 algorithm variants across 11 large datasets, the research finds that normalized performance (NDCG@10) generally keeps improving up to 100 million interactions, with no clear saturation point for typical models.

90% relevant

New Research: How Online Marketplaces Can Use Demand Allocation to Control Seller Inventory

Researchers propose a model where a marketplace platform, by controlling the timing and predictability of order allocation to sellers, can influence their safety-stock inventory and their choice to use platform fulfillment services. This identifies demand allocation as a key operational lever for digital marketplaces.

78% relevant

Visa Launches Global AI Agent Shopping Infrastructure

Visa is launching a global infrastructure to enable AI agents to shop and transact autonomously. This move, alongside reports of a 25% conversion uplift from Frasers Group's AI assistant, signals the acceleration of 'agentic commerce'.

80% relevant

Google Ads Details Its Data Infrastructure for AI-Powered Commerce

Google Ads has detailed the critical role of its underlying product data infrastructure in enabling 'agentic commerce'—where AI agents assist shoppers. This foundation is key to making search more natural and understanding shopper intent.

89% relevant

AlphaEarth Embeddings Outperform Prithvi, Clay in Urban Signal Benchmark

Researchers benchmarked three geospatial foundation models—AlphaEarth, Prithvi, and Clay—on predicting 14 neighborhood-level urban indicators from satellite imagery. AlphaEarth's compact 64-dimensional embeddings proved most informative, achieving the highest predictive skill for built-environment-linked outcomes like chronic health burdens.

72% relevant