SRE

30 articles about SRE in AI news

Turn Claude Code Into an AI SRE

Five proven outer-loop workflows for using Claude Code as an AI SRE: incident triage, runbook execution, postmortem drafting, SLO investigation, and on-call handoffs. The bottleneck isn't the model — it's the MCP runtime.

Apr 22, 2026 · 100% relevant

MiniMax Launches MaxHermes, Cloud-Hosted Agent with NousResearch

MiniMax has launched MaxHermes, a cloud-hosted version of the Hermes agent framework, in partnership with NousResearch. This provides a managed service for users of MiniMax's M2.7 model, aiming to simplify agent deployment.

Apr 16, 2026 · 85% relevant

SLSREC: A New Self-Supervised Model for Disentangling Long- and Short-Term User Interests in Recommendations

A new arXiv preprint introduces SLSREC, a self-supervised model that disentangles long-term user preferences from short-term intentions using contrastive learning and adaptive fusion. It outperforms state-of-the-art models on three benchmark datasets, addressing a core challenge in dynamic user modeling.

Apr 7, 2026 · 88% relevant

How /grill-me Prevents the #1 Agentic Coding Failure: Building the Wrong Thing

Install Florian's Claude Code Kit and run `/grill-me` before non-trivial tasks. This guardrail interviews you one question at a time, forcing alignment before any code is written — catching misread requirements at their cheapest point.

Jun 23, 2026 · 93% relevant

New Research Reveals LLM-Based Recommender Agents Are Vulnerable to Contextual Bias

A new benchmark, BiasRecBench, demonstrates that LLMs used as recommendation agents in workflows like e-commerce are easily swayed by injected contextual biases, even when they can identify the correct choice. This exposes a critical reliability gap in high-stakes applications.

Mar 19, 2026 · 82% relevant

Viral AI Creativity Study Misinterpreted: Research Shows No Long-Term Decline in Creative Output

A viral social media post misrepresented findings from an AI creativity study, claiming ChatGPT use reduces creativity over time. The actual research found no significant drop after 30 days, with AI-assisted groups maintaining higher creative output than controls.

Mar 8, 2026 · 85% relevant

Decoy Font Tricks AI Vision Models With Dual-Layer Glyphs

Mixfont's Decoy Font hides text from AI vision models by layering two characters into one glyph, exploiting a tokenization blind spot in ChatGPT and Gemini.

Jul 23, 2026 · 65% relevant

Build an MCP-Powered SaaS Discovery Engine with Next.js and PostgreSQL

Build an AI-ready SaaS directory by designing a canonical product entity first, then exposing it via MCP. Use Prisma, Zod, and JSON-LD to serve humans, search engines, and AI agents from one source of truth.

Jul 19, 2026 · 80% relevant

Databricks Tests Coding Agents on Its Own Codebase

Databricks benchmarked coding agents on its own polyglot codebase. GLM-5.2 matched top closed models, a minimal harness halved costs, and cheaper-per-token models cost more per task.

Jul 11, 2026 · 75% relevant

7 Breaking Changes in the 2026-07-28 MCP Spec: Your Before/After Migration Guide

The 2026-07-28 MCP spec removes sessions and the initialize handshake. Run these 7 greps against your src/ to find every breaking change, then migrate in order: sessions first, then handshake, then error codes.

Jul 7, 2026 · 95% relevant

GitHub Actions Now Runs Steps in Parallel — Here's How to Use It with

GitHub Actions' new `background`, `wait`, `cancel`, and `parallel` keywords let you run steps concurrently. Update your CI/CD workflows to cut job times.

Jun 25, 2026 · 70% relevant

LOCUS-v1: 2.2M US Laws Hit HuggingFace via AI Pipeline

LOCUS-v1, a dataset of 2.2M US laws built via an AI pipeline, has been released on HuggingFace. It is the first comprehensive legal database of its kind, but quality and validation metrics remain undisclosed.

Jun 21, 2026 · 89% relevant

Anthropic Reverses Claude Agent SDK Billing Overhaul Before Launch

Anthropic paused its June 15 billing overhaul for the Claude Agent SDK, keeping usage within regular subscription limits, amid a brewing price war with OpenAI and its own upcoming IPO.

Jun 16, 2026 · 95% relevant

Oracle Ships Full-Stack DR MCP Server for OCI

Oracle launched an MCP server for OCI Full Stack DR, enabling AI agents to automate recovery operations. Oracle is the first major cloud DR vendor on the protocol.

Jun 10, 2026 · 70% relevant

MCP Crosses 9,400 Servers; Build Your Own in TypeScript

MCP has crossed 9,400 servers. Build a database introspection server in TypeScript; the SDK handles protocol framing and capability negotiation.

May 21, 2026 · 90% relevant

VAB Benchmark: Top MLLMs Judge Beauty Correctly Only 26.5% of Time

Frontier MLLMs achieve only 26.5% accuracy on VAB, far below the human score of 68.9%. Fine-tuning bridges the gap.

May 14, 2026 · 60% relevant

Claude Code Plugin Deploys 17-Agent SDLC Team With Orchestrator

A team-of-agents plugin adds 17 specialist AI agents and an orchestrator to Claude Code, using confidence signals to gate output quality.

May 12, 2026 · 92% relevant

Retail traffic from LLMs surged 393% year-on-year, reports CX Network

According to CX Network, retail traffic originating from large language model interfaces increased 393% year-on-year, highlighting the growing role of conversational AI as a customer acquisition channel for retailers.

Apr 24, 2026 · 86% relevant

New MoE Framework Tames User Interest Shifts in Long-Sequence Recommendations

Researchers propose MoS, a model-agnostic MoE approach that handles long user sequences by detecting session hopping – where user interests shift across sessions. The theme-aware routing mechanism filters irrelevant sessions, while multi-scale fusion captures global and local patterns. Results show SOTA on benchmarks with fewer FLOPs than alternatives.

Apr 24, 2026 · 94% relevant

OpenCLAW-P2P v6.0 Cuts Paper Lookup Latency to <50ms

OpenCLAW-P2P v6.0 introduces a multi-layer persistence architecture and live reference verification, reducing paper retrieval latency from >3s to <50ms and operating with 14 autonomous agents that scored 50+ papers.

Apr 23, 2026 · 77% relevant

TF-LLMER: A New Framework to Fix Optimization Problems in LLM-Enhanced

Researchers identify two key causes of poor training in LLM-enhanced recommenders: norm disparity and misaligned angular clustering. Their solution, TF-LLMER, uses embedding normalization and Rec-PCA to significantly outperform existing methods.

Apr 23, 2026 · 74% relevant

PerfectSquashBench Tests Image Model Anchoring Bias vs. Text Models

Wharton professor Ethan Mollick released PerfectSquashBench, a test showing image generation models exhibit stronger anchoring bias than text models, getting 'stuck' on initial directions and requiring context window clearing.

Apr 22, 2026 · 85% relevant

LLMAR: A Tuning-Free LLM Framework for Recommendation in Sparse

Researchers propose LLMAR, a tuning-free recommendation framework that uses LLM reasoning to infer user 'latent motives' from sparse text-rich data. It outperforms state-of-the-art models in sparse industrial scenarios while keeping inference costs low, offering a practical alternative to costly fine-tuning.

Apr 21, 2026 · 80% relevant

The Hidden Cost of AI Translation Layers in Global Customer Support

An article argues that using a basic translation layer for multilingual AI customer support is a costly mistake. It fails to convey cultural context and appropriate tone, leading to higher churn and lower satisfaction in non-English markets. The solution requires treating multilingual support as a core operational capability, not just a technical add-on.

Apr 16, 2026 · 94% relevant

Interluxe Group Launches Optima AI Index to Shape Luxury Discovery in

The Interluxe Group has introduced the Optima AI Index, a new data standard aimed at enhancing the accuracy and visibility of luxury brand information within generative AI platforms. This initiative seeks to address the challenge of inconsistent brand discovery in AI-driven search, providing a structured foundation for brand representation.

Apr 16, 2026 · 96% relevant

Claude MCP GPU Debugging: AI Agent Identifies PyTorch Bottleneck in Kernel

A developer used an AI agent powered by Claude Code and the Model Context Protocol (MCP) to diagnose a severe GPU performance bottleneck. The agent analyzed system kernel traces, pinpointing excessive CPU context switches as the culprit, demonstrating a practical application of agentic AI for complex technical debugging.

Apr 16, 2026 · 72% relevant

RoTE: A New Plug-and-Play Module to Sharpen Time-Aware Sequential

A new research paper introduces RoTE, a multi-level temporal embedding module for sequential recommenders. It explicitly models the time spans between user interactions, a factor often overlooked, leading to significant performance gains on standard benchmarks.

Apr 16, 2026 · 82% relevant

MVCrec: A New Multi-View Contrastive Learning Framework for Sequential

Researchers propose MVCrec, a framework that applies multi-view contrastive learning between sequential (ID-based) and graph-based views of user interaction data to improve recommendation accuracy. It outperforms 11 leading models, showing significant gains in key metrics.

Apr 16, 2026 · 84% relevant

New Research Proposes DITaR Method to Defend Sequential Recommenders

Researchers propose DITaR, a dual-view method to detect and rectify harmful fake orders embedded in user sequences. It aims to protect recommendation integrity while preserving useful data, showing superior performance in experiments. This addresses a critical vulnerability in e-commerce and retail AI systems.

Apr 13, 2026 · 86% relevant

Waymo Data Claims Autonomous Tech Prevents Injuries, Deaths

Waymo has released data indicating its autonomous vehicle technology is preventing injuries and deaths on public roads. If verified, this represents a critical, evidence-based argument for the safety of robotaxis.

Apr 12, 2026 · 75% relevant
