best practices
30 articles about best practices in AI news
Claude Code 2.5: New CLI, Dashboard, and Best Practices for Web Devs
Anthropic's latest Claude Code update adds a CLI, usage dashboard, and web-focused best practices. Here's how to use them.
How the New Claude Certified Architect Exam Reveals Best Practices for Claude Code
Anthropic's new certification exam outlines the core principles for effectively using Claude in development, which you can apply directly to your Claude Code workflow.
CLAUDE.md Explained: How Anthropic's Agent Memory Works
CLAUDE.md is Anthropic's project config file for Claude Code, now two years old with settled best practices for agent memory and context.
NVIDIA and Unsloth Release Comprehensive Guide to Building RL Environments from Scratch
NVIDIA and Unsloth have published a detailed practical guide on constructing reinforcement learning environments from the ground up. The guide addresses critical gaps often overlooked in tutorials, covering environment design, when RL outperforms supervised fine-tuning, and best practices for verifiable rewards.
Evaluating AI Agents in Practice: Benchmarks, Frameworks, and Lessons Learned
A new report details the practical challenges and emerging best practices for evaluating AI agents in real-world applications, moving beyond simple benchmarks to assess reliability, safety, and business value.
AI Security Inst Shows Test-Time Compute Skews Frontier Evaluations
AISecInst research shows test-time compute budgets skew frontier model evaluations, challenging standard practices.
Google Hits 75% AI-Generated Code, Up From 50% in Fall 2025
Google reports 75% of all new code is now AI-generated and engineer-approved, a sharp increase from 50% last fall. This indicates a massive, accelerating shift in software development practices at the tech giant.
Open-Source 'Claude Code' Dev Setup Replicates Anthropic Engineer's Workflow
A developer has reverse-engineered and published the complete Claude Code development setup used by Anthropic engineer Boris Cherny. The project is available for free on GitHub, offering a window into high-level AI-assisted programming practices.
Stop Using Elaborate Personas: Research Shows They Degrade Claude Code Output
Research shows that common Claude Code prompting practices, such as elaborate personas and multi-agent teams, measurably hurt performance.
Regulators in Italy Probe Sephora, LVMH for Youth Marketing
Italian authorities are investigating LVMH and its beauty retailer Sephora for marketing practices targeting minors. This marks the first such European probe into the luxury conglomerate's youth outreach, signaling heightened regulatory scrutiny.
The Silent Data Harvest: Stanford Exposes How AI Giants Use Your Private Conversations
Stanford researchers reveal that all major AI companies—OpenAI, Google, Meta, Anthropic, Microsoft, and Amazon—train their models on user chat data by default, with minimal transparency, unclear opt-out mechanisms, and concerning practices around data retention and child privacy.
Beyond Architecture: How Training Tricks Make or Break AI Fraud Detection Systems
New research reveals that weight initialization and normalization techniques—often overlooked in AI development—are critical for graph neural networks detecting financial fraud on blockchain networks. The study shows these training practices affect different GNN architectures in dramatically different ways.
The AI Context Paradox: Why More Instructions Make Coding Agents Less Effective
ETH Zurich research reveals AI coding agents perform worse with overly detailed AGENTS.md files. The study shows excessive context creates 'obedient failure' where agents follow unnecessary instructions instead of solving problems efficiently. This challenges current industry practices for configuring AI development assistants.
Pruning LLMs for Edge Triples Bias, Perplexity Hides Damage
Pruning LLMs for edge deployment amplifies bias up to 83.7% while perplexity barely changes, revealing a paradox that undermines standard evaluation practices.
Why Production AI Needs More Than Benchmark Scores
The article argues that high benchmark scores are insufficient for production AI success, highlighting the need for robust MLOps practices, monitoring, and real-world testing—critical for retail applications.
Claude Desktop's Undisclosed Native Messaging Bridge
Claude Desktop installs a preauthorized native messaging bridge for browser extensions without explicit disclosure, impacting developer workflows and security practices.
Claude Code Best Practice Repo Hits 19.7K Stars with 84 Anthropic Tips
A GitHub repository called 'claude-code-best-practice' has amassed 19.7K stars by compiling 84 production tips from Anthropic's Claude Code creators. It provides a full open-source framework for moving from basic usage to advanced agentic workflows.
SpaceXAI Partners with Cursor AI to Build 'World's Best' Coding Assistant
SpaceXAI and Cursor AI announced a partnership to integrate SpaceX's engineering data with Cursor's editor, aiming to create a top-tier AI for coding and knowledge work.
Why the Best Generative AI Projects Start With the Most Powerful Model
The article suggests that while initial AI projects leverage the broad capabilities of large foundation models, the most successful implementations eventually transition to smaller, more targeted systems. This reflects a maturation from experimentation to production optimization.
Google Cloud's Vertex AI Experiments Solves the 'Lost Model' Problem in ML Development
A Google Cloud team recounts losing their best-performing model after training 47 versions, highlighting a common MLOps failure. They detail how Vertex AI Experiments provides systematic tracking to prevent this.
The Persistence Paradox: Why Safety Training Sticks in AI Agents Even When You Try to Make Them More Helpful
New research reveals that safety training in AI agents persists through subsequent helpfulness optimization, creating a linear trade-off frontier rather than achieving 'best of both worlds' outcomes. This challenges assumptions about how to balance safety and capability in multi-step AI systems.
Claude Code Digest — Jul 01–Jul 04
Agentic coding is no longer “cheap experimentation”: Lovable burned $85K in tokens, and the real bill came from debugging, not generation.
OpenAI Offers Washington 5% of $852B Valuation to Ease AI Pressure
OpenAI proposed 5% of its $852B business to Washington to ease AI regulatory pressure, per @rohanpaul_ai. The equity-for-peace swap could set a precedent.
DART: One-Shot Robot Adaptation via Weight Space Arithmetic
DART from Seoul National University adapts robot policies with one demonstration using weight space arithmetic, achieving 73% success on unseen domain shifts.
Claude Code Digest — Jun 28–Jul 01
Claude Code’s biggest shift this week: teams are replacing “let the model figure it out” with hard guardrails, and one pair of Bash hooks cut an Anthropic bill from $312 to $156.
Claude Code Digest — Jun 25–Jun 28
Claude Code’s biggest edge this week wasn’t a new model — it was learning that its harness can veto tool calls, fake tool results can be detected, and MCP servers are becoming the default way to wire in real systems.
Gemini 3.5 Flash Scores 78.4 on OSWorld, Matching GPT-5.5
Google integrated Computer Use into Gemini 3.5 Flash, which scores 78.4 on OSWorld, matching GPT-5.5 while undercutting it on cost.
Claude Code Digest — Jun 20–Jun 23
Claude Code is shifting from a chat box into governed infrastructure: the teams pulling ahead are wiring policies, auth, and agent workflows now, not later.
Hermès Tops List of Luxury Brands in AI Search – WWD Report
WWD reports Hermès tops luxury brands in AI search visibility. A separate study warns LLMs misinterpret luxury brands, reducing their AI presence. This dual finding underscores the need for luxury houses to optimize for AI-driven discovery.
Nvidia Rubin Runs 45°C Liquid Cooling, Cuts Water Use to Near Zero
Nvidia's Rubin servers run 45°C liquid cooling, enabling 100% liquid cooling with zero fans and cutting water use from 2.6M gal/MW/year to near zero.