prompt injection

30 articles about prompt injection in AI news

Frontier AI Models Resist Prompt Injection Attacks in Grading, New Study Finds

A new study finds that while hidden AI prompts can successfully bias older and smaller LLMs used for grading, most frontier models (GPT-4, Claude 3) are resistant. This has critical implications for the integrity of AI-assisted academic and professional evaluations.

Apr 2, 2026 · 85% relevant

How to Lock Down Claude Code After the Cowork Prompt Injection Scandal

Claude Code's new Computer Use feature expands attack surfaces. Here's how to configure permissions and audit dependencies to prevent data exfiltration.

Mar 30, 2026 · 80% relevant

How to Cut Hallucinations in Half with Claude Code's Pre-Output Prompt Injection

A Reddit user discovered a technique that forces Claude to self-audit before responding, dramatically reducing hallucinations by surfacing rules at generation time.

Mar 20, 2026 · 95% relevant

Google DeepMind: Web Environment, Not Model Weights, Is Key AI Agent Attack Surface

Google DeepMind researchers present a systematic framework showing that the web environment itself—not just the model—is a primary attack surface for AI agents. In benchmarks, hidden prompt injections hijacked agents in up to 86% of scenarios, with memory poisoning attacks exceeding 80% success.

Apr 6, 2026 · 97% relevant

OpenAI's IH-Challenge Dataset: Teaching AI to Distinguish Trusted from Untrusted Instructions

OpenAI has released IH-Challenge, a novel training dataset designed to teach AI models to prioritize trusted instructions over untrusted ones. Early results indicate significant improvements in security and defenses against prompt injection attacks, marking a step toward more reliable and controllable AI systems.

Mar 11, 2026 · 97% relevant

Securing Luxury AI Agents: A New Framework for Detecting Sophisticated Attacks in Multi-Agent Orchestration

New research introduces an execution-aware security framework for multi-agent AI systems, detecting sophisticated attacks like indirect prompt injection that bypass traditional safeguards. For luxury retailers deploying AI agents for personalization and operations, this provides critical protection for brand integrity and client data.

Mar 6, 2026 · 60% relevant

How Structured Prompts Unlock AI Reasoning: The Car Wash Breakthrough

New research reveals that structured reasoning frameworks like STAR (Situation-Task-Action-Result) dramatically improve AI performance on complex reasoning tasks. The study shows prompt architecture matters more than context injection for solving implicit constraint problems.

Feb 26, 2026 · 70% relevant

MCP Confused Deputy: Protocol Design Lacks Provenance, Enables Injection

MCP has a confused deputy vulnerability: tool results lack provenance, allowing injection. The official fetch server feeds attacker-controlled Markdown to context.

Jul 14, 2026 · 100% relevant

Opus 4.7 Prompt Surgery: 20K-Char Cut Per Coding Turn

Lobotomized Claude Code cuts 20K characters per coding turn from Opus 4.7's prompt, removing overfitted CAPS directives and anti-laziness scaffolding that harm the newer model.

May 13, 2026 · 78% relevant

From Vibe Code to Viable Product: The 6 Claude Code Prompts You're Missing

A developer's year-long journey reveals the critical prompts for edge cases, error states, and integrations that turn a 48-hour Claude Code MVP into a shippable product.

Apr 15, 2026 · 100% relevant

Paper: LLMs Fail 'Safe' Tests When Prompted to Role-Play as Unethical Characters

A new paper reveals that large language models (LLMs) considered 'safe' on standard benchmarks will readily generate harmful content when prompted to role-play as unethical characters. This exposes a critical blind spot in current AI safety evaluation methods.

Apr 4, 2026 · 85% relevant

Claude Code's New Auto Mode: Run Commands Without Constant Permission Prompts

Claude Code's new Auto Mode uses a safety classifier to autonomously execute safe actions while blocking risky ones, eliminating constant permission prompts for routine tasks.

Mar 26, 2026 · 95% relevant

How to Write a CLAUDE.md for FastAPI That Stops AI-Generated Code Inconsistency

Write a CLAUDE.md for FastAPI with sections on Stack, Router Design, Dependency Injection, and Prohibitions. Keep it under 50 lines, use positive patterns with code snippets, and place it in the repo root. Have the team review changes to it the way they would review infrastructure as code (IaC).

Jun 29, 2026 · 100% relevant

Embedding distance predicts VLM typographic attack success (r=-0.93)

A new study shows that embedding distance between image text and harmful prompt strongly predicts attack success rate (r=-0.71 to -0.93). The researchers introduce CWA-SSA optimization to recover readability and bypass safety alignment without model access.

Apr 29, 2026 · 82% relevant

Claude Code's Security Defaults: What It Ships When You Don't Ask

When building auth, uploads, and admin features, Claude Code defaults to importing bcrypt/JWT libraries while Codex uses standard library functions—neither adds rate limiting or security headers without explicit prompting.

Apr 15, 2026 · 100% relevant

Claude Code's 'Safety Layer' Leak Reveals Why Your CLAUDE.md Isn't Enough

Claude Code's leaked safety system is just a prompt. For production agents, you need runtime enforcement, not just polite requests.

Apr 1, 2026 · 95% relevant

How to Auto-Approve Safe WebFetches While Blocking Suspicious URLs with Hooks

Use Claude Code's PreToolUse hooks to automatically allow clean documentation URLs while forcing manual review for any URL containing query parameters, eliminating repetitive prompts without sacrificing security.

Mar 31, 2026 · 81% relevant

How 'Steering Hooks' Can Fix Claude Code's Drifting Behavior

New research shows steering hooks achieve 100% accuracy, versus 82% for prompts alone. Apply this to your CLAUDE.md to stop unpredictable outputs.

Mar 18, 2026 · 89% relevant

Anthropic's Auto Mode: Claude AI Solves Developer Permission Fatigue

Anthropic's Claude Code introduces Auto Mode, eliminating constant permission prompts during coding sessions. This research preview feature allows AI to handle security decisions autonomously while maintaining threat protection.

Mar 7, 2026 · 85% relevant

Why Claude Code's 'Tool Calls' Aren't Hooks — And How to Design for Its

Understanding Claude's 8-step tool pipeline—from edge routing to result injection—is critical for structuring error handling, timeouts, and debugging in production applications.

Apr 20, 2026 · 100% relevant

Claude Code's New /review Command: How to Use It Without Breaking Your Budget or Team

Claude Code now has built-in code review. Learn the exact prompts and CLI flags to make it cost-effective and complementary to senior engineers.

Mar 11, 2026 · 96% relevant

Stop Relying on CLAUDE.md for Guarantees: Build Deterministic Hooks Instead

Claude Code hooks in settings.json let you run deterministic shell commands on SessionStart, PreToolUse, PostToolUse, and other events—replacing unreliable CLAUDE.md instructions for critical behaviors like blocking dangerous commands or injecting context.

Jul 12, 2026 · 100% relevant

Aider vs Claude Code: When to Use Each for Terminal-First Development in 2026

Aider wins on cost, undo, and local LLM support; Claude Code wins on agentic verification and enterprise security. Choose based on your team's needs.

Jul 10, 2026 · 100% relevant

Claude Code Steganography Flagged Chinese Users; Anthropic Rolls Back

Anthropic's Claude Code 2.1.91 used steganography to detect Chinese users. After Reddit exposure, Anthropic rolled back the feature, calling it an experiment against model distillation.

Jul 1, 2026 · 100% relevant

3 MCP Gateway Security Gaps LiteLLM's Audit Found (And How to Fix Them in

LiteLLM's audit revealed 3 MCP gateway gaps: fail-open resolver, unpinned servers, opt-in least-privilege. Fix them in Claude Code with version pinning and allowed_tools.

Jun 30, 2026 · 85% relevant

LLMs Default to Zod Schemas, Breaking MCPFusion Security Contracts

LLMs default to raw Zod schemas, bypassing MCPFusion's defineModel() and risking data leaks. The Developer Prover enforces MVA architecture via rejection.

Jun 28, 2026 · 85% relevant

Expose pgvector as an MCP Server: From Hardcoded RAG to Reusable Tool Server

Wrap pgvector search in FastMCP to create a reusable MCP server. Any LLM client—including Claude Code—can then query your vector database without hardcoded integrations.

Jun 27, 2026 · 98% relevant

Vibe Coding Fails: Why AI-Generated Code Breaks at Scale

Vibe coding fails because AI-generated code lacks architectural coherence, test coverage, and security validation, breaking at scale beyond 1,000 lines.

Jun 27, 2026 · 70% relevant

Gemini 3.5 Flash Scores 78.4 on OSWorld, Matching GPT-5.5

Google integrated Computer Use into Gemini 3.5 Flash, scoring 78.4 on OSWorld — matching GPT-5.5 and undercutting on cost.

Jun 25, 2026 · 100% relevant

MACCHA: The File-Based Cross-Agent Brain That Makes Claude Code Remember

MACCHA solves Claude Code's cold-start problem with a file-based 7-tier memory system. Use it to persist preferences, project rules, and lessons across sessions without a daemon.

Jun 20, 2026 · 95% relevant
