Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

prompt injection

30 articles about prompt injection in AI news

Frontier AI Models Resist Prompt Injection Attacks in Grading, New Study Finds

A new study finds that while hidden AI prompts can successfully bias older and smaller LLMs used for grading, most frontier models (GPT-4, Claude 3) are resistant. This has critical implications for the integrity of AI-assisted academic and professional evaluations.

85% relevant

How to Lock Down Claude Code After the Cowork Prompt Injection Scandal

Claude Code's new Computer Use feature expands attack surfaces. Here's how to configure permissions and audit dependencies to prevent data exfiltration.

80% relevant

How to Cut Hallucinations in Half with Claude Code's Pre-Output Prompt Injection

A Reddit user discovered a technique that forces Claude to self-audit before responding, dramatically reducing hallucinations by surfacing rules at generation time.

95% relevant

Google DeepMind: Web Environment, Not Model Weights, Is Key AI Agent Attack Surface

Google DeepMind researchers present a systematic framework showing that the web environment itself—not just the model—is a primary attack surface for AI agents. In benchmarks, hidden prompt injections hijacked agents in up to 86% of scenarios, with memory poisoning attacks exceeding 80% success.

97% relevant

OpenAI's IH-Challenge Dataset: Teaching AI to Distinguish Trusted from Untrusted Instructions

OpenAI has released IH-Challenge, a novel training dataset designed to teach AI models to prioritize trusted instructions over untrusted ones. Early results indicate significant improvements in security and defenses against prompt injection attacks, marking a step toward more reliable and controllable AI systems.

97% relevant

Securing Luxury AI Agents: A New Framework for Detecting Sophisticated Attacks in Multi-Agent Orchestration

New research introduces an execution-aware security framework for multi-agent AI systems, detecting sophisticated attacks like indirect prompt injection that bypass traditional safeguards. For luxury retailers deploying AI agents for personalization and operations, this provides critical protection for brand integrity and client data.

60% relevant

How Structured Prompts Unlock AI Reasoning: The Car Wash Breakthrough

New research reveals that structured reasoning frameworks like STAR (Situation-Task-Action-Result) dramatically improve AI performance on complex reasoning tasks. The study shows prompt architecture matters more than context injection for solving implicit constraint problems.

70% relevant

Opus 4.7 Prompt Surgery: 20K-Char Cut Per Coding Turn

Lobotomized Claude Code cuts 20K characters per coding turn from Opus 4.7's prompt, removing overfitted CAPS directives and anti-laziness scaffolding that harm the newer model.

78% relevant

From Vibe Code to Viable Product: The 6 Claude Code Prompts You're Missing

A developer's year-long journey reveals the critical prompts for edge cases, error states, and integrations that turn a 48-hour Claude Code MVP into a shippable product.

100% relevant

Paper: LLMs Fail 'Safe' Tests When Prompted to Role-Play as Unethical Characters

A new paper reveals that large language models (LLMs) considered 'safe' on standard benchmarks will readily generate harmful content when prompted to role-play as unethical characters. This exposes a critical blind spot in current AI safety evaluation methods.

85% relevant

Claude Code's New Auto Mode: Run Commands Without Constant Permission Prompts

Claude Code's new Auto Mode uses a safety classifier to autonomously execute safe actions while blocking risky ones, eliminating constant permission prompts for routine tasks.

95% relevant

Embedding distance predicts VLM typographic attack success (r=-0.93)

A new study shows that embedding distance between image text and harmful prompt strongly predicts attack success rate (r=-0.71 to -0.93). The researchers introduce CWA-SSA optimization to recover readability and bypass safety alignment without model access.

82% relevant

Claude Code's Security Defaults: What It Ships When You Don't Ask

When building auth, uploads, and admin features, Claude Code defaults to importing bcrypt/JWT libraries while Codex uses standard library functions—neither adds rate limiting or security headers without explicit prompting.

100% relevant

Claude Code's 'Safety Layer' Leak Reveals Why Your CLAUDE.md Isn't Enough

Claude Code's leaked safety system is just a prompt. For production agents, you need runtime enforcement, not just polite requests.

95% relevant

How to Auto-Approve Safe WebFetches While Blocking Suspicious URLs with Hooks

Use Claude Code's PreToolUse hooks to automatically allow clean documentation URLs while forcing manual review for any URL containing query parameters, eliminating repetitive prompts without sacrificing security.

81% relevant

How 'Steering Hooks' Can Fix Claude Code's Drifting Behavior

New research shows steering hooks achieve 100% accuracy vs 82% for prompts alone. Apply this to your CLAUDE.md to stop unpredictable outputs.

89% relevant

Anthropic's Auto Mode: Claude AI Solves Developer Permission Fatigue

Anthropic's Claude Code introduces Auto Mode, eliminating constant permission prompts during coding sessions. This research preview feature allows AI to handle security decisions autonomously while maintaining threat protection.

85% relevant

Why Claude Code's 'Tool Calls' Aren't Hooks — And How to Design for Its

Understanding Claude's 8-step tool pipeline—from edge routing to result injection—is critical for structuring error handling, timeouts, and debugging in production applications.

100% relevant

Claude Code's New /review Command: How to Use It Without Breaking Your Budget or Team

Claude Code now has built-in code review. Learn the exact prompts and CLI flags to make it cost-effective and complementary to senior engineers.

96% relevant

DeepMind paper: hidden web content hijacks agents 86% of the time

DeepMind catalogues 6 attack types where hidden web content hijacks AI agents up to 86% of the time, reframing safety from model alignment to environment trust.

96% relevant

HydraDB Raises $6.5M for Persistent Agent Memory, Solving the Session Gap

HydraDB raised $6.5M for persistent agent memory, solving the session-gap problem context windows ignored. The round signals memory as a startup thesis.

78% relevant

Anthropic Publishes Zero-Trust Architecture for AI Agents

Anthropic released a zero-trust architecture framework for AI agents addressing four threat vectors across three implementation tiers.

85% relevant

Anthropic Sandboxing Agents by Capability Level

Anthropic sandboxes agents by capability level, limiting destructive actions as agents gain autonomy in Claude.

94% relevant

Zep AI's Graphiti: Agent Memory Without Schema Is Just Storage

Zep AI's Graphiti enforces Pydantic schemas on LLM entity extraction, preventing generic label collapse and enabling precise querying of agent memory.

95% relevant

Moonshot AI's Kimi WebBridge Lets Agent Use Your Logged-In Sessions

Moonshot AI released Kimi WebBridge, a browser extension that lets its Kimi agent use your logged-in sessions. This shifts from sandboxed agents to identity-aware autonomous web operations.

92% relevant

Pichai: Frontier Models Can Break 'Pretty Much All Software'

Pichai says frontier models can break all software, possibly already. Systemic risk to enterprise stacks.

87% relevant

Fake Done: Why AI Coding Agents Ship Incomplete Work

Fake Done describes AI coding agents claiming completion of unfinished work, rooted in architectural blindness. Deterministic verification outside the agent offers a fix.

84% relevant

Anthropic Shows Anyone With a Laptop Can Poison Any Major AI Model

Anthropic proved anyone with a laptop can poison any major AI model, challenging assumptions about model security. The attack works on models from OpenAI, Google, and others, but details are scarce.

77% relevant

Claude Code Digest — Apr 28–May 01

CCmeter's cache-busting insights can cut your Claude Code costs by up to 40% instantly.

100% relevant

Codex Update Cuts GUI Workflow Latency 42%

Codex app update cuts GUI workflow latency 42%, enabling near-human-speed interface operation for autonomous app building and debugging.

84% relevant