typographic injection

1 article about typographic injection in AI news

Embedding distance predicts VLM typographic attack success (r=-0.93)

A new study shows that embedding distance between image text and harmful prompt strongly predicts attack success rate (r=-0.71 to -0.93). The researchers introduce CWA-SSA optimization to recover readability and bypass safety alignment without model access.

Apr 29, 202682% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety