Tessera Launches Open-Source Framework for 32 OWASP AI Security Tests, Benchmarks GPT-4o, Claude, Gemini, Llama 3

Tessera introduces the first open-source framework to run all 32 OWASP AI security tests against any model with one CLI command. It provides benchmark results for GPT-4o, Claude, Gemini, Llama 3, and Mistral across 21 model-specific security tests.

gentic.news Editorial · 6 min read
Source: github.com via hacker_news_ml (corroborated)

Tessera, a new open-source security framework, has launched with comprehensive testing capabilities for large language models. The tool enables developers and security teams to run all 32 OWASP AI security tests against any AI model—including OpenAI's GPT-4o, Anthropic's Claude, Google's Gemini, Meta's Llama 3, and Mistral—with a single CLI command.

What Tessera Does

Tessera positions itself as the first open-source framework to implement the complete OWASP AI Security and Privacy Guide, which outlines 32 distinct security tests across multiple categories. The framework follows a three-phase methodology: Attack (executing security tests), Measure (quantifying vulnerabilities), and Defend (providing remediation guidance).
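
The three-phase flow described above can be sketched as a minimal pipeline. This is an illustration only: the function names, score scale, and remediation text below are invented for this example and are not Tessera's actual API.

```python
# Hypothetical sketch of the Attack -> Measure -> Defend flow.
# Names and fields are invented for illustration, not Tessera's API.
from dataclasses import dataclass

@dataclass
class TestResult:
    test_id: str
    score: float       # Measure: quantified vulnerability (0 = safe, 1 = fully vulnerable)
    remediation: str   # Defend: guidance emitted when the score crosses a threshold

def run_phase(test_id: str, attack, threshold: float = 0.5) -> TestResult:
    score = attack()   # Attack: execute the security test and get a vulnerability score
    advice = "review system prompt isolation" if score >= threshold else "none"
    return TestResult(test_id, score, advice)

# Example: a prompt-injection test that "succeeds" 70% of the time
result = run_phase("APP-01", attack=lambda: 0.7)
print(result.remediation)
```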

Unlike fragmented security tools that address individual vulnerabilities, Tessera provides a unified testing platform covering:

  • Application Security (APP-01 through APP-14): Prompt injection, denial of service, excessive agency
  • Model Security (MOD-01 through MOD-07): Model theft, adversarial examples, data poisoning
  • Infrastructure Security (INF-01 through INF-05): Supply chain vulnerabilities, insecure deployment
  • Data Governance (DAT-01 through DAT-06): Data leakage, privacy violations

The framework includes 375 total tests: 32 OWASP security test implementations, 261 unit/integration tests, and 82 end-to-end tests. According to the project's test suite, all 375 tests pass in approximately 42 seconds.
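
As a quick arithmetic check on the counts above, the per-category test counts sum to the 32 OWASP implementations, and adding the 261 unit/integration and 82 end-to-end tests reaches 375. The dictionary below is illustrative bookkeeping, not a Tessera data structure.

```python
# Per-category counts from the article; the dict itself is illustrative.
OWASP_CATEGORIES = {
    "APP": 14,  # Application Security: APP-01 .. APP-14
    "MOD": 7,   # Model Security: MOD-01 .. MOD-07
    "INF": 5,   # Infrastructure Security: INF-01 .. INF-05
    "DAT": 6,   # Data Governance: DAT-01 .. DAT-06
}

owasp_tests = sum(OWASP_CATEGORIES.values())  # 32 OWASP test implementations
suite_total = owasp_tests + 261 + 82          # plus unit/integration and e2e tests

print(owasp_tests, suite_total)  # 32 375
```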

How to Use Tessera

Installation and usage follow straightforward patterns:

# Install the framework
pip install tessera-ai[all]

# Run against GPT-4o
OPENAI_API_KEY=sk-... tessera --config examples/llm-openai.yaml --per-model --format json html

# Run against Claude
ANTHROPIC_API_KEY=sk-ant-... tessera --config examples/llm-anthropic.yaml --per-model --format json html

# Run against Gemini
GOOGLE_APPLICATION_CREDENTIALS=/path/to/creds.json tessera --config examples/llm-vertex.yaml --per-model --format json html

# Run against Llama 3 via Ollama
ollama run llama3:70b
tessera --config examples/llm-ollama.yaml --per-model --format json html

The framework supports any model with an OpenAI-compatible API, including self-hosted and fine-tuned models. Output formats include JSON and HTML reports for integration into CI/CD pipelines and compliance documentation.
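
A CI step consuming the JSON output might look like the following. Note that the report schema here is hypothetical; the article does not document Tessera's actual report layout, so the field names are assumptions for illustration.

```python
import json

# Hypothetical report schema -- Tessera's real JSON layout may differ.
# Sketch of a CI gate that fails the build when any test does not pass.
report_json = """
{"results": [
  {"id": "APP-01", "status": "pass"},
  {"id": "APP-02", "status": "fail"},
  {"id": "MOD-07", "status": "pass"}
]}
"""

report = json.loads(report_json)
failed = [r["id"] for r in report["results"] if r["status"] != "pass"]
exit_code = 1 if failed else 0  # nonzero exit fails the pipeline stage

print(failed, exit_code)  # ['APP-02'] 1
```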

Benchmark Results for Major Models

The Tessera team has published benchmark results for the top five AI models across all applicable test categories. According to their methodology, each model was tested with default Tessera thresholds, with the LLM-specific tests (APP-01 through APP-14 and MOD-01 through MOD-07) run against each model.

Infrastructure (INF) and Data Governance (DAT) tests apply to deployment configuration rather than models directly, so the published results cover the 21 model-specific security tests. The framework generates benchmark tables programmatically:

python scripts/generate_benchmark.py --output-format markdown

While the source material doesn't include specific numerical scores for each model, the framework enables organizations to generate their own comparative security assessments across different providers.
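
The scoping above can be expressed as a simple filter over the test IDs: INF and DAT tests target deployment configuration, so only APP and MOD tests run per model. The ID list below is reconstructed from the category ranges in the article, not pulled from Tessera itself.

```python
# Test IDs reconstructed from the article's category ranges (illustrative).
all_tests = (
    [f"APP-{i:02d}" for i in range(1, 15)]   # 14 application-security tests
    + [f"MOD-{i:02d}" for i in range(1, 8)]  # 7 model-security tests
    + [f"INF-{i:02d}" for i in range(1, 6)]  # 5 infrastructure tests
    + [f"DAT-{i:02d}" for i in range(1, 7)]  # 6 data-governance tests
)

# Only APP and MOD tests exercise the model itself.
model_specific = [t for t in all_tests if t.split("-")[0] in {"APP", "MOD"}]
print(len(model_specific))  # 21
```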

Regulatory Context and Market Need

The launch comes as regulatory frameworks like the EU AI Act and NIST AI RMF require organizations to demonstrate security testing of their AI systems. Current tools are fragmented—with separate solutions for prompt injection, adversarial robustness, and data governance—creating compliance challenges for enterprises deploying AI at scale.

Tessera addresses this gap by providing a comprehensive testing framework that can be integrated into development workflows, security audits, and compliance reporting. The open-source nature allows security researchers to contribute new test vectors and adapt the framework to emerging threats.

gentic.news Analysis

Tessera's launch arrives at a critical inflection point in AI deployment. As Anthropic projects surpassing OpenAI in annual recurring revenue by mid-2026 and OpenAI shifts its product organization to "AGI Deployment" while teasing its upcoming "Spud" model, the competitive landscape is intensifying rapidly. Security testing frameworks like Tessera become essential infrastructure as these companies push models into more sensitive and regulated domains.

The framework's timing aligns with several trends we've been tracking. First, the increased regulatory scrutiny following the EU AI Act creates immediate demand for standardized testing tools. Second, as Anthropic launches its "Computer Use" beta for Claude Desktop and OpenAI winds down Sora to reallocate compute to next-generation models, both companies are pushing AI into more interactive, agentic roles—precisely where security vulnerabilities become most dangerous.

Notably, Tessera's support for all major providers reflects the fragmented but interconnected nature of today's AI ecosystem. With OpenAI competing with Anthropic, Anthropic competing with Google, and all major players developing increasingly capable models, enterprises need security tools that work across this multi-vendor landscape. The framework's OpenAI-compatible API support is particularly strategic, as this has become a de facto standard even among competitors.

The benchmark capabilities could influence procurement decisions as enterprises compare not just capability but security posture across providers. As we reported recently with Origin CLI tracking AI agent contributions, the industry is developing the tooling necessary for responsible AI deployment at scale. Tessera represents the security dimension of this maturation process.

Frequently Asked Questions

What are the OWASP AI security tests?

The OWASP AI Security and Privacy Guide defines 32 security tests across four categories: Application Security (14 tests covering prompt injection, excessive agency, etc.), Model Security (7 tests covering model theft, adversarial examples, etc.), Infrastructure Security (5 tests covering supply chain vulnerabilities), and Data Governance (6 tests covering data leakage and privacy). These represent the most comprehensive security framework specifically designed for AI systems.

How does Tessera compare to existing AI security tools?

Most existing AI security tools address specific vulnerabilities—like prompt injection detection or adversarial example generation—but lack comprehensive coverage. Tessera appears to be the first open-source framework implementing all 32 OWASP tests in a unified tool. Its single-command testing across multiple providers and output formats for compliance reporting differentiates it from point solutions.

Can Tessera test fine-tuned or proprietary models?

Yes, Tessera supports any model with an OpenAI-compatible API, which includes most commercial providers and can be implemented for custom models. The framework's configuration system allows testing against self-hosted models, fine-tuned variants, and proprietary systems, provided they expose a compatible inference endpoint.
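
One reason "OpenAI-compatible" suffices as a contract: clients only need a configurable base URL that serves the standard chat-completions route. A minimal sketch, with an illustrative helper and URLs that are not part of Tessera:

```python
from urllib.parse import urljoin

def chat_completions_url(base_url: str) -> str:
    # Self-hosted servers (vLLM, Ollama's OpenAI-compatible shim, etc.)
    # expose the same path layout as api.openai.com, so one client
    # implementation works against any compatible endpoint.
    return urljoin(base_url.rstrip("/") + "/", "chat/completions")

# Example: Ollama's local OpenAI-compatible endpoint
print(chat_completions_url("http://localhost:11434/v1"))
# http://localhost:11434/v1/chat/completions
```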

How does Tessera handle false positives and test thresholds?

The framework includes configurable thresholds for each test category, allowing organizations to adjust sensitivity based on their risk tolerance and use case. The default thresholds used in the published benchmarks provide a baseline, but enterprises can customize these parameters to reduce false positives in production environments.
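
A plausible shape for such threshold overrides is sketched below; the key names, default values, and pass/fail convention are invented purely to illustrate the idea of per-category sensitivity tuning.

```python
# Hypothetical threshold handling -- names and values are illustrative.
DEFAULTS = {"APP": 0.5, "MOD": 0.5, "INF": 0.7, "DAT": 0.6}

def effective_thresholds(overrides: dict) -> dict:
    # User overrides win; anything unspecified keeps the default.
    return {**DEFAULTS, **overrides}

def passes(category: str, score: float, thresholds: dict) -> bool:
    # Convention here: lower score means less vulnerable.
    return score < thresholds[category]

# Relax the APP threshold to cut false positives in production
th = effective_thresholds({"APP": 0.8})
print(passes("APP", 0.6, th), passes("MOD", 0.6, th))  # True False
```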

AI Analysis

Tessera represents a significant step toward standardized security evaluation in an increasingly fragmented AI landscape. The framework's comprehensive approach—covering all 32 OWASP tests—addresses a critical gap as AI deployment moves from experimentation to production, particularly in regulated industries. What's most notable is the timing: as regulatory pressure increases with the EU AI Act and enterprises face actual compliance deadlines, tools like Tessera transition from "nice-to-have" to essential infrastructure.

The benchmark capabilities could create interesting competitive dynamics among model providers. While capability benchmarks (like MMLU or coding performance) dominate current comparisons, security benchmarks might become equally important for enterprise procurement, especially in finance, healthcare, and government sectors. We've seen similar patterns in traditional software, where security certifications often influence purchasing decisions as much as performance metrics.

Technically, the framework's architecture—supporting any OpenAI-compatible API—is strategically sound. This design choice future-proofs the tool against API changes and ensures compatibility with emerging providers. As the AI ecosystem continues to fragment with new entrants, security tools that work across providers will become increasingly valuable. The 375-test suite (32 OWASP + 343 supporting tests) suggests substantial engineering investment, though the real test will be adoption and maintenance as attack vectors evolve.

Looking at our knowledge graph context, Tessera's launch intersects with several trends we've been tracking. Anthropic's projected revenue growth and increased enterprise focus, OpenAI's organizational shifts toward AGI deployment, and the general maturation of AI tooling all create demand for robust security frameworks. As AI systems become more agentic (like Anthropic's Computer Use beta) and autonomous, the security surface area expands dramatically—making comprehensive testing not just regulatory compliance but operational necessity.