Key Takeaways
- Researchers propose SSL, a three-layer typed JSON representation for AI agent skills, replacing unstructured SKILL.md prose.
- Using an LLM normalizer, SSL improves Skill Discovery MRR from 0.573 to 0.707 and Risk Assessment macro F1 from 0.744 to 0.787 on a newly released 6,184-skill corpus.
From Skill Text to Skill Structure

As AI agent ecosystems scale, the humble SKILL.md file — a blob of natural language describing a capability — becomes a liability. Discovery tools can't parse it reliably, risk reviewers can't audit it systematically, and automation pipelines choke on its ambiguity.
A new paper introduces SSL (Structured Skill Language), a three-layer typed JSON representation that decomposes skill descriptions into scheduling, structural, and logical components. Paired with an LLM-based normalizer that converts existing SKILL.md files into this structured format, the approach delivers measurable gains: Skill Discovery Mean Reciprocal Rank (MRR) jumps from 0.573 to 0.707 (a 23.4% improvement), and Risk Assessment macro F1 rises from 0.744 to 0.787.
The Problem with SKILL.md
Current AI agent skill definitions, typically written as SKILL.md files, entangle three distinct concerns in one block of natural language:
- Invocation interface — how to call the skill
- Execution flow — what the skill does step by step
- Tool/resource side effects — what external systems it touches
This conflation makes downstream tasks brittle. A discovery system searching for "email summarization" might miss a skill that describes itself as "digest inbox messages." A risk reviewer cannot programmatically check whether a skill accesses external APIs or modifies local files.
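To make the contrast concrete, here is a minimal sketch of the kind of programmatic check a structured representation enables. The field names (`actions`, `type`, `tool`, `resource`) are illustrative assumptions, not the paper's actual SSL schema:

```python
import json

# Hypothetical excerpt of a skill's resource-use metadata; the field
# names are assumptions for this sketch, not the published SSL schema.
skill_logical = json.loads("""
{
  "actions": [
    {"type": "tool_call", "tool": "http_request", "resource": "api.example.com"},
    {"type": "file_write", "resource": "/tmp/summary.txt"}
  ]
}
""")

# With atomic actions enumerated, a reviewer can answer "does this skill
# touch external APIs or modify local files?" mechanically instead of
# re-reading prose.
touches_network = any(
    a["type"] == "tool_call" and a["tool"] == "http_request"
    for a in skill_logical["actions"]
)
writes_files = any(a["type"] == "file_write" for a in skill_logical["actions"])
```

With plain SKILL.md prose, both questions require a human (or an LLM) to interpret free text; here they reduce to a list scan.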
The SSL Architecture
SSL draws inspiration from Schank and Abelson's classical work on scripts, memory organization packets (MOPs), and conceptual dependency theory — a body of cognitive science research from the 1970s that formalized how humans structure routine actions.
The representation uses three layers:
- Scheduling: invocation signals and preconditions (triggers, input parameters, required context)
- Structural: execution scenes and ordering (sequential steps, parallel branches, loops)
- Logical: atomic actions and resource use (tool calls, file reads, API writes, side effects)

Each layer is a typed JSON object with a defined schema. The Scheduling layer captures what activates a skill; the Structural layer describes how execution flows through scenes; the Logical layer enumerates every atomic action and its resource dependencies.
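A toy SSL document for a hypothetical inbox-summarization skill might look like the sketch below. The three layer names follow the paper; everything inside each layer is an assumption made for illustration, since the published schemas are not reproduced here:

```python
import json

# Illustrative SSL document for a hypothetical "summarize-inbox" skill.
# Layer names match the paper; the inner field names are assumptions.
ssl_skill = {
    "scheduling": {
        "triggers": ["summarize my inbox", "digest inbox messages"],
        "inputs": [{"name": "mailbox", "type": "string"}],
        "preconditions": ["mail client authenticated"],
    },
    "structural": {
        "scenes": [
            {"id": "fetch", "kind": "sequential"},
            {"id": "summarize", "kind": "sequential", "after": ["fetch"]},
        ]
    },
    "logical": {
        "actions": [
            {"scene": "fetch", "type": "tool_call", "tool": "imap_fetch"},
            {"scene": "summarize", "type": "llm_call", "tool": "summarizer"},
        ]
    },
}

# The document serializes to plain JSON, so it can live in a registry
# alongside (or instead of) a SKILL.md file.
serialized = json.dumps(ssl_skill, indent=2)
```

Note how the "digest inbox messages" phrasing from the earlier discovery example becomes an explicit trigger, so a search for "email summarization" can match on the structured invocation signals rather than on prose.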
From Text to Structure: LLM Normalizer
The researchers built an LLM-based normalizer that takes existing SKILL.md files and outputs SSL JSON. The normalizer uses a chain-of-thought prompting strategy:
- Parse the natural language description into an intermediate script representation
- Identify invocation signals and preconditions (Scheduling)
- Decompose the execution into scenes (Structural)
- Extract atomic actions and resource references (Logical)
On a held-out test set of 500 skills, the normalizer achieved 92.3% schema compliance — meaning the output JSON matched the SSL type definitions without structural errors.
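The paper does not reproduce its compliance checker, but a minimal structural check in the spirit of "the output JSON matches the SSL type definitions" could look like this (the specific fields checked are assumptions carried over from the sketches above):

```python
def is_schema_compliant(doc) -> bool:
    """Minimal structural check: the three SSL layers exist and hold the
    expected container types. A real checker would validate the full
    typed schemas, e.g. with a JSON Schema validator."""
    try:
        return (
            isinstance(doc["scheduling"], dict)
            and isinstance(doc["structural"], dict)
            and isinstance(doc["logical"], dict)
            and isinstance(doc["logical"].get("actions", []), list)
        )
    except (KeyError, TypeError):
        return False

ok = is_schema_compliant(
    {"scheduling": {}, "structural": {}, "logical": {"actions": []}}
)
bad = is_schema_compliant({"scheduling": {}})  # missing two layers
```

Running such a check over normalizer outputs is how a compliance rate like the reported 92.3% would be measured: count the fraction of generated documents that pass.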
Key Results
The paper evaluates SSL on two tasks using a newly released dataset of 6,184 skills, 403 task queries, and 500 risk-labeled skills:
- Skill Discovery (MRR): 0.573 → 0.707 (+23.4%)
- Risk Assessment (macro F1): 0.744 → 0.787 (+5.8%)

For discovery, the structured representation allows exact matching on invocation signals and execution scenes, rather than fuzzy keyword search over prose. For risk assessment, the Logical layer's explicit enumeration of resource access enables automated pattern matching against risk taxonomies.
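For readers unfamiliar with the discovery metric, MRR averages the reciprocal of the rank at which the correct skill appears for each query; a jump from 0.573 to 0.707 roughly means the right skill moves from around second place toward first on average. A minimal implementation:

```python
def mean_reciprocal_rank(ranks):
    """MRR over queries. `ranks` holds the 1-based rank at which the
    correct skill was retrieved for each query, or None if it was
    never retrieved (contributing 0 to the sum)."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

# Toy example: three queries where the right skill ranked 1st, 2nd, 4th.
mrr = mean_reciprocal_rank([1, 2, 4])  # (1 + 0.5 + 0.25) / 3 ≈ 0.583
```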
What This Means in Practice

For teams building agent platforms, SSL offers a drop-in replacement for SKILL.md that immediately improves search and audit capabilities. The LLM normalizer means existing skill libraries can be migrated without manual rewriting. The 6,184-skill corpus provides a benchmark for future work on skill representation and discovery.
How It Compares to Prior Work
The paper builds on a line of research in structured skill representations. Previous work like LangChain's tool definitions and OpenAI's function calling schema provide JSON schemas for individual function signatures, but they don't model execution flow or side effects. SSL's three-layer approach is closer to workflow languages like Temporal or AWS Step Functions, but specialized for the skill discovery and risk assessment use cases.
The key difference: SSL is designed for discovery and risk review of skills in a registry, not for executing them. It's a metadata layer, not an execution engine.
Limitations
The paper acknowledges several limitations:
- The LLM normalizer was evaluated on a single model (GPT-4o); performance with other models is unknown
- The risk assessment task uses a simplified taxonomy; real-world risk review may require more granular categories
- The 6,184-skill corpus is synthetic; real-world skill registries may have different distributional properties
- SSL does not yet support versioning or dependency management between skills
agentic.news Analysis
This paper addresses a practical bottleneck that has been quietly growing as agent ecosystems expand. The unstructured SKILL.md format works fine for a handful of skills, but breaks down at scale — exactly the problem the authors identify. The 23% improvement in discovery MRR is substantial enough to justify migration for any organization with more than a few hundred skills.
What's particularly interesting is the theoretical grounding. Drawing on Schank and Abelson's script theory from the 1970s connects modern LLM-based agents to a rich cognitive science tradition. The insight that skill execution follows predictable "scripts" — sequences of actions with clear preconditions and effects — is both old and newly relevant. The researchers effectively operationalize a 50-year-old theory using modern LLM capabilities.
The release of a 6,184-skill benchmark corpus is a significant contribution. Prior work in this area has been hampered by the lack of standardized evaluation datasets. Researchers can now compare discovery and risk assessment approaches on a common ground.
One open question: how well does the LLM normalizer handle real-world SKILL.md files, which often contain ambiguous language, implicit assumptions, and incomplete descriptions? The paper's 92.3% schema compliance on a held-out test set is promising, but the test set may not capture the full messiness of production skill registries.
Frequently Asked Questions
What is SSL in AI agent skills?
SSL (Structured Skill Language) is a three-layer typed JSON representation for AI agent skill descriptions. It decomposes the traditional unstructured SKILL.md file into Scheduling (invocation signals), Structural (execution scenes), and Logical (atomic actions and resource use) layers, enabling better skill discovery and risk assessment.
How does SSL improve skill discovery?
SSL improves Skill Discovery Mean Reciprocal Rank (MRR) from 0.573 to 0.707, a 23.4% improvement. The structured representation allows exact matching on invocation signals and execution scenes rather than fuzzy keyword search over natural language prose.
Can I convert my existing SKILL.md files to SSL?
Yes. The paper provides an LLM-based normalizer that converts existing SKILL.md files into SSL JSON format. On a test set of 500 skills, the normalizer achieved 92.3% schema compliance, meaning most conversions produce structurally valid output.
What dataset does the SSL paper release?
The researchers release a corpus of 6,184 skills, 403 task queries for discovery evaluation, and 500 risk-labeled skills for risk assessment evaluation. This provides a standardized benchmark for future work on skill representation and discovery.