Key Takeaways
- Researchers propose SSL, a three-layer typed JSON representation for AI agent skills, replacing unstructured SKILL.md prose.
- Using an LLM normalizer, SSL improves Skill Discovery MRR from 0.573 to 0.707 and Risk Assessment macro F1 from 0.744 to 0.787 on a newly released 6,184-skill corpus.
From Skill Text to Skill Structure

As AI agent ecosystems scale, the humble SKILL.md file — a blob of natural language describing a capability — becomes a liability. Discovery tools can't parse it reliably, risk reviewers can't audit it systematically, and automation pipelines choke on its ambiguity.
A new paper introduces SSL (Structured Skill Language), a three-layer typed JSON representation that decomposes skill descriptions into scheduling, structural, and logical components. Paired with an LLM-based normalizer that converts existing SKILL.md files into this structured format, the approach delivers measurable gains: Skill Discovery Mean Reciprocal Rank (MRR) jumps from 0.573 to 0.707 (a 23.4% improvement), and Risk Assessment macro F1 rises from 0.744 to 0.787.
The Problem with SKILL.md
Current AI agent skill definitions, typically written as SKILL.md files, entangle three distinct concerns in one block of natural language:
- Invocation interface — how to call the skill
- Execution flow — what the skill does step by step
- Tool/resource side effects — what external systems it touches
This conflation makes downstream tasks brittle. A discovery system searching for "email summarization" might miss a skill that describes itself as "digest inbox messages." A risk reviewer cannot programmatically check whether a skill accesses external APIs or modifies local files.
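To make the contrast concrete, here is a minimal sketch of the kind of programmatic check a structured representation enables. The field names (`actions`, `type`, `tool`, `resource`) are illustrative assumptions, not the paper's actual SSL schema:

```python
import json

# Hypothetical excerpt of a skill's resource-use metadata; the field
# names are assumptions for this sketch, not the published SSL schema.
skill_logical = json.loads("""
{
  "actions": [
    {"type": "tool_call", "tool": "http_request", "resource": "api.example.com"},
    {"type": "file_write", "resource": "/tmp/summary.txt"}
  ]
}
""")

# With atomic actions enumerated, a reviewer can answer "does this skill
# touch external APIs or modify local files?" mechanically instead of
# re-reading prose.
touches_network = any(
    a["type"] == "tool_call" and a["tool"] == "http_request"
    for a in skill_logical["actions"]
)
writes_files = any(a["type"] == "file_write" for a in skill_logical["actions"])
```

With plain SKILL.md prose, both questions require a human (or an LLM) to interpret free text; here they reduce to a list scan.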
The SSL Architecture
SSL draws inspiration from Schank and Abelson's classical work on scripts, memory organization packets (MOPs), and conceptual dependency theory — a body of cognitive science research from the 1970s that formalized how humans structure routine actions.
The representation uses three layers:
- Scheduling: invocation signals and preconditions (triggers, input parameters, required context)
- Structural: execution scenes and ordering (sequential steps, parallel branches, loops)
- Logical: atomic actions and resource use (tool calls, file reads, API writes, side effects)

Each layer is a typed JSON object with a defined schema. The Scheduling layer captures what activates a skill; the Structural layer describes how execution flows through scenes; the Logical layer enumerates every atomic action and its resource dependencies.
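A toy SSL document for a hypothetical inbox-summarization skill might look like the sketch below. The three layer names follow the paper; everything inside each layer is an assumption made for illustration, since the published schemas are not reproduced here:

```python
import json

# Illustrative SSL document for a hypothetical "summarize-inbox" skill.
# Layer names match the paper; the inner field names are assumptions.
ssl_skill = {
    "scheduling": {
        "triggers": ["summarize my inbox", "digest inbox messages"],
        "inputs": [{"name": "mailbox", "type": "string"}],
        "preconditions": ["mail client authenticated"],
    },
    "structural": {
        "scenes": [
            {"id": "fetch", "kind": "sequential"},
            {"id": "summarize", "kind": "sequential", "after": ["fetch"]},
        ]
    },
    "logical": {
        "actions": [
            {"scene": "fetch", "type": "tool_call", "tool": "imap_fetch"},
            {"scene": "summarize", "type": "llm_call", "tool": "summarizer"},
        ]
    },
}

# The document serializes to plain JSON, so it can live in a registry
# alongside (or instead of) a SKILL.md file.
serialized = json.dumps(ssl_skill, indent=2)
```

Note how the "digest inbox messages" phrasing from the earlier discovery example becomes an explicit trigger, so a search for "email summarization" can match on the structured invocation signals rather than on prose.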
From Text to Structure: LLM Normalizer
The researchers built an LLM-based normalizer that takes existing SKILL.md files and outputs SSL JSON. The normalizer uses a chain-of-thought prompting strategy:
- Parse the natural language description into an intermediate script representation
- Identify invocation signals and preconditions (Scheduling)
- Decompose the execution into scenes (Structural)
- Extract atomic actions and resource references (Logical)
On a held-out test set of 500 skills, the normalizer achieved 92.3% schema compliance — meaning the output JSON matched the SSL type definitions without structural errors.
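The paper does not reproduce its compliance checker, but a minimal structural check in the spirit of "the output JSON matches the SSL type definitions" could look like this (the specific fields checked are assumptions carried over from the sketches above):

```python
def is_schema_compliant(doc) -> bool:
    """Minimal structural check: the three SSL layers exist and hold the
    expected container types. A real checker would validate the full
    typed schemas, e.g. with a JSON Schema validator."""
    try:
        return (
            isinstance(doc["scheduling"], dict)
            and isinstance(doc["structural"], dict)
            and isinstance(doc["logical"], dict)
            and isinstance(doc["logical"].get("actions", []), list)
        )
    except (KeyError, TypeError):
        return False

ok = is_schema_compliant(
    {"scheduling": {}, "structural": {}, "logical": {"actions": []}}
)
bad = is_schema_compliant({"scheduling": {}})  # missing two layers
```

Running such a check over normalizer outputs is how a compliance rate like the reported 92.3% would be measured: count the fraction of generated documents that pass.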
Key Results
The paper evaluates SSL on two tasks using a newly released dataset of 6,184 skills, 403 task queries, and 500 risk-labeled skills:
- Skill Discovery (MRR): 0.573 → 0.707 (+23.4%)
- Risk Assessment (macro F1): 0.744 → 0.787 (+5.8%)

For discovery, the structured representation allows exact matching on invocation signals and execution scenes, rather than fuzzy keyword search over prose. For risk assessment, the Logical layer's explicit enumeration of resource access enables automated pattern matching against risk taxonomies.
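For readers unfamiliar with the discovery metric, MRR averages the reciprocal of the rank at which the correct skill appears for each query; a jump from 0.573 to 0.707 roughly means the right skill moves from around second place toward first on average. A minimal implementation:

```python
def mean_reciprocal_rank(ranks):
    """MRR over queries. `ranks` holds the 1-based rank at which the
    correct skill was retrieved for each query, or None if it was
    never retrieved (contributing 0 to the sum)."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

# Toy example: three queries where the right skill ranked 1st, 2nd, 4th.
mrr = mean_reciprocal_rank([1, 2, 4])  # (1 + 0.5 + 0.25) / 3 ≈ 0.583
```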
What This Means in Practice

For teams building agent platforms, SSL offers a drop-in replacement for SKILL.md that immediately improves search and audit capabilities. The LLM normalizer means existing skill libraries can be migrated without manual rewriting. The 6,184-skill corpus provides a benchmark for future work on skill representation and discovery.
How It Compares to Prior Work
The paper builds on a line of research in structured skill representations. Previous work like LangChain's tool definitions and OpenAI's function calling schema provide JSON schemas for individual function signatures, but they don't model execution flow or side effects. SSL's three-layer approach is closer to workflow languages like Temporal or AWS Step Functions, but specialized for the skill discovery and risk assessment use cases.
The key difference: SSL is designed for discovery and risk review of skills in a registry, not for executing them. It's a metadata layer, not an execution engine.
Limitations
The paper acknowledges several limitations:
- The LLM normalizer was evaluated on a single model (GPT-4o); performance with other models is unknown
- The risk assessment task uses a simplified taxonomy; real-world risk review may require more granular categories
- The 6,184-skill corpus is synthetic; real-world skill registries may have different distributional properties
- SSL does not yet support versioning or dependency management between skills
agentic.news Analysis
This paper addresses a practical bottleneck that has been quietly growing as agent ecosystems expand. The unstructured SKILL.md format works fine for a handful of skills, but breaks down at scale — exactly the problem the authors identify. The 23% improvement in discovery MRR is substantial enough to justify migration for any organization with more than a few hundred skills.
What's particularly interesting is the theoretical grounding. Drawing on Schank and Abelson's script theory from the 1970s connects modern LLM-based agents to a rich cognitive science tradition. The insight that skill execution follows predictable "scripts" — sequences of actions with clear preconditions and effects — is both old and newly relevant. The researchers effectively operationalize a 50-year-old theory using modern LLM capabilities.
The release of a 6,184-skill benchmark corpus is a significant contribution. Prior work in this area has been hampered by the lack of standardized evaluation datasets. Researchers can now compare discovery and risk assessment approaches on a common ground.
One open question: how well does the LLM normalizer handle real-world SKILL.md files, which often contain ambiguous language, implicit assumptions, and incomplete descriptions? The paper's 92.3% schema compliance on a held-out test set is promising, but the test set may not capture the full messiness of production skill registries.
Frequently Asked Questions
What is SSL in AI agent skills?
SSL (Structured Skill Language) is a three-layer typed JSON representation for AI agent skill descriptions. It decomposes the traditional unstructured SKILL.md file into Scheduling (invocation signals), Structural (execution scenes), and Logical (atomic actions and resource use) layers, enabling better skill discovery and risk assessment.
How does SSL improve skill discovery?
SSL improves Skill Discovery Mean Reciprocal Rank (MRR) from 0.573 to 0.707, a 23.4% improvement. The structured representation allows exact matching on invocation signals and execution scenes rather than fuzzy keyword search over natural language prose.
Can I convert my existing SKILL.md files to SSL?
Yes. The paper provides an LLM-based normalizer that converts existing SKILL.md files into SSL JSON format. On a test set of 500 skills, the normalizer achieved 92.3% schema compliance, meaning most conversions produce structurally valid output.
What dataset does the SSL paper release?
The researchers release a corpus of 6,184 skills, 403 task queries for discovery evaluation, and 500 risk-labeled skills for risk assessment evaluation. This provides a standardized benchmark for future work on skill representation and discovery.