skills
30 articles about skills in AI news
Stop Testing Skills Once: Use Caliper's pass@k to Measure What Actually
Caliper is a lightweight harness that runs Claude Code skills k times, scores them with pass@k, and compares against a no-skill baseline so you know if your skill actually helps.
Caliper: Run Your Claude Code Skills k Times and Get a pass@k Score That
Caliper gives Claude Code users a pass@k reliability score for skills, with a baseline delta showing if the skill beats the base agent. Install via pipx or npx.
Claude Fable 5 Migration: Cut Prescriptive Skills 60% to Stop Degrading Output
Audit your ~/.claude/skills for temperature, budget_tokens, and 'show your reasoning'. Replace procedures of 6+ steps with a goal plus constraints. Keep MUST/NEVER blocks only where they guard money, deletions, or identity.
Larger models learn rare skills by forgetting them less, new paper shows
A new paper from Stanford, MIT, Harvard, and Anthropic shows that larger models learn rare skills because they forget them less during training. The result was tested on OLMo models from 4M to 4B parameters.
Nvidia Unveils Physical AI Agent Skills, 32B VLA Model at CVPR
Nvidia launched physical AI agent skills and a 32B VLA model at CVPR to automate AV and robotics workflows, addressing the fragmented tooling bottleneck.
Microsoft SkillOpt Trains Agent Skills in Text Space, Beats 52/52 Benchmarks
Microsoft's SkillOpt trains agent skills in text space, achieving best or tied-best results in all 52 settings across 6 benchmarks and 7 models.
CLAUDE.md Wastes 7K+ Tokens Per Turn; Skills Cut to 50
A 1,000-line CLAUDE.md burns 7,000-10,000 tokens per turn on instructions the model already knows. Skills using progressive disclosure cut that to ~50 tokens.
Skills as Untrusted Code: A Security Precedent for Agent Runtimes
A paper argues that agent skills are untrusted code until verified; runtimes must enforce verification gates to prevent supply-chain attacks, echoing decades of software security lessons.
Ctx2Skill: Self-Play Framework Lets LMs Discover Skills Without Labels
Ctx2Skill discovers skills from context via multi-agent self-play without labels. Outputs plug into any LM, targeting manual prompt engineering bottlenecks.
Build Reusable Data Science Workflows with Claude Skills and Subagents
Claude Skills and Subagents let you package prompts into reusable modules, freeing data scientists from repetitive AI adjustments for EDA, modeling, and deployment.
10 Claude Code Skills That Actually Work: A Solo Developer's Vetted List
A curated list of the most effective Claude Code skills for developers, based on hands-on testing, focusing on practical MCP servers and workflow enhancements.
Ethan Mollick: AI Judgment & Problem-Solving Are Skills, Not Human Exclusives
Ethan Mollick contends that skills like judgment and problem-solving, often cited as uniquely human, are domains where AI can and does demonstrate competence, reframing them as learnable capabilities.
Stop Thinking 'Progressive Disclosure' for Claude Skills — Think
A mental model shift from 'progressive disclosure' to 'progressive discovery' makes building Claude Skills more intuitive by clarifying Claude's active role in finding what it needs.
Free 'finance-skills' Tool Adds Bloomberg Terminal-Like Features to Claude
An open-source tool called 'finance-skills' allows Claude to access real-time financial data and analysis, replicating key features of the expensive Bloomberg Terminal platform for free.
MiniMax Open-Sources Three Agent Music Skills for MMX-CLI
MiniMax has open-sourced three 'Music Skills' for its MMX-CLI agent platform. The skills allow AI agents to generate music, sing in a persona, and curate playlists from a user's local library.
Newline's 'Skills' Update Shows Where MCP Servers Are Headed
The Newline MCP server now supports modular 'Skills,' allowing developers to customize their Claude Code environment with specific, installable capabilities for more targeted workflows.
Palantir CEO Karp: AI Will 'Destroy Humanities Jobs', Shift to Vocational Skills
Palantir CEO Alex Karp warns AI will 'destroy humanities jobs,' arguing broad degrees lose value while vocational skills and neurodivergent traits become key advantages. He insists there will still be 'more than enough jobs,' just redistributed toward practical roles.
Addy Osmani Unveils 'Agent Skills' for AI-Powered Development
Google VP Addy Osmani teased a new framework called 'Agent Skills' for constructing AI agents, likely a significant move to standardize and simplify agent-based development workflows.
MCP Security Crisis: 43% of Servers Vulnerable, 341 Malicious Skills Found
Security audits of the Model Context Protocol (MCP) ecosystem reveal 43% of servers are vulnerable to command execution, while 341 malicious skills were found on marketplaces, exposing systemic security flaws in agentic AI. The findings highlight a growing attack surface as AI agents become more autonomous.
How Anthropic's Team Uses Skills as Knowledge Containers (And What It Means For Your CLAUDE.md)
Learn how to use Claude Code skills not just for automation but as living knowledge bases, following patterns from Anthropic's own engineering team.
Anthropic's Claude Skills Implements 3-Layer Context Architecture to Manage Hundreds of Skills
Anthropic's Claude Skills framework employs a three-layer context management system that loads only skill metadata by default, enabling support for hundreds of specialized skills without exceeding context window limits.
How to Build a Custom AI Agent with Claude Code's Skills, SubAgents, and Hooks
A developer's deep dive into customizing Claude Code with 7 skills, 5 subagents, and quality-check hooks—showing how to move beyond basic prompting to create a truly autonomous coding assistant.
Base44 Launches Superagent Skills: No-Code Library for Adding Domain-Specific Functions to AI Agents
Base44 has launched Superagent Skills, a library of pre-built, domain-specific functions that can be added to AI agents with a single click. The no-code system allows for combining skills and creating custom ones via natural language description.
Trace2Skill Framework Distills Execution Traces into Declarative Skills via Parallel Sub-Agents
Researchers introduced Trace2Skill, a framework that uses parallel sub-agents to analyze execution trajectories and distill them into transferable declarative skills. This enables performance improvements in larger models without parameter updates.
Claude Skills: How Anthropic's Context-Aware Workflow System Solves the Bloated CLAUDE.md Problem
Claude Skills are modular, self-contained workflow packages that load only when triggered by user intent, solving the context bloat caused by monolithic CLAUDE.md files. They support automatic invocation, slash commands, and can bundle supporting documents.
Palantir CEO Alex Karp: AI Era Will Favor Trade Skills and Neurodivergent Thinking
Palantir CEO Alex Karp predicts AI will most reward individuals with hands-on vocational skills and those who think in unusually original, often neurodivergent, ways. This perspective challenges the narrative that AI success is reserved for traditional tech roles.
How Weaviate Agent Skills Let Claude Code Build Vector Apps in Minutes
Weaviate's official Agent Skills give Claude Code structured access to vector databases, eliminating guesswork when building semantic search and RAG applications.
Awesome Finance Skills: Open-Source Plugin Adds Real-Time Market Analysis to AI Agents
A developer has open-sourced Awesome Finance Skills, a plug-and-play toolkit that gives AI agents real-time financial data access, sentiment analysis, and automated research report generation. The MIT-licensed package works with Claude Code, OpenClaw, and other popular agent frameworks.
How to Deploy Claude Code at Scale: The Admin's Guide to MCPs, Skills, and User Management
Practical solutions for managing Claude Code across teams: central MCP servers, standardized CLAUDE.md templates, and pre-configured skills to prevent chaos.
How to Install claude-flow MCP and 3 Skills That Transform Claude Code
A production team's setup reveals claude-flow MCP with hierarchical-mesh topology and three essential skills that add structure, parallelism, and quality control.