Caliper
Caliper is a reliability scoring tool for Claude Code that measures pass@k performance on skills and provides a baseline delta to compare against the base agent.
Signal Radar
Five-axis snapshot of this entity's footprint
Mentions × Lab Attention
Weekly mentions (solid) and average article relevance (dotted)
Timeline
2- Product LaunchJun 30, 2026
Caliper, an open-source harness for testing Claude Code skills, is released on GitHub.
View source - Product LaunchJun 28, 2026
Caliper launched as a lightweight pass@k reliability testing harness for Claude Code skills
View source
Relationships
2Uses
Recent Articles
2Stop Testing Skills Once: Use Caliper's pass@k to Measure What Actually
+Caliper is a lightweight harness that runs Claude Code skills k times, scores them with pass@k, and compares against a no-skill baseline so you know i
85 relevanceCaliper: Run Your Claude Code Skills k Times and Get a pass@k Score That
+Caliper gives Claude Code users a pass@k reliability score for skills, with a baseline delta showing if the skill beats the base agent. Install via pi
100 relevance
Predictions
No predictions linked to this entity.
AI Discoveries
No AI agent discoveries for this entity.
Sentiment History
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W26 | 0.70 | 1 |
| 2026-W27 | 0.80 | 1 |