Methodology: How SkillCheck validates skills

Design philosophy

Validate against standards, not opinions

Every check traces to a published spec, a research paper, or documented practitioner experience. When we add one, we document what informed it and why it matters.

Progressive depth

Checks run in layers: structure first, then semantics, content quality, security, and agent readiness, and finally whether the knowledge is substantive or merely well-formatted filler.

Reward quality, don't just punish

A strong gotchas section, concrete code references, and clear error handling all show up as positive signals in your report, not just as the absence of penalties.

Free for structure, Pro for substance

Free tells you if the skill is built correctly. Pro tells you if it's built well. Every finding carries a severity: critical, warning, suggestion, or strength.

How the checks evolved

Each phase grounded the checks in new independent evidence.

SkillCheck started from one lab's guidelines. From there: practitioner field observations, cross-lab methodology, peer-reviewed academic research, and the OWASP agentic security catalogue. Each round made the checks harder to game.

Swipeable carousel

The seven phases at a glance: one slide per phase, for a quick overview.

One-page poster

All categories and sources: the whole methodology on one scannable page.

How a check runs

Pre-compiled patterns, applied line by line

Regular expressions scan the skill content one line at a time, skipping code blocks and frontmatter. Compound patterns fire only when multiple signals appear on the same line. Every result carries a severity that feeds the scoring engine.

Match → strength

// Consequence pattern (Pro)

Input: "Never call HTTP inside transactions; we had a 3-hr outage"

Match: imperative + consequence

→ strength · Knowledge density

No match → skipped

// Hollow content

Input: "Follow team standards."

Match: none, no compound signal

No finding emitted
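
The two cases above can be sketched in a few lines of Python. This is a minimal illustration, not SkillCheck's implementation: the `IMPERATIVE` and `CONSEQUENCE` patterns, the `Finding` type, and the `scan` function are all hypothetical names, and the frontmatter handling assumes a `---` delimited block that starts on the first line.

```python
import re
from dataclasses import dataclass

# Hypothetical signal patterns, compiled once up front.
# The actual SkillCheck rules are not shown in this document.
IMPERATIVE = re.compile(r"\b(never|always|do not|don't|must|avoid)\b", re.IGNORECASE)
CONSEQUENCE = re.compile(r"\b(outage|incident|data loss|downtime|crash(?:ed)?)\b", re.IGNORECASE)


@dataclass
class Finding:
    line_no: int
    severity: str
    category: str
    text: str


def scan(skill_text: str) -> list[Finding]:
    """Apply the compound pattern line by line, skipping frontmatter and code blocks."""
    findings = []
    in_frontmatter = False
    in_code = False
    for i, line in enumerate(skill_text.splitlines(), start=1):
        stripped = line.strip()
        # Assumption: frontmatter opens with '---' on line 1 and closes at the next '---'.
        if stripped == "---" and (i == 1 or in_frontmatter):
            in_frontmatter = not in_frontmatter
            continue
        if in_frontmatter:
            continue
        # Fenced code blocks toggle on each ``` line and are never scanned.
        if stripped.startswith("```"):
            in_code = not in_code
            continue
        if in_code:
            continue
        # Compound rule: both signals must appear on the same line.
        if IMPERATIVE.search(line) and CONSEQUENCE.search(line):
            findings.append(Finding(i, "strength", "Knowledge density", stripped))
    return findings
```

With this sketch, the line "Never call HTTP inside transactions; we had a 3-hr outage" produces one strength finding, while "Follow team standards." produces nothing because neither signal is present.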

How checks evaluate

Three methods, used across both tiers

Structural and pattern checks are reproducible: same input, same finding. Judgment checks have a wider tolerance band; the criteria are published so you can predict the outcome.

Structural

Present or absent

Required fields, file references, secrets, token counts. Exact, pass or fail, no ambiguity.

Structure · Body · Security · Token · Trigger Collision

Pattern

Named patterns

Anti-slop phrases, density signals, design patterns, governance checklists. Inspectable: read the rules and you can predict the outcome.

Naming · Anti-Slop · Enterprise · OWASP det. · Knowledge Density

Judgment

Reading comprehension

Contradictions, workflow clarity, subagent specificity, autonomy boundaries. Rubric-based against published criteria.

Semantics · Workflow · Autonomy Design · OWASP grader
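
As an illustration of the structural method, a present-or-absent check could look like the sketch below. It is an assumption-laden example rather than SkillCheck's code: the `missing_fields` function is hypothetical, the required field names `name` and `description` are placeholders, and the frontmatter is assumed to be simple `key: value` lines between `---` delimiters.

```python
def missing_fields(skill_text: str, required=("name", "description")) -> list[str]:
    """Return the required frontmatter fields that are absent or empty."""
    lines = skill_text.splitlines()
    present = set()
    # Assumption: frontmatter is a '---' delimited block starting on the first line.
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            # A field counts as present only if it has a non-empty value.
            if sep and value.strip():
                present.add(key.strip())
    return [field for field in required if field not in present]
```

The same input always yields the same list, which is the reproducibility property described for structural checks.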

Scoring model

Skills start at 100. Findings subtract. Strengths surface.

Every finding carries a severity. Strengths do not change the score but appear as positive signals in your report: evidence that the skill is built well, not merely free of errors.

Critical

−20

Structural violation. Must fix before shipping.

Warning

−5

Quality gap. Should fix.

Suggestion

−1

Minor improvement. Nice to fix.

Strength

Positive signal. No penalty, shown in report as proof of quality.
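
The arithmetic of the scoring model can be sketched as follows. The penalty values come from the severities listed above; the `score` function name and the floor at zero are assumptions, since the document does not say what happens when penalties exceed 100.

```python
# Penalty per severity, as listed above. Strengths carry no penalty.
PENALTIES = {"critical": 20, "warning": 5, "suggestion": 1, "strength": 0}


def score(severities: list[str]) -> int:
    """Start at 100 and subtract the penalty for each finding."""
    total = 100 - sum(PENALTIES[s] for s in severities)
    # Assumption: the score never drops below zero.
    return max(total, 0)
```

For example, one critical finding, two warnings, one suggestion, and one strength give 100 − 20 − 5 − 5 − 1 − 0 = 69.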

Two tiers, one standard

Free

Validates shape

Does the skill have the right structure, fields and sections? Free tells you what's missing. Open source, no install, no API key.

Pro · $79 lifetime

Validates substance

Is the content inside those sections actually good? Pro tells you whether what's there is real, covering security, slop, readiness and governance.

Independence

SkillCheck is an independent project, not affiliated with, endorsed by, or officially connected to Anthropic, OpenAI, Google, or any other AI lab. Research from those organizations informed specific check categories, as documented in the phases above. The implementation, scoring and quality judgments are SkillCheck's own.

Every check traces back to a source

Free proves your skill is built correctly.
Pro proves it's built well.
