TAR Engine · Test · Audit · Report

Audit AI skill safety in 30 seconds

A public guide that audits AI skill safety — SKILL.md, Codex skill.yaml, Claude Code commands, and OpenCode configs — across static, semantic, and adversarial layers. Open source, BYOK for advanced layers.

Playground → Browse skills How it works

5119

Skills audited

10/10

Browse by reader personaSee all →

Ten reader personas, sixty subcategories. Pick the lens that matches the work you do.

Engineering & Code

Backend, frontend, mobile, ops, security, testing

Skills audited298

Data & AI

Data engineering, ML, MCP tools, prompts and RAG

Skills audited53

Product & Strategy

Product management, growth, strategy, analytics

Skills audited64

Design

UI/UX, visual + brand, design systems, prototyping

Skills audited46

Communication & Writing

Technical writing, marketing copy, PR, email outreach

Skills audited28

Sales & GTM

Outbound, closing, pricing, customer success

Skills audited19

Operations & People

Hiring, HR, finance, legal, admin

Skills audited16

Research & Science

User research, market research, academic / scientific

Skills audited9

Workflow & Automation

Productivity, integration, meta-skills, personal automation

Skills audited126

Career & Learning

Career advice, skill-building, coaching, learning resources

Skills audited27

Most recent auditsSee all →

Fresh off the press. Each entry links to a full report with finding-by-finding breakdown and the exact rule that triggered it.

sanka-plugin

github

till-claude-code-marketplace

github

k8s-security-policies

security github

tiktok-product-promotion

github

reporting

github

nodeflow-change-spec

github

groom-backlog

github

google-search-console

github

Editor's picks

Skills that earned an A or that taught us something about the threat landscape.

webshop-search-executor

github

zizmor

github

FoxSports_frontend

github

__SKILL_ID__

github

frappe-lms

github

receiving-code-review

github

screen-reader-testing

testing-qa github

How a skill becomes a published auditSee all →

Six passes, four live layers. Static rules cover the easy stuff. Semantic, adversarial, and supply-chain passes are where the interesting findings come from.

Layer 01

Static

Hard-coded regex and AST checks. Catches missing license, oversized files, secrets, malformed YAML, classic prompt-injection patterns.

Layer 02

Semantic

An LLM reads SKILL.md the way a careful reviewer would. Catches ambiguous instructions, capability overreach, missing guardrails.

Layer 03

Adversarial

Fifteen attacks across five classes are run through a victim model. Findings only surface when at least two of three attempts in a class succeed.

Coming soon

Layer 04

Behavioral Trace

Runs the skill once inside a sandbox with a mock LLM driver, records every file read or write, network fetch, and shell call, then audits the trace for claim-versus-behavior mismatches.

Coming soon

Layer 05

External Payload Tracing

Sandbox follows every URL and import the skill references, fetches the actual content, and recursively audits it. Catches benign-looking pointers to high-risk payloads.

Layer 06

Supply Chain

Parses every pip / npm dependency the skill declares, checks them against OSV.dev advisories, and flags typosquat candidates. Audit-only — no install. Surfaces SUP-001 typosquat / SUP-002 known CVE / SUP-003 unpinned dep findings.