Natural-language harness directory

Agent Workspace

Browse verified harness packs with source links, evaluation guidance, and starter instructions for Claude Code, Codex, Cursor, and Convex.

Submit a pack View repo

PacksPacks · 23TracesTraces · 2PublishersPublishers · 3

Verified5

Featured7

TagAll dive-into-claude-code · 9 harness · 6 reference · 6 claude-code · 4 architecture · 3 claude-code-internals · 3 security · 3 a11y · 2

3 packs

3 shownVerified packs include starter instructions, sources, and evaluation guidance.

Sort: Featured first

Tag: claude-code-internals

Communityharness

Recommendedorchestrator-workers

Turn Execution Pipeline

The 9-step loop, 5 context shapers, 3 recovery paths. The 98.4% infrastructure under every turn.

Canonical harness reference for Claude Code's turn-execution pipeline, derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228). The paper's anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure; this pack documents the infrastructure. It names the 9 steps (settings resolution, state initialization, context assembly, five pre-model shapers, model call, tool dispatch, permission gate, tool execution, stop-condition check), the 5 pre-model context shapers in cheapest-first order (Budget Reduction, Snip, Microcompact, Context Collapse, Auto-Compact) with their triggers, and the 3 recovery mechanisms (max output token escalation up to 3 retries, reactive compaction firing at most once per turn, prompt-too-long overflow chain: context-collapse → reactive compaction → terminate). The pack maps directly onto the paper's three recurring design commitments: graduated layering over monolithic mechanisms, append-only designs favoring auditability over query power, and model judgment within a deterministic harness. Use it to implement, clone, or critique any production agent loop. Target: a staff engineer rebuilding the loop in a different language (Rust, Python, Bun) or auditing an existing one for compaction-race and stop-condition bugs.

Not yet measured

~0 installs·~0 tokens saved

harnessturn-loopcontext-compaction

Agent Workspace2026-04-19

Open pack →

Communitysecurity

Recommended

Seven Safety Layers

Defense-in-depth for tool execution. Deny > ask > allow. All 7 layers, in order, with honest failure modes.

Security reference for Claude Code's 7-layer permission architecture, derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228). The paper's anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure; the safety layers are a large slice of that infrastructure. The pack enumerates the 7 independent layers in order — (1) tool pre-filtering, (2) deny-first rule evaluation, (3) permission mode constraints across 7 modes (plan, default, acceptEdits, auto, dontAsk, bypassPermissions, bubble), (4) auto-mode ML classifier (yoloClassifier.ts two-stage fast-filter + chain-of-thought with a timeout race against a pre-computed classification), (5) shell sandboxing for filesystem and network isolation, (6) non-restoration on resume (permissions re-established each session; trust is not persistent), (7) PreToolUse hooks with permissionDecision return values. It also names the 4-stage authorization pipeline (pre-filter → hooks → rules → handler with 4 branches). The pack carries the paper's three recurring design commitments — graduated layering, append-only auditability, model judgment inside a deterministic harness — and documents the honest failure modes the paper flags: ~50 subcommands bypass shell-layer analysis, the classifier's timeout race can leave decisions ambiguous, and the pre-trust window allowed 4 published CVEs where extensions executed before the trust dialog appeared. Target: a staff engineer or security lead reviewing an agent's tool-execution surface before shipping.

Not yet measured

~0 installs·~0 tokens saved

securitypermissionsdefense-in-depth

Agent Workspace2026-04-19

Open pack →

Communityreference

Recommended

Nine Context Sources

CLAUDE.md is user context, not system prompt. 9 sources, 4 hierarchy levels, zero embeddings.

Reference for the 9 ordered context sources Claude Code assembles before every model call, and the 4-level CLAUDE.md hierarchy that feeds one of them. Derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228), whose anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure. The pack's entire frame is the paper's explicitly-labeled 'Critical design choice' from §Context Construction: CLAUDE.md is user context (probabilistic compliance by the model), NOT system prompt (deterministic enforcement by the runtime). Permission rules are the deterministic layer; CLAUDE.md instructs but does not enforce. The pack enumerates the 9 sources in order (system prompt → environment info → CLAUDE.md hierarchy → path-scoped rules → auto-memory → tool metadata → conversation history → tool results → compact summaries), breaks the CLAUDE.md hierarchy into its 4 precedence levels (/etc managed, ~/.claude user, project, .local gitignored), and documents the file-based memory model that replaces vector DB / embedding approaches with an LLM-based scan of up to 5 memory-file headers on demand. Fully inspectable, editable, and version-controllable by the user — and one of the paper's three recurring design commitments (append-only auditability over query power) made concrete. Target: an agent engineer who keeps hearing 'just put it in CLAUDE.md' as a fix and needs to understand why that is wrong for safety-critical rules.

Not yet measured

~0 installs·~0 tokens saved

referencecontextclaude-md

Agent Workspace2026-04-19

Open pack →

Recent change traces · Cross-linked to packs

What changed, and why

Every coding session captured as rows — Scenario / Files / Changes / Why. Ctrl+F your own history.

View all traces