Natural-language harness directory
Agent Workspace
Browse verified harness packs with source links, evaluation guidance, and starter instructions for Claude Code, Codex, Cursor, and Convex.
Directory
10 packs
Operator Chat Rail
Shared chat plus traceable assistant rail for high-context workflows.
A proven interface pattern for apps that need collaborative chat in the center and a persistent agent rail for plan trace, tool execution, telemetry, sources, and quality checks.
Not yet measured
~0 installs
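The Operator Chat Rail keeps the chat in the center and routes plan trace, tool execution, telemetry, sources, and quality checks to a persistent rail. Below is a minimal sketch of the data side of that rail. It assumes each rail entry is a typed event carrying a per-run sequence number. `RailEvent`, `RailPanel`, and `group_by_panel` are hypothetical names and are not taken from the pack.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class RailPanel(Enum):
    # The five rail concerns named in the pack description.
    PLAN_TRACE = "plan_trace"
    TOOL_EXECUTION = "tool_execution"
    TELEMETRY = "telemetry"
    SOURCES = "sources"
    QUALITY_CHECKS = "quality_checks"


@dataclass(frozen=True)
class RailEvent:
    # Hypothetical record: one entry rendered in the assistant rail.
    seq: int          # monotonically increasing within a run
    run_id: str       # ties the rail entry to the chat turn that produced it
    panel: RailPanel
    summary: str


def group_by_panel(events: list[RailEvent]) -> dict[RailPanel, list[RailEvent]]:
    """Bucket rail events by panel, keeping each bucket in sequence order."""
    panels: dict[RailPanel, list[RailEvent]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.seq):
        panels[event.panel].append(event)
    return dict(panels)
```

Events can arrive out of order. Sorting by `seq` before grouping keeps every panel's entries in execution order, and panels with no events are simply absent.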
Planning and Worker Flow
Tiered orchestration for questions that are too broad for a single call.
A multi-step harness pattern that uses a narrow planning pass, typed worker calls, and a constrained synthesis pass instead of letting one prompt attempt everything at once.
Not yet measured
~0 installs
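The three tiers of the Planning and Worker Flow can be sketched as plain function composition. This is a sketch, not the pack's implementation. The planner, workers, and synthesizer are injected callables that stand in for model calls. `Subtask`, `WorkerResult`, `run_plan_and_workers`, and the `max_subtasks` cap are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Subtask:
    kind: str      # selects which typed worker handles this subtask
    query: str


@dataclass(frozen=True)
class WorkerResult:
    kind: str
    query: str
    finding: str


Worker = Callable[[Subtask], WorkerResult]


def run_plan_and_workers(
    question: str,
    plan: Callable[[str], list[Subtask]],
    workers: dict[str, Worker],
    synthesize: Callable[[str, list[WorkerResult]], str],
    max_subtasks: int = 5,  # hypothetical cap that keeps the plan narrow
) -> str:
    # 1. Narrow planning pass: decompose only, never answer.
    subtasks = plan(question)[:max_subtasks]
    # 2. Typed worker calls: every subtask must map to a registered worker.
    results = []
    for task in subtasks:
        if task.kind not in workers:
            raise ValueError(f"no worker registered for kind {task.kind!r}")
        results.append(workers[task.kind](task))
    # 3. Constrained synthesis: sees only the question and the worker results.
    return synthesize(question, results)
```

An unknown subtask kind fails loudly instead of falling back to a general prompt. This is what stops the flow from degrading back into one call that attempts everything.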
Answer Review and Quality Checks
Persist the answer as a reviewable artifact, not just a message string.
A runtime pattern that persists final answers, quality checks, evaluation metadata, and downstream review state as first-class records instead of burying them in message text.
Not yet measured
~0 installs
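One way to make the answer a first-class record is to store it next to its checks, evaluation metadata, and review state. The sketch below does that and gates approval on the checks. The field names, the three review states, and the rule that approval requires every check to pass are all illustrative assumptions. `AnswerRecord` and `QualityCheck` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReviewState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class QualityCheck:
    name: str
    passed: bool
    detail: str = ""


@dataclass
class AnswerRecord:
    answer_id: str
    run_id: str
    text: str
    checks: list[QualityCheck] = field(default_factory=list)
    eval_metadata: dict[str, str] = field(default_factory=dict)
    review_state: ReviewState = ReviewState.PENDING

    def add_check(self, check: QualityCheck) -> None:
        self.checks.append(check)

    def failed_checks(self) -> list[str]:
        return [c.name for c in self.checks if not c.passed]

    def review(self, approve: bool) -> None:
        # Only pending answers can be reviewed (assumed rule).
        if self.review_state is not ReviewState.PENDING:
            raise ValueError("answer already reviewed")
        # Approval is blocked while any quality check is failing (assumed rule).
        if approve and self.failed_checks():
            raise ValueError(f"cannot approve; failing checks: {self.failed_checks()}")
        self.review_state = ReviewState.APPROVED if approve else ReviewState.REJECTED
```

Because checks and review state live on the record rather than in message text, downstream tooling can query them directly, for example to list every rejected answer for a run.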
Claude Advisor Pattern v2
Sonnet executes, Opus advises. Route intelligence by confidence, pay Opus only when it matters.
Evaluator-optimizer harness built on the Anthropic Advisor Pattern. Claude Sonnet 4.6 runs the task and calls Claude Opus 4.6 as an advisor in three cases: a low-confidence decision (confidence below the 0.7 threshold), two consecutive tool failures, or an explicit escalation by the executor. The advisor returns a structured recommendation, which Sonnet applies before continuing. Cost split is measured: typical deployments spend 8-12% of tokens on Opus while retaining ~93% of the pass rate of an Opus-only run.
Not yet measured
~0 installs·~0 tokens saved
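The three escalation triggers in the Advisor Pattern description reduce to a small policy object. This is a minimal sketch under two assumptions. First, "low confidence" means strictly below 0.7. Second, only a successful tool call resets the failure streak. `EscalationPolicy` is a hypothetical name, and the sketch makes no model calls.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7     # threshold stated in the pack description
MAX_CONSECUTIVE_FAILURES = 2   # "two consecutive tool failures"


@dataclass
class EscalationPolicy:
    consecutive_failures: int = 0

    def record_tool_result(self, ok: bool) -> None:
        # A success resets the streak; only back-to-back failures count.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    def should_consult_advisor(self, confidence: float, explicit_request: bool = False) -> bool:
        # Any one trigger is enough to call the advisor.
        return (
            explicit_request
            or confidence < CONFIDENCE_THRESHOLD
            or self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES
        )
```

The executor would check `should_consult_advisor` before each decision and route to the advisor only when it returns `True`. That is how Opus usage stays a small fraction of total tokens.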
Hybrid Runtime
Use rules for known packets, use synthesis only when ambiguity is real.
A disciplined runtime pattern that reserves deterministic rendering for tightly scoped known cases and uses an LLM only for bounded synthesis, not as the default execution path for every question.
Not yet measured
~0 installs
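The Hybrid Runtime split can be sketched as a rule registry with a budgeted fallback. Known packet types go to deterministic renderers. Only unmatched packets reach the synthesis callable, which stands in for an LLM call. `HybridRuntime`, the `"type"` key, and the per-instance `max_synth_calls` budget are hypothetical choices made for illustration.

```python
from typing import Callable

Renderer = Callable[[dict], str]


class HybridRuntime:
    def __init__(self, synthesize: Callable[[dict], str], max_synth_calls: int = 1):
        self._rules: dict[str, Renderer] = {}
        self._synthesize = synthesize          # stand-in for a bounded LLM call
        self._max_synth_calls = max_synth_calls
        self.synth_calls = 0

    def register(self, packet_type: str, renderer: Renderer) -> None:
        self._rules[packet_type] = renderer

    def handle(self, packet: dict) -> tuple[str, str]:
        """Return (path, output), where path is 'rule' or 'synthesis'."""
        renderer = self._rules.get(packet.get("type", ""))
        if renderer is not None:
            # Deterministic path for tightly scoped, known cases.
            return "rule", renderer(packet)
        # Synthesis is the exception path and is capped, not the default.
        if self.synth_calls >= self._max_synth_calls:
            raise RuntimeError("synthesis budget exhausted")
        self.synth_calls += 1
        return "synthesis", self._synthesize(packet)
```

Returning the path alongside the output makes it easy to log how often the runtime actually needed synthesis.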
Durable Streaming
Stream plan and execution state through stored events, not transient sockets alone.
A transport and persistence pattern for agents that need progressive rendering, reconnect safety, and post-run replay without losing the benefits of token or step streaming.
Not yet measured
~0 installs
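Durable Streaming hinges on clients reading from stored events by cursor rather than depending on a live socket. Below is a minimal in-memory sketch: a real deployment would back it with a database table. `EventStore`, `StreamEvent`, and `read_after` are hypothetical names. Sequence numbers here are global across runs, which is one design choice among several.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamEvent:
    seq: int
    run_id: str
    kind: str      # e.g. "plan", "step", "token", "done"
    payload: str


class EventStore:
    def __init__(self) -> None:
        self._events: list[StreamEvent] = []
        self._lock = threading.Lock()

    def append(self, run_id: str, kind: str, payload: str) -> StreamEvent:
        # Every streamed update is persisted first; sockets only fan it out.
        with self._lock:
            event = StreamEvent(len(self._events) + 1, run_id, kind, payload)
            self._events.append(event)
            return event

    def read_after(self, run_id: str, cursor: int = 0) -> list[StreamEvent]:
        """Events for run_id with seq > cursor; cursor=0 replays the whole run."""
        with self._lock:
            return [e for e in self._events if e.run_id == run_id and e.seq > cursor]
```

A reconnecting client sends the last `seq` it rendered and receives only what it missed. Post-run replay is the same call with `cursor=0`.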
Turn Execution Pipeline
The 9-step loop, 5 context shapers, 3 recovery paths. The 98.4% infrastructure under every turn.
Canonical harness reference for Claude Code's turn-execution pipeline, derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228). The paper's anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure; this pack documents the infrastructure. It names the 9 steps (settings resolution, state initialization, context assembly, five pre-model shapers, model call, tool dispatch, permission gate, tool execution, stop-condition check), the 5 pre-model context shapers in cheapest-first order (Budget Reduction, Snip, Microcompact, Context Collapse, Auto-Compact) with their triggers, and the 3 recovery mechanisms (max-output-token escalation with up to 3 retries; reactive compaction, which fires at most once per turn; and the prompt-too-long overflow chain of context collapse → reactive compaction → terminate). The pack maps directly onto the paper's three recurring design commitments: graduated layering over monolithic mechanisms, append-only designs that favor auditability over query power, and model judgment within a deterministic harness. Use it to implement, clone, or critique any production agent loop. Target: a staff engineer rebuilding the loop in a different language or runtime (Rust, Python, Bun) or auditing an existing one for compaction-race and stop-condition bugs.
Not yet measured
~0 installs·~0 tokens saved
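The three recovery paths in the Turn Execution Pipeline description amount to per-turn counters and flags. The sketch below is not Claude Code's code. It assumes that context collapse is attempted once per turn and that the turn terminates once output-limit retries are exhausted. `TurnRecoveryState` and `Recovery` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum

MAX_OUTPUT_ESCALATIONS = 3   # "max output token escalation up to 3 retries"


class Recovery(Enum):
    ESCALATE_OUTPUT_LIMIT = "escalate_output_limit"
    CONTEXT_COLLAPSE = "context_collapse"
    REACTIVE_COMPACTION = "reactive_compaction"
    TERMINATE = "terminate"


@dataclass
class TurnRecoveryState:
    """Per-turn recovery bookkeeping; create a fresh instance for every turn."""
    output_escalations: int = 0
    collapse_tried: bool = False
    reactive_compaction_used: bool = False

    def on_max_output_tokens(self) -> Recovery:
        if self.output_escalations < MAX_OUTPUT_ESCALATIONS:
            self.output_escalations += 1
            return Recovery.ESCALATE_OUTPUT_LIMIT
        return Recovery.TERMINATE   # assumption: stop once retries are exhausted

    def on_prompt_too_long(self) -> Recovery:
        # Overflow chain: context collapse, then reactive compaction, then terminate.
        if not self.collapse_tried:
            self.collapse_tried = True
            return Recovery.CONTEXT_COLLAPSE
        if not self.reactive_compaction_used:
            self.reactive_compaction_used = True   # fires at most once per turn
            return Recovery.REACTIVE_COMPACTION
        return Recovery.TERMINATE
```

Keeping the flags on a per-turn object and resetting them at each turn boundary is one way to avoid the compaction-race bug in which reactive compaction fires twice in the same turn.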
Subagent Delegation — Three Isolation Modes
SkillTool vs AgentTool, 6 built-in types, 3 isolation modes. The knob every harness engineer gets wrong first.
Harness pack covering Claude Code's subagent architecture, derived from the VILA-Lab architectural analysis (arXiv 2604.14228). It frames the SkillTool vs AgentTool trade-off as the central decision: SkillTool injects instructions into the current context (cheap, same window), while AgentTool spawns a fresh, isolated conversation (~7x tokens, context-safe). It documents the 6 built-in subagent types (Explore, Plan, General-purpose, Claude Code Guide, Verification, Statusline-setup); custom subagents defined in `.claude/agents/*.md` files with YAML frontmatter (tools, model, permissionMode, hooks, skills, memory scope); the 3 isolation modes (worktree, remote, and in-process, which is the default); sidechain transcripts stored as separate JSONL files; and multi-instance coordination via POSIX flock() with zero external dependencies.
Not yet measured
~0 installs·~0 tokens saved
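Multi-instance coordination through POSIX flock() needs nothing beyond the standard library. The sketch below uses Python's `fcntl.flock` to serialize appends to a shared file from several harness processes. It is POSIX-only. The `.lock` sidecar file and the names `exclusive_lock` and `append_shared` are hypothetical and do not describe Claude Code's actual layout.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def exclusive_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive flock() on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no other holder remains
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)                     # closing also releases the lock


def append_shared(path: str, line: str) -> None:
    # Hypothetical shared file that several harness instances append to.
    with exclusive_lock(path + ".lock"):
        with open(path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
```

flock() locks belong to the open file description. Two instances that each open the lock file therefore exclude each other, and a crashed holder releases its lock when the kernel closes its descriptors.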
Session Persistence — Three Channels
Append-only JSONL across 3 channels. Permissions never restore on resume — the friction IS the safety.
Harness pack covering Claude Code's session persistence design, derived from the VILA-Lab architectural analysis (arXiv 2604.14228). It documents the three persistence channels: append-only session JSONL transcripts (the full conversation, with chain-patched compaction boundaries), a global `history.jsonl` for cross-session prompt recall (read in reverse to serve Up-arrow), and subagent sidechains stored as a separate JSONL file per subagent. It frames the critical, deliberate non-feature: permissions are never restored on resume, so trust is always re-established in the current session. The paper presents this as a design choice, not a UX bug: the user friction is the cost of maintaining the safety invariant. The pack also captures the append-only, chain-patching trade-off: auditability and simplicity over query power.
Not yet measured
~0 installs·~0 tokens saved
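The persistence channels and the no-restore rule can be sketched with plain JSONL files. The record shapes below are hypothetical: a `"type"` field on transcript lines and a `"prompt"` field on history lines. For simplicity, `recent_prompts` reads the whole file and reverses it rather than reading backwards from the end. The point is that `resume_session` rebuilds messages but always starts with an empty permission set.

```python
import json
import os
from dataclasses import dataclass, field


def append_jsonl(path: str, record: dict) -> None:
    # Append-only: records are only ever added at the end, never rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_jsonl(path: str) -> list[dict]:
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def recent_prompts(history_path: str, limit: int = 10) -> list[str]:
    """Newest-first prompts, the order an Up-arrow recall would walk them."""
    entries = read_jsonl(history_path)
    return [e["prompt"] for e in reversed(entries)][:limit]


@dataclass
class ResumedSession:
    messages: list[dict]
    granted_permissions: set[str] = field(default_factory=set)  # always starts empty


def resume_session(transcript_path: str) -> ResumedSession:
    records = read_jsonl(transcript_path)
    messages = [r for r in records if r.get("type") == "message"]
    # Permission grants may appear in the transcript but are deliberately not
    # restored: trust has to be re-established in the new session.
    return ResumedSession(messages=messages)
```

Since nothing is rewritten in place, the transcript remains an audit trail of what was granted in the earlier session, even though the resumed session ignores those grants.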
Workflow Elicitation
Capture operator judgment before you automate it.
A future-facing pattern for teams that need to externalize tacit expertise into reusable operating instructions before they try to scale an agent across a domain.
Not yet measured
~0 installs
Recent change traces · Cross-linked to packs
What changed, and why
Every coding session captured as rows — Scenario / Files / Changes / Why. Ctrl+F your own history.