Agent Workspace

Agent Workspace

Browse verified harness packs with source links, evaluation guidance, and starter instructions for Claude Code, Codex, Cursor, and Convex.

Packs · 23Traces · 2Publishers · 3
5
7

4 packs

4 shownVerified packs include starter instructions, sources, and evaluation guidance.
Sort: Featured first
Type: harnessCommunityAgent Workspace
Communityharness
Production-readyevaluator-optimizer

Claude Advisor Pattern v2

Sonnet executes, Opus advises. Route intelligence by confidence, pay Opus only when it matters.

Evaluator-optimizer harness built on the Anthropic Advisor Pattern. Claude Sonnet 4.6 runs the task; on low-confidence decisions (threshold = 0.7), after two consecutive tool failures, or when the executor explicitly escalates, it calls Claude Opus 4.6 as an advisor. The advisor returns a structured recommendation; Sonnet applies it and continues. Cost split is measured — typical deployments spend 8-12% of tokens on Opus while retaining ~93% of an Opus-only pass-rate.

Not yet measured

~0 installs·~0 tokens saved

harnessevaluator-optimizeradvisor
Open pack →
Communityharness
Recommendedorchestrator-workers

Turn Execution Pipeline

The 9-step loop, 5 context shapers, 3 recovery paths. The 98.4% infrastructure under every turn.

Canonical harness reference for Claude Code's turn-execution pipeline, derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228). The paper's anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure; this pack documents the infrastructure. It names the 9 steps (settings resolution, state initialization, context assembly, five pre-model shapers, model call, tool dispatch, permission gate, tool execution, stop-condition check), the 5 pre-model context shapers in cheapest-first order (Budget Reduction, Snip, Microcompact, Context Collapse, Auto-Compact) with their triggers, and the 3 recovery mechanisms (max output token escalation up to 3 retries, reactive compaction firing at most once per turn, prompt-too-long overflow chain: context-collapse → reactive compaction → terminate). The pack maps directly onto the paper's three recurring design commitments: graduated layering over monolithic mechanisms, append-only designs favoring auditability over query power, and model judgment within a deterministic harness. Use it to implement, clone, or critique any production agent loop. Target: a staff engineer rebuilding the loop in a different language (Rust, Python, Bun) or auditing an existing one for compaction-race and stop-condition bugs.

Not yet measured

~0 installs·~0 tokens saved

harnessturn-loopcontext-compaction
Open pack →
Communityharness
Recommendedparallelization

Subagent Delegation — Three Isolation Modes

SkillTool vs AgentTool, 6 built-in types, 3 isolation modes. The knob every harness engineer gets wrong first.

Harness pack covering Claude Code's subagent architecture derived from the VILA-Lab architectural analysis (arXiv 2604.14228). Frames the SkillTool vs AgentTool trade-off as the central decision: SkillTool injects instructions into the current context (cheap, same window); AgentTool spawns a fresh isolated conversation (~7x tokens, context-safe). Documents the 6 built-in subagent types (Explore, Plan, General-purpose, Claude Code Guide, Verification, Statusline-setup), custom `.claude/agents/*.md` with YAML frontmatter (tools, model, permissionMode, hooks, skills, memory scope), the 3 isolation modes (worktree / remote / in-process — default), sidechain transcripts as separate JSONL files, and multi-instance coordination via POSIX flock() with zero external dependencies.

Not yet measured

~0 installs·~0 tokens saved

harnessparallelizationsubagents
Open pack →
Communityharness
Recommended

Session Persistence — Three Channels

Append-only JSONL across 3 channels. Permissions never restore on resume — the friction IS the safety.

Harness pack covering Claude Code's session persistence design derived from the VILA-Lab architectural analysis (arXiv 2604.14228). Documents the three persistence channels: append-only session JSONL transcripts (full conversation with chain-patched compaction boundaries), global `history.jsonl` for cross-session prompt recall (reverse-read for Up-arrow), and subagent sidechains as separate JSONL per subagent. Frames the critical deliberate non-feature: permissions are never restored on resume — trust is always re-established in the current session. The paper presents this as a design choice, not a UX bug: the user friction is the cost of maintaining the safety invariant. The pack also captures the append-only / chain-patching trade-off — auditability and simplicity over query power.

Not yet measured

~0 installs·~0 tokens saved

harnesspersistencesession
Open pack →

What changed, and why

Every coding session captured as rows — Scenario / Files / Changes / Why. Ctrl+F your own history.

View all traces