Agent Workspace

Agent Workspace

Browse verified harness packs with source links, evaluation guidance, and starter instructions for Claude Code, Codex, Cursor, and Convex.

Packs · 23Traces · 2Publishers · 3
5
7

10 packs

10 shownVerified packs include starter instructions, sources, and evaluation guidance.
Sort: Featured first
Type: harness
Verifiedharness
Production-readyhybrid

Operator Chat Rail

Shared chat plus traceable assistant rail for high-context workflows.

A proven interface pattern for apps that need collaborative chat in the center and a persistent agent rail for plan trace, tool execution, telemetry, sources, and quality checks.

Not yet measured

~0 installs

chattracesources
Open pack →
Verifiedharness
Recommendedhybrid

Planning and Worker Flow

Tiered orchestration for questions that are too broad for a single call.

A multi-step harness pattern that uses a narrow planning pass, typed worker calls, and a constrained synthesis pass instead of letting one prompt attempt everything at once.

Not yet measured

~0 installs

planningworkersparallelism
Open pack →
Verifiedharness
Production-readyhybrid

Answer Review and Quality Checks

Persist the answer as a reviewable artifact, not just a message string.

A runtime pattern that persists final answers, quality checks, evaluation metadata, and downstream review state as first-class records instead of burying them in message text.

Not yet measured

~0 installs

qualitypacketseval
Open pack →
Communityharness
Production-readyevaluator-optimizer

Claude Advisor Pattern v2

Sonnet executes, Opus advises. Route intelligence by confidence, pay Opus only when it matters.

Evaluator-optimizer harness built on the Anthropic Advisor Pattern. Claude Sonnet 4.6 runs the task; on low-confidence decisions (threshold = 0.7), after two consecutive tool failures, or when the executor explicitly escalates, it calls Claude Opus 4.6 as an advisor. The advisor returns a structured recommendation; Sonnet applies it and continues. Cost split is measured — typical deployments spend 8-12% of tokens on Opus while retaining ~93% of an Opus-only pass-rate.

Not yet measured

~0 installs·~0 tokens saved

harnessevaluator-optimizeradvisor
Open pack →
Verifiedharness
Recommendedhybrid

Hybrid Runtime

Use rules for known packets, use synthesis only when ambiguity is real.

A disciplined runtime pattern that reserves deterministic rendering for tightly scoped known cases and uses an LLM only for bounded synthesis, not as the default execution path for every question.

Not yet measured

~0 installs

deterministichybridrouting
Open pack →
Verifiedharness
Production-readyhybrid

Durable Streaming

Stream plan and execution state through stored events, not transient sockets alone.

A transport and persistence pattern for agents that need progressive rendering, reconnect safety, and post-run replay without losing the benefits of token or step streaming.

Not yet measured

~0 installs

streamingeventsdurability
Open pack →
Communityharness
Recommendedorchestrator-workers

Turn Execution Pipeline

The 9-step loop, 5 context shapers, 3 recovery paths. The 98.4% infrastructure under every turn.

Canonical harness reference for Claude Code's turn-execution pipeline, derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228). The paper's anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure; this pack documents the infrastructure. It names the 9 steps (settings resolution, state initialization, context assembly, five pre-model shapers, model call, tool dispatch, permission gate, tool execution, stop-condition check), the 5 pre-model context shapers in cheapest-first order (Budget Reduction, Snip, Microcompact, Context Collapse, Auto-Compact) with their triggers, and the 3 recovery mechanisms (max output token escalation up to 3 retries, reactive compaction firing at most once per turn, prompt-too-long overflow chain: context-collapse → reactive compaction → terminate). The pack maps directly onto the paper's three recurring design commitments: graduated layering over monolithic mechanisms, append-only designs favoring auditability over query power, and model judgment within a deterministic harness. Use it to implement, clone, or critique any production agent loop. Target: a staff engineer rebuilding the loop in a different language (Rust, Python, Bun) or auditing an existing one for compaction-race and stop-condition bugs.

Not yet measured

~0 installs·~0 tokens saved

harnessturn-loopcontext-compaction
Open pack →
Communityharness
Recommendedparallelization

Subagent Delegation — Three Isolation Modes

SkillTool vs AgentTool, 6 built-in types, 3 isolation modes. The knob every harness engineer gets wrong first.

Harness pack covering Claude Code's subagent architecture derived from the VILA-Lab architectural analysis (arXiv 2604.14228). Frames the SkillTool vs AgentTool trade-off as the central decision: SkillTool injects instructions into the current context (cheap, same window); AgentTool spawns a fresh isolated conversation (~7x tokens, context-safe). Documents the 6 built-in subagent types (Explore, Plan, General-purpose, Claude Code Guide, Verification, Statusline-setup), custom `.claude/agents/*.md` with YAML frontmatter (tools, model, permissionMode, hooks, skills, memory scope), the 3 isolation modes (worktree / remote / in-process — default), sidechain transcripts as separate JSONL files, and multi-instance coordination via POSIX flock() with zero external dependencies.

Not yet measured

~0 installs·~0 tokens saved

harnessparallelizationsubagents
Open pack →
Communityharness
Recommended

Session Persistence — Three Channels

Append-only JSONL across 3 channels. Permissions never restore on resume — the friction IS the safety.

Harness pack covering Claude Code's session persistence design derived from the VILA-Lab architectural analysis (arXiv 2604.14228). Documents the three persistence channels: append-only session JSONL transcripts (full conversation with chain-patched compaction boundaries), global `history.jsonl` for cross-session prompt recall (reverse-read for Up-arrow), and subagent sidechains as separate JSONL per subagent. Frames the critical deliberate non-feature: permissions are never restored on resume — trust is always re-established in the current session. The paper presents this as a design choice, not a UX bug: the user friction is the cost of maintaining the safety invariant. The pack also captures the append-only / chain-patching trade-off — auditability and simplicity over query power.

Not yet measured

~0 installs·~0 tokens saved

harnesspersistencesession
Open pack →
Communityharness
Experimentalhybrid

Workflow Elicitation

Capture operator judgment before you automate it.

A future-facing pattern for teams that need to externalize tacit expertise into reusable operating instructions before they try to scale an agent across a domain.

Not yet measured

~0 installs

elicitationoperating-systemknowledge-capture
Open pack →

What changed, and why

Every coding session captured as rows — Scenario / Files / Changes / Why. Ctrl+F your own history.

View all traces