Agent Workspace

Agent Workspace

Browse verified harness packs with source links, evaluation guidance, and starter instructions for Claude Code, Codex, Cursor, and Convex.

Packs · 23Traces · 2Publishers · 3
5
7

9 packs

9 shownVerified packs include starter instructions, sources, and evaluation guidance.
Sort: Featured first
Tag: dive-into-claude-code
Communityreference
Recommended

Four Design Questions

The 2-minute orientation: every coding agent answers the same 4 questions.

A short reference pack that frames every coding-agent architecture as an answer to four recurring questions: where reasoning lives, how many execution engines, the default safety posture, and the binding resource constraint. Sourced from VILA-Lab's Dive into Claude Code (arXiv 2604.14228, CC-BY-NC-SA-4.0). Read this first — every other dive-sourced pack in the catalog is a zoom-in on one row of this table.

Not yet measured

~0 installs·~0 tokens saved

referencearchitectureentry-level
Open pack →
Communityharness
Recommendedorchestrator-workers

Turn Execution Pipeline

The 9-step loop, 5 context shapers, 3 recovery paths. The 98.4% infrastructure under every turn.

Canonical harness reference for Claude Code's turn-execution pipeline, derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228). The paper's anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure; this pack documents the infrastructure. It names the 9 steps (settings resolution, state initialization, context assembly, five pre-model shapers, model call, tool dispatch, permission gate, tool execution, stop-condition check), the 5 pre-model context shapers in cheapest-first order (Budget Reduction, Snip, Microcompact, Context Collapse, Auto-Compact) with their triggers, and the 3 recovery mechanisms (max output token escalation up to 3 retries, reactive compaction firing at most once per turn, prompt-too-long overflow chain: context-collapse → reactive compaction → terminate). The pack maps directly onto the paper's three recurring design commitments: graduated layering over monolithic mechanisms, append-only designs favoring auditability over query power, and model judgment within a deterministic harness. Use it to implement, clone, or critique any production agent loop. Target: a staff engineer rebuilding the loop in a different language (Rust, Python, Bun) or auditing an existing one for compaction-race and stop-condition bugs.

Not yet measured

~0 installs·~0 tokens saved

harnessturn-loopcontext-compaction
Open pack →
Communitysecurity
Recommended

Seven Safety Layers

Defense-in-depth for tool execution. Deny > ask > allow. All 7 layers, in order, with honest failure modes.

Security reference for Claude Code's 7-layer permission architecture, derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228). The paper's anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure; the safety layers are a large slice of that infrastructure. The pack enumerates the 7 independent layers in order — (1) tool pre-filtering, (2) deny-first rule evaluation, (3) permission mode constraints across 7 modes (plan, default, acceptEdits, auto, dontAsk, bypassPermissions, bubble), (4) auto-mode ML classifier (yoloClassifier.ts two-stage fast-filter + chain-of-thought with a timeout race against a pre-computed classification), (5) shell sandboxing for filesystem and network isolation, (6) non-restoration on resume (permissions re-established each session; trust is not persistent), (7) PreToolUse hooks with permissionDecision return values. It also names the 4-stage authorization pipeline (pre-filter → hooks → rules → handler with 4 branches). The pack carries the paper's three recurring design commitments — graduated layering, append-only auditability, model judgment inside a deterministic harness — and documents the honest failure modes the paper flags: ~50 subcommands bypass shell-layer analysis, the classifier's timeout race can leave decisions ambiguous, and the pre-trust window allowed 4 published CVEs where extensions executed before the trust dialog appeared. Target: a staff engineer or security lead reviewing an agent's tool-execution surface before shipping.

Not yet measured

~0 installs·~0 tokens saved

securitypermissionsdefense-in-depth
Open pack →
Communityreference
Recommended

Nine Context Sources

CLAUDE.md is user context, not system prompt. 9 sources, 4 hierarchy levels, zero embeddings.

Reference for the 9 ordered context sources Claude Code assembles before every model call, and the 4-level CLAUDE.md hierarchy that feeds one of them. Derived from the VILA-Lab Dive-into-Claude-Code paper (arXiv 2604.14228), whose anchor finding is that Claude Code is ~1.6% AI decision logic and ~98.4% deterministic infrastructure. The pack's entire frame is the paper's explicitly-labeled 'Critical design choice' from §Context Construction: CLAUDE.md is user context (probabilistic compliance by the model), NOT system prompt (deterministic enforcement by the runtime). Permission rules are the deterministic layer; CLAUDE.md instructs but does not enforce. The pack enumerates the 9 sources in order (system prompt → environment info → CLAUDE.md hierarchy → path-scoped rules → auto-memory → tool metadata → conversation history → tool results → compact summaries), breaks the CLAUDE.md hierarchy into its 4 precedence levels (/etc managed, ~/.claude user, project, .local gitignored), and documents the file-based memory model that replaces vector DB / embedding approaches with an LLM-based scan of up to 5 memory-file headers on demand. Fully inspectable, editable, and version-controllable by the user — and one of the paper's three recurring design commitments (append-only auditability over query power) made concrete. Target: an agent engineer who keeps hearing 'just put it in CLAUDE.md' as a fix and needs to understand why that is wrong for safety-critical rules.

Not yet measured

~0 installs·~0 tokens saved

referencecontextclaude-md
Open pack →
Communityharness
Recommendedparallelization

Subagent Delegation — Three Isolation Modes

SkillTool vs AgentTool, 6 built-in types, 3 isolation modes. The knob every harness engineer gets wrong first.

Harness pack covering Claude Code's subagent architecture derived from the VILA-Lab architectural analysis (arXiv 2604.14228). Frames the SkillTool vs AgentTool trade-off as the central decision: SkillTool injects instructions into the current context (cheap, same window); AgentTool spawns a fresh isolated conversation (~7x tokens, context-safe). Documents the 6 built-in subagent types (Explore, Plan, General-purpose, Claude Code Guide, Verification, Statusline-setup), custom `.claude/agents/*.md` with YAML frontmatter (tools, model, permissionMode, hooks, skills, memory scope), the 3 isolation modes (worktree / remote / in-process — default), sidechain transcripts as separate JSONL files, and multi-instance coordination via POSIX flock() with zero external dependencies.

Not yet measured

~0 installs·~0 tokens saved

harnessparallelizationsubagents
Open pack →
Communityreference
Recommended

Extensibility — Four Mechanisms

Hooks → Skills → Plugins → MCP. Graduated context cost, three injection points, one decision tree.

Reference pack covering Claude Code's four extension mechanisms derived from the VILA-Lab architectural analysis (arXiv 2604.14228). Frames the graduated context-cost hierarchy — Hooks (zero context, 27 events, 4 exec types) → Skills (low, SKILL.md with 15+ YAML fields, injected via SkillTool) → Plugins (medium, 10 component types) → MCP Servers (high, 7 transport types). Documents the 5-step tool pool assembly (base enumeration up to 54 tools → mode filtering → deny pre-filter → MCP integration → dedup) and the three injection points in the loop: assemble() (what the model sees), model() (what it can reach), execute() (whether/how it runs). The pack is a decision tree: which mechanism should you reach for, and why reaching for MCP first is the most common mistake.

Not yet measured

~0 installs·~0 tokens saved

referenceextensibilityhooks
Open pack →
Communityharness
Recommended

Session Persistence — Three Channels

Append-only JSONL across 3 channels. Permissions never restore on resume — the friction IS the safety.

Harness pack covering Claude Code's session persistence design derived from the VILA-Lab architectural analysis (arXiv 2604.14228). Documents the three persistence channels: append-only session JSONL transcripts (full conversation with chain-patched compaction boundaries), global `history.jsonl` for cross-session prompt recall (reverse-read for Up-arrow), and subagent sidechains as separate JSONL per subagent. Frames the critical deliberate non-feature: permissions are never restored on resume — trust is always re-established in the current session. The paper presents this as a design choice, not a UX bug: the user friction is the cost of maintaining the safety invariant. The pack also captures the append-only / chain-patching trade-off — auditability and simplicity over query power.

Not yet measured

~0 installs·~0 tokens saved

harnesspersistencesession
Open pack →
Communityreference
Recommended

Agent Design Space — Six Decisions

The architect's framework: answer these six before picking a pattern.

A reference pack that walks builders through the six recurring design decisions every production coding agent must answer: reasoning placement, safety posture, context management, extensibility, subagent architecture, and session persistence. Each decision pairs Claude Code's answer with realistic alternatives from LangGraph, Devin, SWE-Agent, OpenHands, and Aider, then closes with a Meta-Pattern of three commitments (graduated layering, append-only, model-judgment + deterministic harness). Frame this as 'the pack about picking packs' — answer all six first, then let the catalog serve specific patterns that match your posture. Sourced from VILA-Lab/Dive-into-Claude-Code build-your-own-agent.md (arXiv 2604.14228, CC-BY-NC-SA-4.0).

Not yet measured

~0 installs·~0 tokens saved

referencearchitecturedesign-space
Open pack →
Communitysecurity
Recommended

CVE: The Pre-Trust Execution Window

4 CVEs share one root cause: extensions execute before the trust dialog renders.

A security pack targeting the pre-trust execution window documented in VILA-Lab's Dive into Claude Code (arXiv 2604.14228). Extensions — plugins, MCP servers, hooks, skill manifests — can execute code during CLI initialization *before* the trust dialog appears, creating a structurally privileged attack window that lives outside the deny-first permission pipeline. The paper's README and Safety section tie four patched CVEs to this root cause. This is NOT prompt injection at runtime — it is code execution at load time, with the user's full OS privileges, before the user has been shown what they are trusting. Pair with `injection-surface-audit` for runtime content attacks; use this pack when you are building or auditing a harness that loads third-party extensions. CC-BY-NC-SA-4.0 source terms apply.

Not yet measured

~0 installs·~0 tokens saved

securitycvepre-trust-window
Open pack →

What changed, and why

Every coding session captured as rows — Scenario / Files / Changes / Why. Ctrl+F your own history.

View all traces