Agent Workspace
Verified · harness · hybrid · v0.1.0 · Production-ready

Answer Review and Quality Checks

Persist the answer as a reviewable artifact, not just a message string.

Agent Workspace · Signed (unverified) · Verified publisher · Updated 2026-04-15 · ~0 installs this month

Install

npx attrition-sh pack install answer-review-and-quality-checks
# Answer Review and Quality Checks
# See: /packs/answer-review-and-quality-checks

Telemetry

Not yet measured

Summary

A runtime pattern that persists final answers, quality checks, evaluation metadata, and downstream review state as first-class records instead of burying them in message text.

Fit and expected payoff

When this pack earns its extra structure, when to skip it, and what it should improve.

Situations where this pack earns its extra structure.

  • Answers may need post-run review, escalation, or auditing.
  • You want live evaluation to exercise the same runtime as the product.
  • The app needs a deploy gate or quality dashboard.

When to skip it, so the pack does not become a default hammer.

  • The output is disposable and does not need later inspection.
  • You are still in the earliest sketch phase and do not yet know the domain rubric.

Expected outcomes if implemented well.

  • Quality becomes measurable and replayable.
  • The answer contract is separated from the chat transcript.
  • Runtime checks and eval checks can share a common packet shape.

Minimal instructions

Smallest useful starting point.

Persist a final answer packet for every meaningful run.

The packet should include:
- final answer
- scope and references
- quality checks
- trace pointers
- evaluation linkage

Do not rely on chat text alone as the system of record.
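
As a minimal sketch of what such a packet could look like, the dataclasses below carry the five listed parts as explicit fields. The class and field names (`AnswerPacket`, `QualityCheck`, `trace_ids`, `eval_run_ids`) are illustrative assumptions, not a schema defined by this pack.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class QualityCheck:
    name: str          # hypothetical check identifier, e.g. "has_references"
    passed: bool
    detail: str = ""


@dataclass
class AnswerPacket:
    run_id: str
    final_answer: str
    scope: str
    references: list[str] = field(default_factory=list)
    quality_checks: list[QualityCheck] = field(default_factory=list)
    trace_ids: list[str] = field(default_factory=list)      # trace pointers
    eval_run_ids: list[str] = field(default_factory=list)   # evaluation linkage
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the whole packet, nested checks included."""
        return json.dumps(asdict(self), sort_keys=True)


# Example values are made up for illustration.
packet = AnswerPacket(
    run_id="run-001",
    final_answer="Refunds are available within 30 days.",
    scope="billing policy questions",
    references=["policy-doc#refunds"],
    quality_checks=[QualityCheck("has_references", True)],
    trace_ids=["trace-abc"],
)
record = json.loads(packet.to_json())
```

Keeping the packet as a typed record rather than a formatted string means the stored JSON can be queried by field later, which is the point of not treating chat text as the system of record.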

Full instructions

Complete natural-language instruction set.

Treat the answer as a durable application artifact.

For every non-trivial run:
1. Persist the final answer to an answer packet.
2. Attach references, quality checks, and trace metadata.
3. Link eval runs and live scoring back to the same packet shape.
4. Surface packet status in the UI so operators can see whether the answer passed or failed checks.

This should support:
- replay
- review
- later analytics
- quality gating before deployment
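
The four steps above can be sketched with a small SQLite store. This is one possible shape under stated assumptions: the table name `answer_packets`, the `source` values `'runtime'` and `'eval'`, the three status strings, and the pass-rate gate are all hypothetical choices, not part of the pack.

```python
import json
import sqlite3


def packet_status(checks):
    """Derive the status an operator would see; no checks means 'unchecked'."""
    if not checks:
        return "unchecked"
    return "passed" if all(c["passed"] for c in checks) else "failed"


def open_store(path=":memory:"):
    """Open (or create) the packet table. Defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS answer_packets ("
        " run_id TEXT PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def persist_packet(conn, run_id, source, answer, references, checks, trace_ids):
    """Steps 1-3: store the answer with references, checks, and trace metadata."""
    body = {
        "final_answer": answer,
        "references": references,
        "quality_checks": checks,
        "trace_ids": trace_ids,
    }
    status = packet_status(checks)
    conn.execute(
        "INSERT OR REPLACE INTO answer_packets VALUES (?, ?, ?, ?)",
        (run_id, source, status, json.dumps(body)),
    )
    conn.commit()
    return status  # step 4: the value a UI could surface


def deploy_gate(conn, min_pass_rate=1.0):
    """Quality gate: open only if enough eval packets passed their checks."""
    rows = conn.execute(
        "SELECT status FROM answer_packets WHERE source = 'eval'"
    ).fetchall()
    if not rows:
        return False  # no eval evidence, so keep the gate closed
    passed = sum(1 for (status,) in rows if status == "passed")
    return passed / len(rows) >= min_pass_rate


conn = open_store()
s1 = persist_packet(conn, "run-1", "runtime", "Answer A", ["doc-1"],
                    [{"name": "has_references", "passed": True}], ["trace-1"])
s2 = persist_packet(conn, "eval-1", "eval", "Answer B", [],
                    [{"name": "has_references", "passed": False}], ["trace-2"])
gate_open = deploy_gate(conn)
```

Runtime and eval packets go through the same `persist_packet` call and land in the same table, so replay, review, and analytics can all read one record shape.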

Evaluation checklist

These checks should pass before you consider the pattern production-ready.

  • Is there a persisted answer packet for each completed assistant run?
  • Do runtime checks and eval checks share a visible schema?
  • Can a later reviewer inspect packet quality without reading the raw message transcript?
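
The checklist can also be approximated mechanically. The sketch below assumes packets are plain dictionaries and that the field set in `REQUIRED_FIELDS` is the shared schema; both are assumptions made for illustration.

```python
# Hypothetical shared schema for runtime and eval packets.
REQUIRED_FIELDS = {
    "run_id", "source", "final_answer",
    "references", "quality_checks", "trace_ids",
}


def audit_packets(completed_run_ids, packets):
    """Report which checklist items fail, and for which runs."""
    by_run = {p.get("run_id"): p for p in packets}

    # Item 1: every completed run should have a persisted packet.
    missing = sorted(r for r in completed_run_ids if r not in by_run)

    # Item 2: runtime and eval packets should carry the same fields.
    well_formed = [p for p in packets if REQUIRED_FIELDS.issubset(p)]
    schema_violations = sorted(
        p.get("run_id", "?") for p in packets if not REQUIRED_FIELDS.issubset(p)
    )

    # Item 3: a reviewer needs recorded checks, not the transcript.
    no_checks = sorted(p["run_id"] for p in well_formed if not p["quality_checks"])

    return {
        "missing_packets": missing,
        "schema_violations": schema_violations,
        "no_checks": no_checks,
    }


# Example data is made up for illustration.
packets = [
    {"run_id": "run-1", "source": "runtime", "final_answer": "A",
     "references": ["doc-1"], "trace_ids": ["t1"],
     "quality_checks": [{"name": "has_references", "passed": True}]},
    {"run_id": "run-2", "source": "eval", "final_answer": "B",
     "references": [], "trace_ids": ["t2"], "quality_checks": []},
    {"run_id": "run-3", "source": "runtime", "final_answer": "C"},
]
report = audit_packets(["run-1", "run-2", "run-3", "run-4"], packets)
```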

Common failure modes

Every check below traces back to a specific production failure. Read as: "I would think about X because in production Y can happen."

  • Mid

    Quality checks exist only in logs, not in app data.

    Trigger: not separated (legacy entry)
    Prevention: not stated (legacy entry)
  • Mid

    The final answer is stored as unstructured chat text only.

    Trigger: not separated (legacy entry)
    Prevention: not stated (legacy entry)
  • Mid

    Evaluation runs test a different runtime than the product uses.

    Trigger: not separated (legacy entry)
    Prevention: not stated (legacy entry)
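
For the third failure mode, one possible guard is to route product traffic and eval cases through a single entry point, so both produce the same packet shape. This is a sketch under assumptions: `answer_question`, `handle_user_request`, `run_eval`, and the stand-in `fake_model` are hypothetical names, and the substring match is only a placeholder for a real rubric.

```python
from typing import Callable


def answer_question(question: str, model: Callable[[str], str]) -> dict:
    """Single runtime entry point (hypothetical) shared by product and eval."""
    answer = model(question)
    checks = [{"name": "non_empty_answer", "passed": bool(answer.strip())}]
    return {"question": question, "final_answer": answer, "quality_checks": checks}


def handle_user_request(question: str, model: Callable[[str], str]) -> dict:
    """Product path: calls the shared runtime and tags the packet."""
    packet = answer_question(question, model)
    packet["source"] = "runtime"
    return packet


def run_eval(cases: list, model: Callable[[str], str]) -> list:
    """Eval path: same runtime call, plus an eval-only check appended."""
    results = []
    for case in cases:
        packet = answer_question(case["question"], model)
        packet["source"] = "eval"
        packet["quality_checks"].append({
            "name": "matches_expected",
            "passed": case["expected"].lower() in packet["final_answer"].lower(),
        })
        results.append(packet)
    return results


def fake_model(question: str) -> str:
    # Stand-in for a real model call, used only to make the sketch runnable.
    return "Paris is the capital of France."


live = handle_user_request("What is the capital of France?", fake_model)
evals = run_eval(
    [{"question": "What is the capital of France?", "expected": "Paris"}],
    fake_model,
)
```

Because the eval loop never builds its own answer path, a change to `answer_question` is exercised by eval runs automatically.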

Official docs and implementation references

Reference implementations