Autonomous coding safety rail
Reviewer packets before agent confidence theater.
A deterministic pipeline that turns a git diff into scoped specialist reviews, evidence bundles, and a final hard-gate judge. No LLM calls inside the factory. No repo mutation. No commit until approval evidence exists.
spec_factory_tests=pass cases=6
context_factory_tests=pass cases=6
review_factory_tests=pass cases=16
review_factory_judge_tests=pass cases=14
code_factory_tests=pass cases=3
Sample risk tier: full
8 review packets
14 judge cases
Pipeline
01
Spec Factory
Turns the task into requirements, acceptance criteria, readiness, and approval gates.
02
Context Factory
Collects bounded repo context and project docs with hashes and source paths.
03
Review Factory
Routes changed files to specialist reviewer packets based on deterministic risk rules.
04
Reviewer Lane
Hermes delegates packets to specialists: tests, architecture, security, docs, ops, cost/context.
05
Final Judge
Aggregates findings. BLOCK beats REQUEST_CHANGES. Approval needs real local evidence.
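The Final Judge rule above can be sketched as a strictest-verdict-wins aggregation. This is a minimal illustration, not the project's actual judge: the `Verdict` enum, the `APPROVE` name, the `final_verdict` function, and the choice to treat an empty findings list as `REQUEST_CHANGES` are all assumptions made for the example. Only `BLOCK` and `REQUEST_CHANGES` are named in the source.

```python
from enum import IntEnum
from typing import Iterable


class Verdict(IntEnum):
    # Ordered by severity so that max() picks the strictest verdict.
    APPROVE = 0          # hypothetical name; not stated in the source
    REQUEST_CHANGES = 1
    BLOCK = 2


def final_verdict(findings: Iterable[Verdict], verification_evidence: str) -> Verdict:
    """Aggregate reviewer verdicts; the strictest one wins."""
    collected = list(findings)
    if not collected:
        # Assumption: with no reviewer findings there is nothing to approve.
        return Verdict.REQUEST_CHANGES
    worst = max(collected)
    # Approval requires real local evidence (for example, test output).
    if worst == Verdict.APPROVE and not verification_evidence.strip():
        return Verdict.REQUEST_CHANGES
    return worst


if __name__ == "__main__":
    mixed = [Verdict.APPROVE, Verdict.BLOCK, Verdict.REQUEST_CHANGES]
    print(final_verdict(mixed, "local test output").name)
    print(final_verdict([Verdict.APPROVE], "").name)
```

Using an ordered enum keeps the precedence rule in one place: adding a new verdict only requires choosing where it sits in the severity order.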
Risk routing demo
The sample diff intentionally touches auth code, a dependency manifest, and a secret-like config path.
.env, requirements.txt, src/auth.py
Generated reviewer panel
spec_compliance, tests_regression, architecture, security, dependency_supply_chain, cost_context, docs_dx, final_judge
Safety fixes from review
- Refuses review output inside the reviewed repo unless explicitly overridden.
- Rejects git --base-ref option injection before running git diff.
- Adds bounded untracked-file numstat evidence.
- Routes .env/private-key/cert paths to security/full-tier review.
- Final judge now returns BLOCK for a closed approval gate even when other evidence is also missing.
- Redaction no longer mangles generated command flags when a task merely mentions secret-like paths.
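Two of the fixes above, rejecting option injection through the base ref and refusing output inside the reviewed repo, might look roughly like the sketch below. The function names, the allowed-character pattern, and the `allow_inside_repo` override flag are hypothetical; the actual scripts may validate differently.

```python
import re
from pathlib import Path

# Assumption: a conservative allowlist for ref names. The first character
# must be alphanumeric, so a value starting with "-" can never be parsed
# by git as a command-line option.
_SAFE_REF = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/@~^-]*")


def validate_base_ref(base_ref: str) -> str:
    """Return base_ref unchanged, or raise if it could act as an option."""
    if not _SAFE_REF.fullmatch(base_ref):
        raise ValueError(f"unsafe base ref: {base_ref!r}")
    return base_ref


def build_diff_command(repo: str, base_ref: str) -> list[str]:
    """Build an argv list (no shell) for a numstat diff against base_ref."""
    ref = validate_base_ref(base_ref)
    # The trailing "--" tells git that no further revisions or options follow.
    return ["git", "-C", str(repo), "diff", "--numstat", ref, "--"]


def check_output_dir(repo: str, output_dir: str, allow_inside_repo: bool = False) -> Path:
    """Refuse an output directory inside the reviewed repo unless overridden."""
    repo_path = Path(repo).resolve()
    out_path = Path(output_dir).resolve()
    inside = out_path == repo_path or repo_path in out_path.parents
    if inside and not allow_inside_repo:
        raise ValueError(f"output dir {out_path} is inside the reviewed repo")
    return out_path
```

Passing the command as an argument list rather than a shell string avoids shell interpolation; the ref check covers the remaining case, where git itself would interpret a leading dash as a flag.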
Risk tiers
| Tier | Triggered by | Reviewers |
|---|---|---|
| lightweight | Docs-only changes with no security/dependency/infra signal. | docs_dx → final_judge |
| standard | Normal code changes. | spec_compliance, tests_regression, architecture, docs_dx, final_judge |
| full | Auth/secrets/dependencies/infra/large diffs. | Full specialist panel including security, supply-chain, ops, cost/context. |
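The tier table can be read as a small deterministic classifier over changed paths. The sketch below is an illustration under stated assumptions: the specific file names, suffixes, the crude `"auth"` substring check, and the 500-line threshold for a "large diff" are all hypothetical, and infra signals are left out for brevity. The real routing rules are not shown in the source.

```python
from pathlib import PurePosixPath
from typing import Iterable

# All of these sets are illustrative assumptions, not the project's rules.
SENSITIVE_NAMES = {".env"}
SENSITIVE_SUFFIXES = {".pem", ".key", ".crt"}
DEPENDENCY_MANIFESTS = {"requirements.txt", "package.json", "pyproject.toml"}
DOC_SUFFIXES = {".md", ".rst"}


def is_high_risk(path: str) -> bool:
    """True for secret-like, dependency, or auth-related paths."""
    p = PurePosixPath(path)
    name = p.name.lower()
    return (
        name in SENSITIVE_NAMES
        or name.startswith(".env.")
        or p.suffix.lower() in SENSITIVE_SUFFIXES
        or name in DEPENDENCY_MANIFESTS
        or "auth" in name  # crude heuristic; also matches e.g. "author.md"
    )


def risk_tier(paths: Iterable[str], changed_lines: int = 0,
              large_diff_threshold: int = 500) -> str:
    """Map a set of changed paths to lightweight, standard, or full."""
    changed = list(paths)
    if changed_lines > large_diff_threshold or any(is_high_risk(p) for p in changed):
        return "full"
    if changed and all(PurePosixPath(p).suffix.lower() in DOC_SUFFIXES for p in changed):
        return "lightweight"
    return "standard"


if __name__ == "__main__":
    # The sample diff from the risk routing demo.
    print(risk_tier([".env", "requirements.txt", "src/auth.py"]))
```

The high-risk check runs first so that a single sensitive path escalates the whole diff, even when every other changed file is documentation.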
Run it
REVIEW_OUT="/Users/slothuus/.hermes/rapid42/reviews/$(basename "$REPO")/$(date +%Y%m%d-%H%M%S)"
CONTEXT_OUT="/Users/slothuus/.hermes/rapid42/contexts/$(basename "$REPO")/$(date +%Y%m%d-%H%M%S)"

python3 /Users/slothuus/.hermes/rapid42/scripts/rapid42_review_factory.py \
  --repo "$REPO" \
  --task "$TASK" \
  --spec-contract /path/to/spec_contract.json \
  --output-dir "$REVIEW_OUT" \
  --context-output-dir "$CONTEXT_OUT"

python3 /Users/slothuus/.hermes/rapid42/scripts/rapid42_review_judge.py \
  --findings-dir /path/to/reviewer-findings \
  --output-dir /path/to/fresh-review-decision \
  --verification-evidence "<real local test/check output>" \
  --acceptance-evidence-file /path/to/acceptance_evidence.json \
  --approval-evidence "<required for closed gates>" \
  --spec-contract /path/to/spec_contract.json \
  --review-plan "$REVIEW_OUT/review_plan.json"