Forge is an autonomous development orchestrator that practices the craft: tests first, security always, every commit meaningful. It doesn't just write code — it builds software the way a disciplined team would.
Most AI tools generate code. Forge builds software. There's a difference.
Software craftsmanship means discipline. It means writing the test before the implementation. It means scanning for secrets and vulnerabilities on every change. It means each commit tells a story. It means stopping when something is wrong rather than plowing forward.
Forge encodes these principles into an autonomous loop. You describe what to build. Forge builds it the way a senior engineer would — methodically, safely, and with a clean git history you'd be proud to ship.
Every feature starts with a failing test. No exceptions. Red-Green-Refactor isn't a suggestion — it's enforced.
Secret detection, SAST scanning, and dependency audits run on every iteration. Critical findings block the loop.
Conventional commits per TDD phase: `test:`, `feat:`, `refactor:`. Your git log reads like documentation.
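The phase-to-prefix mapping is small enough to sketch. This is an illustration, not Forge's code: `commitMessage` and its optional scope argument are hypothetical, and only the three prefixes come from Forge. The output follows the Conventional Commits `type(scope): description` shape.

```typescript
// Hypothetical sketch: maps each TDD phase to its Conventional Commits type.
type Phase = "red" | "green" | "refactor";

const COMMIT_TYPE: Record<Phase, string> = {
  red: "test", // failing test added
  green: "feat", // implementation that makes it pass
  refactor: "refactor", // cleanup with the suite still green
};

// Builds "type: summary" or "type(scope): summary".
function commitMessage(phase: Phase, summary: string, scope?: string): string {
  const type = COMMIT_TYPE[phase];
  return scope ? `${type}(${scope}): ${summary}` : `${type}: ${summary}`;
}

console.log(commitMessage("red", "cover empty cart total"));
// test: cover empty cart total
console.log(commitMessage("green", "compute cart total", "cart"));
// feat(cart): compute cart total
```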
The circuit breaker detects stagnation — repeated failures, no progress, regressions — and halts before wasting time and tokens.
Every practice that makes software reliable — automated, enforced, and built into the loop.
Red-Green-Refactor cycle tracked and enforced. Tests are written before code, with regression detection on refactor.
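One way to picture the enforcement is a check that runs after every test run. The sketch below is hypothetical (`checkPhase` and the `TestRun` shape are assumptions), but it encodes the rules stated above: red needs a failing test, green needs the whole suite passing, and refactor must not lose any passing tests.

```typescript
// Hypothetical sketch of per-phase checks; not Forge's actual implementation.
type Phase = "red" | "green" | "refactor";

interface TestRun {
  passed: number;
  failed: number;
}

// Returns a violation message, or null when the run satisfies the phase.
function checkPhase(phase: Phase, run: TestRun, before?: TestRun): string | null {
  if (phase === "red") {
    // A new test must exist and fail before any implementation is written.
    return run.failed > 0 ? null : "red: expected at least one failing test";
  }
  if (phase === "green") {
    // The implementation must make the whole suite pass.
    return run.failed === 0 ? null : `green: ${run.failed} test(s) still failing`;
  }
  // Refactor: nothing may break, and previously passing tests must not vanish.
  if (run.failed > 0) return `refactor: ${run.failed} test(s) now failing`;
  if (before && run.passed < before.passed) {
    return `refactor: passing tests dropped from ${before.passed} to ${run.passed}`;
  }
  return null;
}
```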
6 specialized agents — Architect, Implementer, Tester, Reviewer, Security, Documenter — automatically matched to tasks.
Secret detection, SAST vulnerability scanning, and dependency audits run on every iteration. Blocks on critical findings.
5-gate pipeline: tests, coverage, security, lint, and commit validation. When a gate fails, Claude automatically fixes the issues and the gates run again.
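A minimal sketch of a gate pipeline with an automatic fix-and-rerun step, assuming each gate is a synchronous check and that the fixer stands in for asking Claude to repair the failures. `runGates`, `maxFixRounds`, and its default of 2 are hypothetical.

```typescript
// Hypothetical sketch of a quality-gate pipeline with automatic repair.
type GateName = "tests" | "coverage" | "security" | "lint" | "commit";

interface GateResult {
  gate: GateName;
  passed: boolean;
  details: string;
}

type Gate = () => GateResult;
// Stand-in for "ask Claude to fix these failures".
type Fixer = (failures: GateResult[]) => void;

interface PipelineOutcome {
  passed: boolean;
  fixRounds: number; // how many times the fixer was invoked
  failures: GateResult[]; // failures from the last run
}

function runGates(gates: Gate[], fix: Fixer, maxFixRounds = 2): PipelineOutcome {
  let fixRounds = 0;
  for (;;) {
    // Run every gate so one repair round can address all failures together.
    const failures = gates.map((gate) => gate()).filter((r) => !r.passed);
    if (failures.length === 0) return { passed: true, fixRounds, failures };
    if (fixRounds === maxFixRounds) return { passed: false, fixRounds, failures };
    fix(failures);
    fixRounds += 1;
  }
}
```

Running all gates before fixing, rather than stopping at the first failure, lets a single repair round see every problem at once.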
Every TDD phase produces a commit: `test:` for red, `feat:` for green, `refactor:` for cleanup. Clean git history.
A circuit breaker (Michael Nygard's pattern) detects stagnation: repeated failures, no progress, test regressions. It stops the loop before wasting tokens.
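A stagnation breaker can be sketched as a small state machine that trips to `open` when any of the three signals crosses a threshold. `StagnationBreaker`, its fields, and the thresholds of 3 are illustrative assumptions, not Forge's configuration.

```typescript
// Hypothetical sketch; thresholds are illustrative, not Forge's defaults.
interface IterationOutcome {
  succeeded: boolean;
  filesChanged: number;
  passingTests: number;
}

class StagnationBreaker {
  state: "closed" | "open" = "closed"; // "open" means: halt the loop
  reason = "";
  private consecutiveFailures = 0;
  private idleIterations = 0;
  private bestPassing = 0;
  private readonly maxFailures: number;
  private readonly maxIdle: number;

  constructor(maxFailures = 3, maxIdle = 3) {
    this.maxFailures = maxFailures;
    this.maxIdle = maxIdle;
  }

  // Feed one iteration's outcome; returns the breaker state afterwards.
  record(o: IterationOutcome): "closed" | "open" {
    this.consecutiveFailures = o.succeeded ? 0 : this.consecutiveFailures + 1;
    this.idleIterations = o.filesChanged === 0 ? this.idleIterations + 1 : 0;

    if (o.passingTests < this.bestPassing) {
      this.trip(`test regression: ${o.passingTests} passing, best was ${this.bestPassing}`);
    }
    this.bestPassing = Math.max(this.bestPassing, o.passingTests);

    if (this.consecutiveFailures >= this.maxFailures) {
      this.trip(`${this.consecutiveFailures} consecutive failed iterations`);
    }
    if (this.idleIterations >= this.maxIdle) {
      this.trip(`${this.idleIterations} iterations without file changes`);
    }
    return this.state;
  }

  // Opens the breaker, keeping the first reason that tripped it.
  private trip(reason: string): void {
    if (this.state === "closed") {
      this.state = "open";
      this.reason = reason;
    }
  }
}
```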
Real-time Ink TUI with cost tracking, a TDD phase pipeline, a progress bar, Claude's streamed output, and a toggleable detail overlay. Press `d` for deep metrics, `q` to quit safely.
Real-time API cost monitoring — total spend, per-task cost, per-phase breakdown, and average cost per call. Stay in control of your token budget.
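The four numbers reduce to one pass over the recorded API calls. A sketch, assuming each call is logged with a task ID, a phase, and a dollar cost; the `ApiCall` shape and `summarizeCosts` are hypothetical.

```typescript
// Hypothetical sketch; the ApiCall shape is an assumption.
interface ApiCall {
  taskId: string;
  phase: string; // e.g. "red", "green", "refactor"
  costUsd: number;
}

interface CostSummary {
  totalUsd: number;
  perTask: Map<string, number>;
  perPhase: Map<string, number>;
  averagePerCallUsd: number;
}

function summarizeCosts(calls: ApiCall[]): CostSummary {
  const perTask = new Map<string, number>();
  const perPhase = new Map<string, number>();
  let totalUsd = 0;
  for (const call of calls) {
    totalUsd += call.costUsd;
    perTask.set(call.taskId, (perTask.get(call.taskId) ?? 0) + call.costUsd);
    perPhase.set(call.phase, (perPhase.get(call.phase) ?? 0) + call.costUsd);
  }
  // Guard against division by zero before the first call.
  const averagePerCallUsd = calls.length > 0 ? totalUsd / calls.length : 0;
  return { totalUsd, perTask, perPhase, averagePerCallUsd };
}
```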
Tasks are sorted by priority and dependency depth. Foundational tasks with the most dependents are built first — not random order.
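A sketch of this ordering, assuming a lower `priority` number means more urgent and that "most dependents" counts transitive dependents. It is Kahn-style scheduling: only tasks whose dependencies are complete are eligible, and the eligible set is sorted by priority, then by dependent count. `orderTasks` and the `Task` shape are hypothetical.

```typescript
// Hypothetical sketch; assumes priority 1 is more urgent than priority 2.
interface Task {
  id: string;
  priority: number;
  dependsOn: string[];
}

// Counts every task that directly or transitively depends on each task.
function countDependents(tasks: Task[]): Map<string, number> {
  const dependentsOf = new Map<string, string[]>();
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      const list = dependentsOf.get(dep) ?? [];
      list.push(t.id);
      dependentsOf.set(dep, list);
    }
  }
  const counts = new Map<string, number>();
  for (const t of tasks) {
    const seen = new Set<string>();
    const stack = [...(dependentsOf.get(t.id) ?? [])];
    while (stack.length > 0) {
      const id = stack.pop()!;
      if (seen.has(id)) continue;
      seen.add(id);
      stack.push(...(dependentsOf.get(id) ?? []));
    }
    counts.set(t.id, seen.size);
  }
  return counts;
}

// Repeatedly picks the best task whose dependencies are all complete.
function orderTasks(tasks: Task[]): string[] {
  const dependents = countDependents(tasks);
  const done = new Set<string>();
  const remaining = [...tasks];
  const order: string[] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) throw new Error("dependency cycle or unknown dependency");
    // Higher priority first; ties go to the task that unblocks the most work.
    ready.sort(
      (a, b) => a.priority - b.priority || (dependents.get(b.id) ?? 0) - (dependents.get(a.id) ?? 0),
    );
    const next = ready[0];
    order.push(next.id);
    done.add(next.id);
    remaining.splice(remaining.indexOf(next), 1);
  }
  return order;
}
```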
Hit a rate limit? Forge shows a countdown modal, waits for the exact reset time, and resumes automatically. No task failures, no wasted retries.
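The wait itself is plain arithmetic on the reset timestamp. This sketch assumes the reset time has already been parsed into a `Date`; how Forge extracts it from Claude's response is not covered here, and `msUntilReset`, `formatCountdown`, and `waitForReset` are hypothetical names.

```typescript
// Hypothetical sketch; assumes the reset time is already known as a Date.
function msUntilReset(resetAt: Date, now: Date = new Date()): number {
  // Clamp at zero in case the reset time has already passed.
  return Math.max(0, resetAt.getTime() - now.getTime());
}

// Formats a remaining duration as "mm:ss", or "h:mm:ss" past one hour.
function formatCountdown(ms: number): string {
  const total = Math.ceil(ms / 1000);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  const pad = (n: number) => String(n).padStart(2, "0");
  return h > 0 ? `${h}:${pad(m)}:${pad(s)}` : `${pad(m)}:${pad(s)}`;
}

// Sleeps until the reset time, reporting the countdown about once per second.
async function waitForReset(resetAt: Date, onTick: (label: string) => void): Promise<void> {
  const deadline = Date.now() + msUntilReset(resetAt);
  while (Date.now() < deadline) {
    const left = deadline - Date.now();
    onTick(formatCountdown(left));
    await new Promise<void>((resolve) => setTimeout(resolve, Math.min(1000, left)));
  }
}
```

Because the sleep is bounded by the reset time rather than by a fixed backoff, no attempt is spent on a retry that is certain to be rejected.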
Resume interrupted runs. Task completion persists to disk. Context-exhausted sessions rotate automatically. Pick up where you left off.
Task stuck? Forge pauses and asks you: retry with guidance, defer to later, skip, or abort. Your hint is injected into the next attempt.
Skip a task for now, work on others, come back later. Deferred tasks move behind pending work and retry with a fresh failure count.
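Deferral can be modeled with a status field and a picker that prefers pending work. A minimal sketch with hypothetical names; resetting the counter at defer time is one reading of "a fresh count".

```typescript
// Hypothetical sketch; status names and fields are assumptions.
type Status = "pending" | "deferred" | "done" | "skipped";

interface QueuedTask {
  id: string;
  status: Status;
  failures: number;
}

// Parks a task and clears its failure history so it returns with a fresh count.
function defer(task: QueuedTask): void {
  task.status = "deferred";
  task.failures = 0;
}

// Pending work always comes first; deferred tasks run only when nothing is pending.
function pickNext(queue: QueuedTask[]): QueuedTask | undefined {
  return queue.find((t) => t.status === "pending") ?? queue.find((t) => t.status === "deferred");
}
```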
Use GitHub's spec-kit for planning (specify, plan, tasks), Forge for execution. Best of both worlds.
No shortcuts. The same disciplined process on every task, whether it's the first or the fiftieth.
Use GitHub's spec-kit for the specification and planning phases. Forge reads the output and executes autonomously with TDD, security, and quality gates.
From first install to full autonomous loops — step by step.
Get from zero to running in under a minute. You need Node.js ≥ 20 and the Claude Code CLI installed.
```bash
npm install -g @redgreen-labs/forge-cli

# Run inside your existing project directory
cd my-project
forge init

# Import your requirements document
forge import requirements.md

# Start the autonomous development loop
forge run --iterations 20

# Check progress anytime
forge status
```
That's it. Forge reads your requirements, builds a task dependency graph, and starts executing — test-first, with security scanning and quality gates on every iteration.
| Command | Description |
|---|---|
| forge init | Initialize project, auto-detect workspaces and language |
| forge import <file> | Import a PRD (Markdown or JSON), scan for existing implementations, and auto-decompose large tasks |
| forge run | Start the autonomous development loop |
| forge status | Show session progress and quality metrics |
| forge report | Generate a project health report (terminal, JSON, or HTML) |
| forge decompose | Decompose large tasks into smaller TDD-friendly subtasks |
| forge agents | List available agent roles and their tools |
`forge init` options:

| Option | Description |
|---|---|
| -n, --name <name> | Project name |
| -i, --interactive | Guided PRD creation with questions |
| -f, --force | Overwrite existing .forge directory |
| --no-scan | Skip workspace auto-detection |
| -v, --verbose | Show detailed scan output |
`forge import` options:

| Option | Description |
|---|---|
| -v, --verbose | Show detailed scan output |
| --no-scan | Skip codebase scan for existing implementations |
| --no-decompose | Skip automatic decomposition of large tasks |
`forge run` options:

| Option | Description |
|---|---|
| -n, --iterations <n> | Maximum iterations (default: 50) |
| --resume | Resume from previous run, skipping completed tasks |
| --no-tui | Disable live TUI (plain text output) |
| -v, --verbose | Show detailed executor output |
| --solo | Single-agent mode (no team rotation) |
| --dry-run | Simulate execution without running Claude |
`forge status` options:

| Option | Description |
|---|---|
| --json | Output as JSON |
| -w, --watch | Refresh status every few seconds |
| --interval <seconds> | Watch interval in seconds (default: 3) |
`forge report` options:

| Option | Description |
|---|---|
| -f, --format <type> | Output format: terminal, html, or json (default: terminal) |
`forge decompose` options:

| Option | Description |
|---|---|
| --threshold <n> | Complexity threshold 1-10 (tasks above this are decomposed) |
| --max-subtasks <n> | Max subtasks per parent task |
| --dry-run | Show which tasks would be decomposed without calling Claude |
| -v, --verbose | Show detailed output |
Forge supports three task formats, auto-detected in priority order:
Use GitHub's spec-kit for planning, then let Forge execute:
```bash
# Generate specs with spec-kit
npx spec-kit specify
npx spec-kit plan
npx spec-kit tasks

# Forge auto-detects specs/tasks.md
forge run
```
Forge reads from the specs/ directory:
| File | Purpose |
|---|---|
| specs/tasks.md | Task list with T-IDs, phases, dependencies |
| specs/constitution.md | Project principles — injected into agent prompts |
| specs/spec.md | Detailed requirements — injected into agent prompts |
| specs/plan.md | Architecture decisions — injected into agent prompts |
Spec-kit task lines support three markers: `[P]` marks a task as parallelizable, `[US1]` references a user story, and `(depends on T001)` declares a dependency.
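A sketch of a parser for these markers, assuming one checkbox task per line in the form `- [ ] T003 [P] [US1] Add login endpoint (depends on T001, T002)`. The exact line layout Forge accepts may differ; `parseTaskLine` and the `SpecTask` shape are hypothetical.

```typescript
// Hypothetical sketch; assumes lines like
// "- [ ] T003 [P] [US1] Add login endpoint (depends on T001, T002)".
interface SpecTask {
  id: string;
  done: boolean;
  parallel: boolean;
  story?: string;
  dependsOn: string[];
  title: string;
}

const TASK_LINE = /^\s*-\s*\[([ xX])\]\s*(T\d+)\s+(.*)$/;

function parseTaskLine(line: string): SpecTask | null {
  const m = TASK_LINE.exec(line);
  if (!m) return null; // not a task line (heading, prose, blank)
  const body = m[3];
  const deps = /\(depends on ([^)]*)\)/.exec(body);
  const title = body
    .replace(/\[P\]|\[US\d+\]|\(depends on [^)]*\)/g, "") // drop the markers
    .replace(/\s+/g, " ")
    .trim();
  return {
    id: m[2],
    done: m[1] !== " ",
    parallel: body.includes("[P]"),
    story: /\[(US\d+)\]/.exec(body)?.[1],
    // Accept "T001, T002" or "T001 and T002"; keep only well-formed IDs.
    dependsOn: deps ? deps[1].split(/\s*,\s*|\s+and\s+/).filter((s) => /^T\d+$/.test(s)) : [],
    title,
  };
}
```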
Or import a requirements document:

```bash
# Parses to .forge/prd.json
forge import requirements.md
forge run
```
Or place a `tasks.md` in `.forge/`:

```bash
forge run
```
The TUI shows everything at a glance — phase, progress, cost, TDD pipeline, quality gates, and Claude's real-time output.
| Key | Action |
|---|---|
| d | Toggle dashboard overlay (cost breakdown, coverage, security findings, code quality) |
| q | Quit with confirmation — gracefully aborts the running Claude process |
When the API rate limit is hit, the dashboard shows a countdown modal.
Forge waits for the exact reset time from the API and resumes automatically. Rate limits don't count as task failures.
When a task fails `maxTaskFailures` times (default 3), Forge pauses and shows an interactive prompt: retry with guidance, defer to later, skip, or abort.
In non-interactive mode (`--no-tui`), a task is skipped automatically once it reaches `maxTaskFailures` failures.
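The failure policy can be sketched as one handler called after each failed attempt. `onTaskFailure`, `buildPrompt`, and the prompt wording are hypothetical; the threshold of 3 mirrors the documented `maxTaskFailures` default, and resetting the counter after a guided retry is an assumption.

```typescript
// Hypothetical sketch; names and prompt wording are illustrative.
type StuckChoice =
  | { kind: "retry"; hint: string }
  | { kind: "defer" }
  | { kind: "skip" }
  | { kind: "abort" };

interface TrackedTask {
  id: string;
  failures: number;
  hint?: string;
}

type Decision = "keep-trying" | StuckChoice["kind"];

async function onTaskFailure(
  task: TrackedTask,
  interactive: boolean,
  ask: (task: TrackedTask) => Promise<StuckChoice>,
  maxTaskFailures = 3,
): Promise<Decision> {
  task.failures += 1;
  if (task.failures < maxTaskFailures) return "keep-trying";
  if (!interactive) return "skip"; // --no-tui: no prompt, just move on
  const choice = await ask(task);
  if (choice.kind === "retry") {
    task.hint = choice.hint;
    task.failures = 0; // assumption: a guided retry starts a new count
  }
  return choice.kind;
}

// Appends the user's guidance to the next attempt's prompt.
function buildPrompt(base: string, task: TrackedTask): string {
  return task.hint ? `${base}\n\nGuidance from the user: ${task.hint}` : base;
}
```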
Create `.forge/forge.config.json` in your project root to configure Forge.
Override config options via environment variables:
| Variable | Effect |
|---|---|
| FORGE_MAX_ITERATIONS | Override max loop iterations |
| FORGE_MAX_CALLS_PER_HOUR | Override the API rate limit (maximum calls per hour) |
| FORGE_TDD_ENABLED | Enable/disable TDD enforcement |
| FORGE_SECURITY_ENABLED | Enable/disable security scanning |
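A sketch of how these overrides might be layered onto a loaded config file. The variable names come from the table above; the `ForgeConfig` field names, `applyEnvOverrides`, and the accepted boolean spellings are assumptions.

```typescript
// Hypothetical sketch; only the FORGE_* variable names come from the docs.
interface ForgeConfig {
  maxIterations: number;
  maxCallsPerHour: number;
  tddEnabled: boolean;
  securityEnabled: boolean;
}

type Env = Record<string, string | undefined>;

// Treats common truthy spellings as true and everything else as false.
function parseBool(value: string): boolean {
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}

// Returns a whole number, or undefined when the variable is unset or malformed.
function parseCount(value: string | undefined): number | undefined {
  return value !== undefined && /^\d+$/.test(value.trim()) ? Number(value) : undefined;
}

// Returns a new config; the input object is left untouched.
function applyEnvOverrides(config: ForgeConfig, env: Env): ForgeConfig {
  const out = { ...config };
  out.maxIterations = parseCount(env.FORGE_MAX_ITERATIONS) ?? out.maxIterations;
  out.maxCallsPerHour = parseCount(env.FORGE_MAX_CALLS_PER_HOUR) ?? out.maxCallsPerHour;
  if (env.FORGE_TDD_ENABLED !== undefined) out.tddEnabled = parseBool(env.FORGE_TDD_ENABLED);
  if (env.FORGE_SECURITY_ENABLED !== undefined) out.securityEnabled = parseBool(env.FORGE_SECURITY_ENABLED);
  return out;
}
```

In a Node.js process you would pass `process.env` as `env`; taking it as a parameter keeps the function easy to test. Malformed numbers fall back to the file value instead of silently becoming `NaN`.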
Install Forge and let the craft speak for itself.