v0.5 — AI-Powered Software Craftsmanship

Build software
the right way

Forge is an autonomous development orchestrator that practices the craft: tests first, security always, every commit meaningful. It doesn't just write code — it builds software the way a disciplined team would.

$ cd my-api
$ forge init
$ forge import requirements.md
Imported 12 tasks from PRD (3 critical, 5 high, 4 medium)
$ forge run
╭──── FORGE Development Loop ────╮
Phase: IMPLEMENTING Tasks: 5/12 Iter: 7
Elapsed: 14:32 Cost: $1.24 Commits: 15
[██████░░░░░░░░] 42%
Task: Implement JWT auth middleware
✓ RED  ● GREEN  ○ REFACTOR  ○ Gates

AI that writes code is easy.
AI that crafts software is hard.

Most AI tools generate code. Forge builds software. There's a difference.

Software craftsmanship means discipline. It means writing the test before the implementation. It means scanning for secrets and vulnerabilities on every change. It means each commit tells a story. It means stopping when something is wrong rather than plowing forward.

Forge encodes these principles into an autonomous loop. You describe what to build. Forge builds it the way a senior engineer would — methodically, safely, and with a clean git history you'd be proud to ship.

  • 🧪 Tests First, Always

    Every feature starts with a failing test. No exceptions. Red-Green-Refactor isn't a suggestion — it's enforced.

  • 🛡 Security Is Not Optional

    Secret detection, SAST scanning, and dependency audits run on every iteration. Critical findings block the loop.

  • 📝 Every Commit Tells a Story

    Conventional commits per TDD phase: test:, feat:, refactor:. Your git log reads like documentation.

  • ⚠️ Know When to Stop

    The circuit breaker detects stagnation — repeated failures, no progress, regressions — and halts before wasting time and tokens.

The full toolkit for autonomous development

Every practice that makes software reliable — automated, enforced, and built into the loop.

🧪

TDD Enforcement

Red-Green-Refactor cycle tracked and enforced. Tests are written before code, with regression detection on refactor.

🤖

Multi-Agent Teams

6 specialized agents — Architect, Implementer, Tester, Reviewer, Security, Documenter — automatically matched to tasks.

🛡

Security Scanning

Secret detection, SAST vulnerability scanning, and dependency audits run on every iteration. Blocks on critical findings.

Quality Gates + Auto-Fix

5-gate pipeline: tests, coverage, security, lint, and commit validation. When gates fail, Claude automatically fixes the issues and re-runs.
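The fix-and-re-run behavior can be pictured as a small retry loop. The sketch below is illustrative only: `Gate`, `runGatesWithAutoFix`, and the `maxFixAttempts` limit are assumed names and values, not Forge's actual API, and the real gates are presumably asynchronous.

```typescript
// Hypothetical sketch of a gate pipeline with auto-fix; not Forge's implementation.
type Gate = { name: string; check: () => boolean };

/**
 * Runs every gate, asks `fix` to repair the failures, and re-runs the gates.
 * Returns the names of gates that still fail after `maxFixAttempts` rounds.
 */
function runGatesWithAutoFix(
  gates: Gate[],
  fix: (failed: string[]) => void,
  maxFixAttempts = 2, // assumed limit
): string[] {
  let failed = gates.filter((g) => !g.check()).map((g) => g.name);
  for (let attempt = 0; attempt < maxFixAttempts && failed.length > 0; attempt++) {
    fix(failed); // e.g. hand the failing gate names back to the agent
    failed = gates.filter((g) => !g.check()).map((g) => g.name);
  }
  return failed;
}
```

An empty result means every gate passed; a non-empty result is the kind of signal a loop could count as a failed iteration.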

📝

Conventional Commits

Every TDD phase produces a commit: test: for red, feat: for green, refactor: for cleanup. Clean git history.

Circuit Breaker

Modeled on Michael Nygard's circuit breaker pattern, it detects stagnation (repeated failures, no progress, test regressions) and stops the loop before wasting tokens.
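As a rough illustration, the three stagnation signals could be tracked like this. This is a simplified breaker that only trips (closed to open) and omits the half-open state of the full pattern; `IterationOutcome`, `StagnationBreaker`, and both thresholds are assumptions made for the example, not Forge internals.

```typescript
// Hypothetical sketch: none of these names or thresholds come from Forge's source.
type IterationOutcome = {
  succeeded: boolean; // did the iteration finish cleanly?
  tasksCompleted: number; // cumulative completed tasks after this iteration
  testsPassing: number; // passing test count after this iteration
};

class StagnationBreaker {
  private consecutiveFailures = 0;
  private stalledIterations = 0;
  private last: IterationOutcome | null = null;
  private readonly maxFailures: number;
  private readonly maxStalled: number;

  constructor(maxFailures = 3, maxStalled = 5) {
    this.maxFailures = maxFailures;
    this.maxStalled = maxStalled;
  }

  /** Records one iteration and returns a reason to halt, or null to continue. */
  record(outcome: IterationOutcome): string | null {
    const prev = this.last;
    this.last = outcome;

    this.consecutiveFailures = outcome.succeeded ? 0 : this.consecutiveFailures + 1;

    const progressed = prev === null || outcome.tasksCompleted > prev.tasksCompleted;
    this.stalledIterations = progressed ? 0 : this.stalledIterations + 1;

    if (prev !== null && outcome.testsPassing < prev.testsPassing) return "test regression";
    if (this.consecutiveFailures >= this.maxFailures) return "repeated failures";
    if (this.stalledIterations >= this.maxStalled) return "no progress";
    return null;
  }
}
```

A loop would call `record` once per iteration and stop as soon as it returns a non-null reason.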

📊

Live Dashboard

Real-time Ink TUI with cost tracking, TDD phase pipeline, progress bar, Claude stream output, and toggleable detail overlay. Press d for deep metrics, q to quit safely.

💰

Cost Tracking

Real-time API cost monitoring — total spend, per-task cost, per-phase breakdown, and average cost per call. Stay in control of your token budget.

🧠

Smart Task Ordering

Tasks are sorted by priority and dependency depth. Foundational tasks with the most dependents are built first, rather than in arbitrary order.
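One way to read that ordering rule: a task is eligible only once its dependencies are done, and among eligible tasks the higher priority wins, with ties going to the task that unblocks the most work. The sketch below assumes a `PlannedTask` shape and three priority levels taken from the import summary; the exact tie-breaking Forge uses is not documented here.

```typescript
// Hypothetical task shape; field names are assumptions for illustration.
type PlannedTask = {
  id: string;
  priority: "critical" | "high" | "medium";
  dependsOn: string[];
};

const PRIORITY_RANK = { critical: 0, high: 1, medium: 2 };

/** Counts how many tasks depend on each task, directly or transitively. */
function countDependents(tasks: PlannedTask[]): Map<string, number> {
  const directDependents = new Map<string, string[]>();
  for (const t of tasks) directDependents.set(t.id, []);
  for (const t of tasks) {
    for (const dep of t.dependsOn) directDependents.get(dep)?.push(t.id);
  }
  const counts = new Map<string, number>();
  for (const t of tasks) {
    const seen = new Set<string>();
    const stack = [...(directDependents.get(t.id) ?? [])];
    while (stack.length > 0) {
      const id = stack.pop() as string;
      if (seen.has(id)) continue;
      seen.add(id);
      stack.push(...(directDependents.get(id) ?? []));
    }
    counts.set(t.id, seen.size);
  }
  return counts;
}

/** Orders tasks so dependencies come first; ties go to priority, then dependents. */
function orderTasks(tasks: PlannedTask[]): string[] {
  const dependents = countDependents(tasks);
  const done = new Set<string>();
  const remaining = [...tasks];
  const order: string[] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    const next = ready.sort(
      (a, b) =>
        PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
        (dependents.get(b.id) ?? 0) - (dependents.get(a.id) ?? 0),
    )[0];
    if (next === undefined) throw new Error("dependency cycle or unknown dependency");
    order.push(next.id);
    done.add(next.id);
    remaining.splice(remaining.indexOf(next), 1);
  }
  return order;
}
```

The repeated filter-and-sort is quadratic, which is fine for a task list of a few dozen entries; a larger graph would call for a priority queue.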

Rate Limit Resilience

Hits a rate limit? Forge shows a countdown modal, waits for the exact reset time, and resumes automatically. No task failures, no wasted retries.

🔄

Session Continuity

Resume interrupted runs. Task completion persists to disk. Context-exhausted sessions rotate automatically. Pick up where you left off.

🧑‍💻

Human-in-the-Loop

Task stuck? Forge pauses and asks you: retry with guidance, defer to later, skip, or abort. Your hint is injected into the next attempt.

Deferred Tasks

Skip a task for now, work on others, and come back later. Deferred tasks are queued behind pending work and retried with a fresh failure count.

📐

Spec-Kit Integration

Use GitHub's spec-kit for planning (specify, plan, tasks), Forge for execution. Best of both worlds.

Each iteration, every time

No shortcuts. The same disciplined process on every task, whether it's the first or the fiftieth.

  1. 🎯 Select Task (priority + dependency sort)
  2. 🔴 Red (write failing test)
  3. 💾 Commit (test:)
  4. 🟢 Green (implement to pass)
  5. 💾 Commit (feat:)
  6. 🛡 Security (scan & audit)
  7. Gates (auto-fix on fail)
  8. 🟡 Refactor (clean up)
  9. 💾 Commit (refactor:)
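The three commit steps in this cycle map each TDD phase to a Conventional Commits type. A minimal sketch of that mapping follows; `TddPhase`, `commitMessage`, and `matchesPhase` are illustrative names rather than Forge's API, and accepting an optional scope such as `feat(auth):` is an assumption.

```typescript
// Hypothetical helpers illustrating the phase-to-prefix mapping described above.
type TddPhase = "red" | "green" | "refactor";

const COMMIT_PREFIX: Record<TddPhase, string> = {
  red: "test",
  green: "feat",
  refactor: "refactor",
};

/** Builds a commit subject line for the given phase. */
function commitMessage(phase: TddPhase, summary: string): string {
  return `${COMMIT_PREFIX[phase]}: ${summary.trim()}`;
}

/** Checks that a commit subject uses the prefix expected for the phase. */
function matchesPhase(message: string, phase: TddPhase): boolean {
  // Optional "(scope)" and "!" follow the Conventional Commits grammar.
  const pattern = new RegExp(`^${COMMIT_PREFIX[phase]}(\\([^)]+\\))?!?: \\S`);
  return pattern.test(message);
}
```

A commit-validation gate could use a check like `matchesPhase` to reject a green-phase commit that does not start with `feat:`.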

Plan with spec-kit.
Build with Forge.

Use GitHub's spec-kit for the specification and planning phases. Forge reads the output and executes autonomously with TDD, security, and quality gates.

  1. Specify: Generate spec.md and constitution.md with spec-kit
  2. Plan: Generate plan.md with architecture decisions
  3. Tasks: Generate tasks.md with phased, dependency-ordered tasks
  4. Forge run: Auto-detects specs/ and executes with full context injection
specs/tasks.md
## Phase 1: Setup

- [x] T001 [P] Initialize project
- [x] T002 Configure CI (depends on T001)

## Phase 2: Auth (Priority: P1)

- [ ] T003 [US1] Login endpoint
  - Returns JWT on success
  - Returns 401 on failure
- [ ] T004 [US1] Auth middleware (depends on T003)

## Phase 3: Polish

- [ ] T005 [P] Add rate limiting

Everything you need to get started

From first install to full autonomous loops — step by step.

Quick Start

Get from zero to running in under a minute. You need Node.js ≥ 20 and the Claude Code CLI installed.

$ npm install -g @redgreen-labs/forge-cli

# Run inside your existing project directory
$ cd my-project
$ forge init

# Import your requirements document
$ forge import requirements.md

# Start the autonomous development loop
$ forge run --iterations 20

# Check progress anytime
$ forge status

That's it. Forge reads your requirements, builds a task dependency graph, and starts executing — test-first, with security scanning and quality gates on every iteration.

Commands

| Command | Description |
| --- | --- |
| forge init | Initialize project, auto-detect workspaces and language |
| forge import <file> | Import a PRD (Markdown or JSON), scan and auto-decompose |
| forge run | Start the autonomous development loop |
| forge status | Show session progress and quality metrics |
| forge report | Generate a project health report (terminal, JSON, or HTML) |
| forge decompose | Decompose large tasks into smaller TDD-friendly subtasks |
| forge agents | List available agent roles and their tools |
forge init

| Option | Description |
| --- | --- |
| -n, --name <name> | Project name |
| -i, --interactive | Guided PRD creation with questions |
| -f, --force | Overwrite existing .forge directory |
| --no-scan | Skip workspace auto-detection |
| -v, --verbose | Show detailed scan output |

forge import

| Option | Description |
| --- | --- |
| -v, --verbose | Show detailed scan output |
| --no-scan | Skip codebase scan for existing implementations |
| --no-decompose | Skip automatic decomposition of large tasks |

forge run

| Option | Description |
| --- | --- |
| -n, --iterations <n> | Maximum iterations (default: 50) |
| --resume | Resume from previous run, skipping completed tasks |
| --no-tui | Disable live TUI (plain text output) |
| -v, --verbose | Show detailed executor output |
| --solo | Single agent mode (no team rotation) |
| --dry-run | Simulate execution without running Claude |

forge status

| Option | Description |
| --- | --- |
| --json | Output as JSON |
| -w, --watch | Refresh status every few seconds |
| --interval <seconds> | Watch interval in seconds (default: 3) |

forge report

| Option | Description |
| --- | --- |
| -f, --format <type> | Output format: terminal, html, or json (default: terminal) |

forge decompose

| Option | Description |
| --- | --- |
| --threshold <n> | Complexity threshold 1-10 (tasks above this are decomposed) |
| --max-subtasks <n> | Max subtasks per parent task |
| --dry-run | Show which tasks would be decomposed without calling Claude |
| -v, --verbose | Show detailed output |

Task Sources

Forge supports three task formats, auto-detected in priority order:

1. Spec-Kit (recommended)

Use GitHub's spec-kit for planning, then let Forge execute:

# Generate specs with spec-kit
$ npx spec-kit specify
$ npx spec-kit plan
$ npx spec-kit tasks

# Forge auto-detects specs/tasks.md
$ forge run

Forge reads from the specs/ directory:

| File | Purpose |
| --- | --- |
| specs/tasks.md | Task list with T-IDs, phases, dependencies |
| specs/constitution.md | Project principles, injected into agent prompts |
| specs/spec.md | Detailed requirements, injected into agent prompts |
| specs/plan.md | Architecture decisions, injected into agent prompts |

Spec-kit task format supports markers: [P] = parallelizable, [US1] = user story ref, (depends on T001) = dependency.
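To make the markers concrete, here is a small parser for a single task line in that format. It is a sketch written against the example lines shown on this page only; `SpecTask` and `parseTaskLine` are hypothetical names, and Forge's own parser may accept more variations.

```typescript
// Hypothetical parser for one task line; not Forge's actual code.
type SpecTask = {
  id: string;
  done: boolean;
  parallel: boolean; // [P] marker
  story: string | null; // [US1]-style marker
  title: string;
  dependsOn: string[]; // IDs from "(depends on ...)"
};

const TASK_LINE = /^- \[( |x)\] (T\d+)\s+(.*)$/;

function parseTaskLine(line: string): SpecTask | null {
  const match = TASK_LINE.exec(line.trim());
  if (match === null) return null; // headings, sub-bullets, blank lines

  const done = match[1] === "x";
  const id = match[2] ?? "";
  let rest = match[3] ?? "";

  // Pull out the dependency list and remove it from the title text.
  const dependsOn: string[] = [];
  rest = rest.replace(/\(depends on ([^)]*)\)/i, (_whole, ids: string) => {
    dependsOn.push(...ids.split(/[,\s]+/).filter((s) => s.length > 0));
    return "";
  });

  const parallel = /\[P\]/.test(rest);
  const storyMatch = /\[(US\d+)\]/.exec(rest);
  const title = rest.replace(/\[P\]|\[US\d+\]/g, "").replace(/\s+/g, " ").trim();

  return { id, done, parallel, story: storyMatch ? storyMatch[1] ?? null : null, title, dependsOn };
}
```

Lines that are not task entries, such as phase headings or acceptance-criteria bullets, return `null` so a caller can attach them to the preceding task or ignore them.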

2. Forge PRD (JSON)

$ forge import requirements.md    # Parses to .forge/prd.json
$ forge run

3. Markdown Task List

# Place a tasks.md in .forge/
$ forge run

Live Dashboard

The TUI shows everything at a glance — phase, progress, cost, TDD pipeline, quality gates, and Claude's real-time output.

╭──────────────────── FORGE Development Loop ────────────────────╮
┌──────────────────────────────────────────────────────────────────┐
  Phase: IMPLEMENTING   Tasks: 5/12   Iter: 7
  Elapsed: 14:32   Cost: $1.24   Commits: 15   Files: 3
  [██████░░░░░░░░] 42%
  Task: Implement JWT auth middleware
  ✓ RED  ● GREEN  ○ REFACTOR  ○ Gates   (2 cycles)
  Gates: ✓ tests  ✓ security  ✓ lint  ✗ coverage
├──────────────────────────────────────────────────────────────────┤
  Claude Output
  ⚡ Writing src/auth/middleware.ts...
  Running npm test -- --reporter verbose
  Tests 42 passed (42)
  Done — Cost: $0.12 Duration: 45s
├──────────────────────────────────────────────────────────────────┤
  [d] Dashboard  [q] Quit                       tests sec lint cov
└──────────────────────────────────────────────────────────────────┘

Keyboard Shortcuts

| Key | Action |
| --- | --- |
| d | Toggle dashboard overlay (cost breakdown, coverage, security findings, code quality) |
| q | Quit with confirmation; gracefully aborts the running Claude process |

Rate Limit Modal

When the API rate limit is hit, the dashboard shows a countdown modal:

┌──────────────────────────────────────┐
  API Rate Limit Reached
  Waiting for rate limit to reset...
  2h 34m 12s
  Resets at 3:45:00 PM
  Session will resume automatically
└──────────────────────────────────────┘

Forge waits for the exact reset time from the API and resumes automatically. Rate limits don't count as task failures.
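The countdown itself is simple arithmetic once a reset timestamp is known. The helpers below are a sketch under that assumption; how Forge obtains the reset time from the API is not covered here, and `msUntilReset` and `formatCountdown` are hypothetical names.

```typescript
// Hypothetical helpers; the reset timestamp is assumed to be supplied by the caller.

/** Milliseconds to wait until the reset instant (never negative). */
function msUntilReset(resetAt: Date, now: Date): number {
  return Math.max(0, resetAt.getTime() - now.getTime());
}

/** Formats a duration in the style of the modal's countdown, e.g. "2h 34m 12s". */
function formatCountdown(ms: number): string {
  const totalSeconds = Math.ceil(ms / 1000);
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return `${h}h ${m}m ${s}s`;
}
```

Sleeping for exactly `msUntilReset(...)` instead of retrying with backoff is what avoids the wasted attempts: no call is made until the limit has actually lifted.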

Human-in-the-Loop

When a task fails maxTaskFailures times (default 3), Forge pauses and shows an interactive prompt:

┌──────────────────────────────────────────────┐
  Task Failed (3x)
  Automated UI tests (integration_test)
  Green phase failed: Process timed out

  What would you like to do?
  ▸ Retry with guidance — provide a hint
    Skip for now — defer to later
    Skip permanently — won't retry
    Abort session — stop forge

  Use ↑↓ arrows and Enter to select
└──────────────────────────────────────────────┘
  • Retry with guidance — Type a hint (e.g., "Use widget tests, not integration tests"). Your guidance is injected into the next attempt's prompt.
  • Skip for now (defer) — The task moves to the back of the queue. Other tasks run first, then it retries with a fresh failure count.
  • Skip permanently — The task is marked as skipped and won't be retried.
  • Abort session — Stops Forge immediately.

In non-interactive mode (--no-tui), tasks are auto-skipped after maxTaskFailures.
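The four choices amount to small transformations of the task queue. The sketch below models them with assumed types (`Decision`, `QueuedTask`) and an assumed `applyDecision` helper; leaving the failure count untouched on retry is a guess, since only the deferred case is documented as resetting it.

```typescript
// Hypothetical model of the prompt's four outcomes; not Forge's internal types.
type Decision =
  | { kind: "retry"; hint: string }
  | { kind: "defer" }
  | { kind: "skip" }
  | { kind: "abort" };

type QueuedTask = {
  id: string;
  failures: number;
  state: "pending" | "deferred" | "skipped";
  hint: string | null; // guidance injected into the next attempt's prompt
};

function applyDecision(
  queue: QueuedTask[],
  taskId: string,
  decision: Decision,
): { queue: QueuedTask[]; aborted: boolean } {
  const task = queue.find((t) => t.id === taskId);
  if (task === undefined) throw new Error(`unknown task: ${taskId}`);
  const others = queue.filter((t) => t.id !== taskId);

  switch (decision.kind) {
    case "retry": {
      // Keep the task's position and attach the user's hint.
      const hint = decision.hint;
      return { queue: queue.map((t) => (t.id === taskId ? { ...t, hint } : t)), aborted: false };
    }
    case "defer":
      // Move to the back of the queue with a fresh failure count.
      return { queue: [...others, { ...task, failures: 0, state: "deferred" }], aborted: false };
    case "skip":
      // Marked as skipped so the scheduler never picks it again.
      return { queue: [...others, { ...task, state: "skipped" }], aborted: false };
    case "abort":
      return { queue, aborted: true };
  }
}
```

Under this model, the non-interactive behavior corresponds to always applying `{ kind: "skip" }` once the failure limit is reached.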

Configuration

Create .forge/forge.config.json in your project root:

{
  "maxIterations": 50,
  "maxCallsPerHour": 100,
  "timeoutMinutes": 15,
  "tdd": {
    "enabled": true,
    "requireFailingTestFirst": true,
    "commitPerPhase": true
  },
  "coverage": {
    "lineThreshold": 80,
    "branchThreshold": 70
  },
  "security": {
    "enabled": true,
    "sast": true,
    "dependencyAudit": true,
    "secretScanning": true,
    "blockOnSeverity": "high"
  },
  "agents": {
    "team": ["architect", "implementer", "tester", "reviewer"],
    "soloMode": false
  }
}
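A setting like `blockOnSeverity` is most naturally read as a threshold: findings at or above the configured level stop the loop. The sketch below shows that reading; the four-level scale and the "at or above" rule are assumptions, since only "high" and "critical" are named on this page.

```typescript
// Hypothetical severity scale, ordered from least to most severe.
const SEVERITY_ORDER = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITY_ORDER)[number];

/** Returns true when any finding is at or above the configured threshold. */
function shouldBlock(findings: Severity[], blockOnSeverity: Severity): boolean {
  const threshold = SEVERITY_ORDER.indexOf(blockOnSeverity);
  return findings.some((s) => SEVERITY_ORDER.indexOf(s) >= threshold);
}
```

With the example config above, a "medium" finding would be reported but would not halt the run under this interpretation.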

Environment Variables

Override any config option via environment variables:

| Variable | Effect |
| --- | --- |
| FORGE_MAX_ITERATIONS | Override max loop iterations |
| FORGE_MAX_CALLS_PER_HOUR | Override API rate limit |
| FORGE_TDD_ENABLED | Enable/disable TDD enforcement |
| FORGE_SECURITY_ENABLED | Enable/disable security scanning |
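Layering these variables over the file config could look like the following sketch. `LoopConfig` mirrors only a subset of the config file, `applyEnvOverrides` is a hypothetical helper, and the accepted boolean spellings ("true"/"false", "1"/"0") are an assumption; check the value formats Forge actually accepts.

```typescript
// Hypothetical subset of the config shape shown in the Configuration section.
type LoopConfig = {
  maxIterations: number;
  maxCallsPerHour: number;
  tdd: { enabled: boolean };
  security: { enabled: boolean };
};

/** Returns a copy of `base` with any valid environment overrides applied. */
function applyEnvOverrides(
  base: LoopConfig,
  env: Record<string, string | undefined>,
): LoopConfig {
  const num = (key: string, fallback: number): number => {
    const raw = env[key];
    const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
    return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
  };
  const bool = (key: string, fallback: boolean): boolean => {
    const raw = env[key]?.toLowerCase();
    if (raw === "true" || raw === "1") return true;
    if (raw === "false" || raw === "0") return false;
    return fallback; // unset or unrecognized values keep the file setting
  };
  return {
    maxIterations: num("FORGE_MAX_ITERATIONS", base.maxIterations),
    maxCallsPerHour: num("FORGE_MAX_CALLS_PER_HOUR", base.maxCallsPerHour),
    tdd: { enabled: bool("FORGE_TDD_ENABLED", base.tdd.enabled) },
    security: { enabled: bool("FORGE_SECURITY_ENABLED", base.security.enabled) },
  };
}
```

In a Node.js program this would be called as `applyEnvOverrides(fileConfig, process.env)`; invalid values fall back to the file setting rather than failing.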

Start building the right way

Install Forge and let the craft speak for itself.

npm install -g @redgreen-labs/forge-cli
Requires Node.js ≥ 20 and Claude Code CLI