Roadmap

Source of truth for what we ship next, in what order, and why. If you find yourself re-deciding a question that's answered here, the answer here wins. Update this doc instead of arguing.

Last frozen: 2026-06-02 (Phase 8 complete + two internal security passes).

0. TL;DR

We are building HyprBox: an infrastructure autopilot for freelancers and MSPs. The product is the loop discover → finding → recommend → preview → apply → verify. Everything we ship is one of:

A new scanner that produces a new kind of Finding.
A new fix preset that resolves an existing finding type.
Infrastructure that makes the loop trustworthy at scale (auth, migrations, cancel, audit UI, observability).
Polish that makes the demo land (UI, copywriting, performance).

Anything that doesn't fit one of those four buckets goes in the Anti-goals list (§7). We do not build it.

Read AUTOPILOT.md for the why; this doc is the what + when.

1. Where we are

Phase	Status	Shipped
1 — MVP	✅ Done	Heartbeat → dashboard, agent, CLI
2 — Auth + presets + SSE	✅ Done	JWT + agent tokens, 3 presets, CLI preset commands, hardening
3 — Remote apply	✅ Done	Job queue + atomic claim + agent jobrunner + live tail
4a — HyprVault + audit	✅ Done	Backups (policy + run), audit log on sensitive routes
4b — Autopilot loop	✅ Done	Finding + Recommendation models, TLS scanner, Health page, Fix contract (`risk_level` + `verify`), auto-resolve
5 — Scanner expansion	✅ Done	SSH password-auth + disk usage + Postgres no-backup scanners, 2 new fix presets, inventory cross-ref endpoint, 10 Go parser tests
6 — Make it real	✅ Done	Prisma migrations + drift check, RBAC (VIEWER/OPERATOR/ADMIN), WebSocket reverse channel + RUNNING-job cancel, admin audit UI with CSV export
7 — HyprGuard	✅ Done	9-check security audit scanner (kernel updates, root password, NOPASSWD sudo, /etc world-writable, UFW, fail2ban, unattended-upgrades, LUKS, auditd), `apt-security-upgrade` preset, conditional severity on UFW for public-IP hosts
8 — Production polish	✅ Done	8.1 output offload (chunk-in-DB) + 8.2 change-password + forgot-password reset + 8.3 per-user rate limit + 8.4 GHCR release pipeline + 8.5 Playwright E2E + 8.6 per-node agent config. Fully shipped
9 — HyprWatch v1	🟡 v1	Monitoring as a find→fix→verify loop: opt-in (`hyprwatch` per-node config) `monitoring.absent` scanner → `monitoring.install-stack` recommendation → `monitoring-only` preset (now with `verify:` + `risk_level`) → auto-resolve. Deferred: Loki/log shipping, Alertmanager routing verify, multi-node federation, drift detection. See docs/HYPRWATCH.md

Tests: 167 vitest specs + 32 Go specs + 3 Playwright E2E, all passing. Typecheck clean on API + Web. Go vet clean on agent + CLI. CI green.

This is the baseline. Everything below is forward-looking.

2. Phase 5 — Scanner expansion (shipped 2026-05-30)

Goal. Take the autopilot loop from "works for TLS" to "covers the three pains every SMB server has".

Why this was next. The loop is the product, but ONE finding type is not a product. Three well-chosen findings turn the dashboard from "demo gimmick" into "you should run this on every server".

Shipped:

ssh.password-auth-enabled (CRITICAL) — sshd_config parser handles Include + Match + first-write-wins semantics. Recommendation: ssh.disable-password-auth (risk: DANGEROUS) with hard precheck on authorized_keys, sshd_config backup, sshd -t validation, auto-revert on validation failure.
disk.usage-high (WARN ≥80%, CRITICAL ≥90%) — per mount-point, skips pseudo filesystems. Recommendation: docker-prune (risk: CONFIRM) with "must be docker host" precheck and verify-step that re-checks usage drop.
postgres.no-backup (CRITICAL) — agent ships a docker inventory via POST /api/nodes/inventory; the cross-reference logic lives on the API (containers matching postgres* × BackupPolicy on this node). Reconciliation built in — container removed or policy added → finding auto-resolves on the next inventory tick.
10 Go unit tests for the sshd_config parser.
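
The first-write-wins rule mentioned for the sshd_config parser can be illustrated with a small sketch. This is TypeScript for illustration only, not the agent's Go parser: `effectiveGlobalOption` is a hypothetical helper, it does not expand `Include` directives, and it stops at the first `Match` line, so it only resolves global settings.

```typescript
// Illustrative sketch only: not the agent's Go parser.
// Resolves one global sshd_config keyword using first-write-wins.
function effectiveGlobalOption(configText: string, keyword: string): string | undefined {
  const wanted = keyword.toLowerCase(); // sshd keywords are case-insensitive
  for (const rawLine of configText.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    // Keyword and value are separated by whitespace and/or a single "=".
    const parts = /^(\S+?)(?:\s*=\s*|\s+)(.*)$/.exec(line);
    if (parts === null) continue;
    const key = parts[1].toLowerCase();
    // Everything after the first Match line is conditional, not global.
    if (key === "match") break;
    if (key === wanted) return parts[2].trim(); // first occurrence wins
  }
  return undefined; // never set: the caller applies sshd's built-in default
}

const sshdSample = [
  "# hardened on purpose",
  "PasswordAuthentication no",
  "PasswordAuthentication yes", // ignored: a value was already obtained
  "Match User deploy",
  "  PasswordAuthentication yes", // conditional, outside the global scope
].join("\n");

// OpenSSH defaults PasswordAuthentication to "yes" when the keyword is absent.
const effectiveValue = effectiveGlobalOption(sshdSample, "PasswordAuthentication") ?? "yes";
const passwordAuthEnabled = effectiveValue.toLowerCase() !== "no";
```

With the sample above the later `yes` is ignored, so a scanner built on this rule would not raise `ssh.password-auth-enabled` for the global scope.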

The fresh-VM success metric from the original spec (≥3 findings, ≥2 with working Apply) is achievable as written.

5.1 SSH password authentication


Scanner	`agent/hyprnode/internal/scanner/ssh.go` — parse `/etc/ssh/sshd_config`
Finding	`ssh.password-auth-enabled` (CRITICAL)
Key	`sshd:/etc/ssh/sshd_config` (one per file path; supports `Include` directives)
Recommendation	`ssh.disable-password-auth` → existing preset extended with `verify`
Preset risk	DANGEROUS — need an authorised key first
Verify	`sshd -t` succeeds AND `grep -qE '^PasswordAuthentication\s+no' /etc/ssh/sshd_config`
Precheck	At least one entry in `/root/.ssh/authorized_keys` OR `/home/*/.ssh/authorized_keys`

Definition of done. Spec: scanner emits the finding; recommendation auto-links it; preview shows the bash with the precheck guard; verify-pass auto-resolves the finding. Apply fails cleanly if precheck doesn't pass (no keys → exit 1 before touching anything).

5.2 Disk usage high


Scanner	`agent/hyprnode/internal/scanner/disk.go` — `gopsutil/v4/disk.Partitions` + `Usage`
Finding	`disk.usage-high` (WARN ≥80%, CRITICAL ≥90%) per mount point
Key	`mount:/var` (mount path; survives device renames)
Recommendation	`disk.docker-prune` (one of two paths: docker cleanup or just "no fix")
Preset risk	CONFIRM
Preset	`docker-prune` — `docker system prune -af --volumes` with a "must be a Docker host" precheck
Verify	Re-run the disk check, new usage < before (or below threshold)

Definition of done. Mount-point granularity (not "the host is at 87%" — "/var is at 87% because Docker has 23 GiB of dangling images"). Threshold env-configurable.
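
The threshold logic can be sketched as below. This is an illustrative TypeScript sketch rather than the agent's Go code; the environment variable names `HYPR_DISK_WARN_PERCENT` and `HYPR_DISK_CRITICAL_PERCENT` are hypothetical, and only the 80/90 defaults come from this section.

```typescript
type DiskSeverity = "WARN" | "CRITICAL";

interface DiskThresholds {
  warnPercent: number;
  criticalPercent: number;
}

// Defaults match this section; the variable names are hypothetical.
function thresholdsFromEnv(env: Record<string, string | undefined>): DiskThresholds {
  const parsePercent = (raw: string | undefined, fallback: number): number => {
    const n = Number(raw);
    // Reject missing, empty, non-numeric, or out-of-range values.
    return raw !== undefined && raw.trim() !== "" && n > 0 && n <= 100 ? n : fallback;
  };
  return {
    warnPercent: parsePercent(env["HYPR_DISK_WARN_PERCENT"], 80),
    criticalPercent: parsePercent(env["HYPR_DISK_CRITICAL_PERCENT"], 90),
  };
}

// Returns null when the mount is healthy or reports no capacity.
function classifyMount(usedBytes: number, totalBytes: number, t: DiskThresholds): DiskSeverity | null {
  if (totalBytes <= 0) return null; // zero-size filesystems carry no signal
  const usedPercent = (usedBytes * 100) / totalBytes;
  if (usedPercent >= t.criticalPercent) return "CRITICAL";
  if (usedPercent >= t.warnPercent) return "WARN";
  return null;
}
```

Classifying each mount point separately is what lets the finding say "/var is at 87%" instead of reporting a host-wide number.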

5.3 PostgreSQL without a backup

The marquee demo finding. This is what makes someone say "oh, I want this".


Scanner	`agent/hyprnode/internal/scanner/postgres.go`
Detection	Docker container with image matching `postgres:*` AND no `BackupPolicy` referencing this node AND the container has a mounted volume
Finding	`postgres.no-backup` (CRITICAL — data loss risk)
Key	`container:<container_name>`
Recommendation	`postgres.add-backup` → `hypervault-restic` (already seeded)
Preset variables	`backup_paths` auto-suggested to the container's volume mount points

The detection cross-references DB state (BackupPolicy rows) with agent-reported state (running containers). The cross-reference logic lives on the API, not the agent — the agent ships a docker inventory; the API decides "this looks like a Postgres without a paired policy".
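
A minimal sketch of that API-side cross-reference, under stated assumptions: the `ContainerInfo` and `BackupPolicyRow` shapes and the function names are hypothetical, and the image test here is a strict reading of `postgres:*` (it would not match, say, an exporter image).

```typescript
interface ContainerInfo {
  name: string;
  image: string;
  mounts: string[]; // volume mount points reported by the agent
}

interface BackupPolicyRow {
  nodeId: string;
}

interface NoBackupFinding {
  type: "postgres.no-backup";
  severity: "CRITICAL";
  key: string;
  suggestedBackupPaths: string[];
}

// Assumed image test: "postgres" with or without a tag or registry prefix.
function isPostgresImage(image: string): boolean {
  const lastSegment = image.split("/").pop() ?? image;
  return lastSegment === "postgres" || lastSegment.startsWith("postgres:");
}

function findUnbackedPostgres(
  nodeId: string,
  inventory: ContainerInfo[],
  policies: BackupPolicyRow[],
): NoBackupFinding[] {
  // Any BackupPolicy referencing this node suppresses the finding.
  if (policies.some((p) => p.nodeId === nodeId)) return [];
  return inventory
    .filter((c) => isPostgresImage(c.image) && c.mounts.length > 0)
    .map((c): NoBackupFinding => ({
      type: "postgres.no-backup",
      severity: "CRITICAL",
      key: `container:${c.name}`,
      suggestedBackupPaths: c.mounts,
    }));
}

// Open findings whose key is absent from the fresh result should auto-resolve.
function keysToResolve(openKeys: string[], current: NoBackupFinding[]): string[] {
  return openKeys.filter((k) => !current.some((f) => f.key === k));
}
```

Running both functions on every inventory tick gives the reconciliation behavior described in this phase: a removed container or a newly added policy makes the key disappear from the result, so the finding resolves.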

Definition of done. Detection has a near-zero false-positive rate on the test fleet. False positives are unacceptable here because applying the fix is never free: it runs a Restic init against a fresh S3 bucket.

5.4 Phase 5 success metric

A fresh Debian VM running nginx + postgres-via-docker + caddy, scanner tick once: dashboard shows at least 3 findings with at least 2 of them having a working Apply button.

Estimated complexity. ~2 sessions (1 for the 3 scanners + 1 for the preset polish + auto-resolve specs + demo recording).

3. Phase 6 — Make it real (shipped 2026-05-31)

Shipped:

6.1 Prisma migrations — baseline migration 20260531000000_init captures cumulative schema through Phase 5; orphan migrations cleared. Docker entrypoint flipped from db push to migrate deploy. CI gained a migrate diff --from-migrations drift check using a shadow DB so a PR that edits schema.prisma without db:migrate fails fast.
6.2 RBAC — UserRole enum (VIEWER/OPERATOR/ADMIN) baked into JWT at sign time. requireRole() middleware gates every mutation route (jobs, tokens, backups, findings/snooze, findings/resolve). /api/users (admin-only) lists + manages teammates; role updates refuse to demote/delete the last admin. First user on a fresh deployment auto-promotes to ADMIN; subsequent registrations land as OPERATOR.
6.3 WebSocket reverse channel — @fastify/websocket plugin, /api/agent/ws endpoint with per-node connection registry. Server pushes {type:"wakeup"} on Job.create (sub-second pickup instead of waiting for the 5-min poll tick); pushes {type:"cancel", jobId:…} on operator cancel of a RUNNING job → agent's job context is cancelled → bash SIGTERMed. Falls back transparently to polling when the WS is down (gorilla/websocket reconnect with exponential backoff, 1s → 60s). RUNNING-job cancel without WS connection is refused cleanly (409).
6.4 Admin audit UI — GET /api/audit (admin-only) with filters (user/email substring, action, resource type), pagination (50/page), CSV export (?format=csv, cap 10k rows). /dashboard/admin/audit page gated client-side AND server-side; "Admin" nav entry only rendered for ADMIN role.
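
The RBAC rules from 6.2 can be sketched as pure functions. This is a hedged illustration, not the shipped middleware: it assumes the three roles form a strict hierarchy (VIEWER < OPERATOR < ADMIN), and `hasRole`, `roleForNewUser`, and `wouldRemoveLastAdmin` are hypothetical names.

```typescript
type UserRole = "VIEWER" | "OPERATOR" | "ADMIN";

// Assumption: roles are ordered, so a higher role satisfies a lower requirement.
const ROLE_RANK: Record<UserRole, number> = { VIEWER: 0, OPERATOR: 1, ADMIN: 2 };

function hasRole(actual: UserRole, required: UserRole): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}

// First user on a fresh deployment becomes ADMIN; later registrations are OPERATOR.
function roleForNewUser(existingUserCount: number): UserRole {
  return existingUserCount === 0 ? "ADMIN" : "OPERATOR";
}

interface Teammate {
  id: string;
  role: UserRole;
}

// newRole = null models deleting the user.
function wouldRemoveLastAdmin(users: Teammate[], targetId: string, newRole: UserRole | null): boolean {
  const target = users.find((u) => u.id === targetId);
  if (target === undefined || target.role !== "ADMIN" || newRole === "ADMIN") return false;
  return users.filter((u) => u.role === "ADMIN").length <= 1;
}
```

A role-update or delete handler would call `wouldRemoveLastAdmin` first and refuse the request when it returns true.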

The "3-person team self-serves" success metric from §3.5 is achievable end-to-end: admin invites teammates via /register, role-management UI lets them become viewer/operator, audit log captures everything.
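
For the WebSocket reconnect in 6.3, the 1s → 60s exponential backoff can be sketched as below. The agent itself is Go; this TypeScript version only illustrates the schedule. Doubling per attempt and the absence of jitter are assumptions, since the roadmap only states the bounds.

```typescript
// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, ... capped at 60s.
// Doubling per attempt is an assumption; only the 1s and 60s bounds are given.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 60000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  // Clamp the exponent so the intermediate product stays a finite number.
  return Math.min(capMs, baseMs * Math.pow(2, Math.min(exponent, 30)));
}

const firstEightDelays = [0, 1, 2, 3, 4, 5, 6, 7].map((n) => reconnectDelayMs(n));
```

Under these assumptions the cap is reached on the seventh attempt, after which the agent would retry once a minute while polling continues to pick up jobs.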

4. Phase 7 — HyprGuard (shipped 2026-05-31)

Goal. A scheduled security audit that produces a structured list of findings, not a 200-line bash log no one reads — Lynis-shaped, but every item is a Finding with a Recommendation where one exists.

Shipped:

agent/hyprnode/internal/scanner/audit.go — runs all 9 checks every scanner tick. Pure parsing in audit_parse.go (testable against captured fixture output, no shell required).

The 9 checks and their stable keys:

Check	Type	Severity
`apt list --upgradable` shows `-security` packages	`audit.kernel-updates-pending`	WARN
`/etc/shadow` root row has a real password hash	`audit.root-password-set`	WARN
`NOPASSWD:` in `/etc/sudoers{,.d/*}`	`audit.sudo-nopasswd`	WARN
`find /etc -perm -o+w` returns ≥1 file	`audit.world-writable-etc`	WARN
UFW missing or `Status: inactive` or default-allow	`audit.ufw-inactive`	CRITICAL on public-IP host, WARN else
`systemctl is-active fail2ban` ≠ "active"	`audit.fail2ban-down`	WARN
`/etc/apt/apt.conf.d/20auto-upgrades` missing or disabled	`audit.no-automatic-updates`	INFO
`lsblk -no TYPE` has no `crypt` row	`audit.no-disk-encryption`	INFO
`systemctl is-active auditd` ≠ "active"	`audit.no-auditd`	INFO
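
The conditional severity on `audit.ufw-inactive` can be sketched as follows. How the agent actually decides that a host is "public-IP" is not stated here, so this is one plausible reading: an IPv4-only check that treats private, loopback, link-local, carrier-grade NAT, and multicast/reserved ranges as non-public. Function names are hypothetical.

```typescript
type AuditSeverity = "INFO" | "WARN" | "CRITICAL";

// IPv4 only; anything unparsable is treated as not public.
function isPublicIPv4(address: string): boolean {
  const parts = address.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;
  const octets = parts.map(Number);
  if (octets.some((o) => o > 255)) return false;
  const [first, second] = octets;
  if (first === 0 || first === 10 || first === 127) return false; // "this network", RFC 1918, loopback
  if (first === 172 && second >= 16 && second <= 31) return false; // RFC 1918
  if (first === 192 && second === 168) return false; // RFC 1918
  if (first === 169 && second === 254) return false; // link-local
  if (first === 100 && second >= 64 && second <= 127) return false; // carrier-grade NAT
  if (first >= 224) return false; // multicast and reserved
  return true;
}

// CRITICAL when any interface address is public, WARN otherwise.
function ufwInactiveSeverity(hostAddresses: string[]): AuditSeverity {
  return hostAddresses.some((addr) => isPublicIPv4(addr)) ? "CRITICAL" : "WARN";
}
```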

presets/apt-security-upgrade.yaml (CONFIRM) — runs unattended-upgrade --debug --minimal-upgrade-steps against pending security packages, then verifies apt list --upgradable reports zero -security rows remaining.
Recommendations seeded:
- audit.kernel-updates-pending → apt-security-upgrade
- audit.ufw-inactive / audit.fail2ban-down / audit.no-automatic-updates → existing server-light (one rec, three finding types — that's why findingTypes is a JSON array).
The other 5 findings stay MANUAL — passwd -l root, sudoers review, per-file chmod o-w, LUKS-on-reprovision, auditd-on-purpose are decisions an operator owns. The UI just shows them without an Apply button.
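
The "one rec, many finding types" lookup can be sketched like this. The row shape is hypothetical, and the sketch assumes `findingTypes` is held as serialized JSON text; if the column is a native JSON type, the `JSON.parse` step goes away.

```typescript
interface RecommendationRow {
  id: string;
  presetSlug: string;
  findingTypes: string; // assumed: a JSON array of finding types, stored as text
}

// Returns the recommendation covering this finding type, or null for MANUAL findings.
function recommendationFor(findingType: string, rows: RecommendationRow[]): RecommendationRow | null {
  for (const row of rows) {
    const types: unknown = JSON.parse(row.findingTypes);
    if (Array.isArray(types) && types.some((t) => t === findingType)) return row;
  }
  return null; // no row matched: the UI shows the finding without an Apply button
}

// Seed data mirroring the recommendations listed above (ids are made up).
const seededRecommendations: RecommendationRow[] = [
  { id: "rec-1", presetSlug: "apt-security-upgrade", findingTypes: JSON.stringify(["audit.kernel-updates-pending"]) },
  {
    id: "rec-2",
    presetSlug: "server-light",
    findingTypes: JSON.stringify(["audit.ufw-inactive", "audit.fail2ban-down", "audit.no-automatic-updates"]),
  },
];
```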

Done-when metric (from the original spec): A fresh Ubuntu 24.04 VM with no hardening produces ≥6 findings; applying server-light resolves ≥4 of them on the next scanner tick. Achievable as wired.

5. Phase 8 — Production polish (complete)

Shipped in this pass:

8.2 Change password — POST /api/auth/change-password requires the current password, refuses re-use, audits success and failure separately. Settings page gains a "Change password" form with inline validation. (The forgot-password email reset shipped separately in 0.9.3; see §5b.)
8.3 Per-user rate limiting — custom keyGenerator on @fastify/rate-limit buckets Bearer/cookie-authenticated traffic by user.sub and falls back to req.ip for anonymous. Solves the NAT'd-office case where the global per-IP bucket was effectively per-organisation.
8.4 GHCR release pipeline — .github/workflows/release.yml matrix over the three Docker images, triggered by v*.*.* tag pushes (or workflow_dispatch with an explicit ref). Stable releases move :latest; pre-releases push the version tag only.
8.6 Per-node agent config — new schema column Node.agentConfig (JSON string) + migration. PATCH /api/nodes/:id/config (operator-auth) merges into existing config; null on a key clears it. GET /api/nodes/me/config (node-token, pinned-only) returns the live config to the agent. Agent pulls before each scanner tick: tls_paths overrides the env default for the TLS scanner, and scan_interval now drives the scanner ticker cadence (runner.Tick returns the next interval; honored since the 2026-06-02 audit pass).
New "Agent config" panel on the node detail page edits TLS paths + scan interval.
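
The PATCH semantics for the per-node config (merge into the existing config, `null` clears a key) can be sketched as below. A shallow, top-level merge is an assumption, and the example values for `tls_paths` and `scan_interval` are made up.

```typescript
type AgentConfig = Record<string, unknown>;

// Shallow merge: keys in the patch overwrite, an explicit null removes the key.
function mergeAgentConfig(current: AgentConfig, patch: Record<string, unknown>): AgentConfig {
  const next: AgentConfig = { ...current };
  for (const key of Object.keys(patch)) {
    if (patch[key] === null) {
      delete next[key];
    } else {
      next[key] = patch[key];
    }
  }
  return next;
}

const storedConfig: AgentConfig = { tls_paths: ["/etc/ssl"], scan_interval: 300 };
const patchedConfig = mergeAgentConfig(storedConfig, {
  tls_paths: ["/etc/letsencrypt/live"],
  scan_interval: null, // clears the override so the agent falls back to its default
});
// Node.agentConfig is a JSON string column, so the result is serialized before saving.
const serializedConfig = JSON.stringify(patchedConfig);
```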

Phase 8 fully shipped — nothing deferred. (8.1 output offload, 8.2 change-password + forgot-password reset, 8.3–8.6 all landed; see §5b.) The only loosely-open follow-on idea is opt-in email notifications on backup failures, which rides on the same lib/mailer.ts transport once SMTP is wired.
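
The per-user bucketing from 8.3 comes down to a small key function. A minimal sketch, assuming a request shape with an optional decoded `user.sub`; the `RateLimitSubject` type and the `user:` / `ip:` prefixes are illustrative and not taken from the codebase.

```typescript
interface RateLimitSubject {
  ip: string;
  user?: { sub: string }; // present once a Bearer token or cookie has been verified
}

// Authenticated traffic is bucketed per user; anonymous traffic falls back to the IP.
// The prefixes keep a user id from ever colliding with an IP-shaped key.
function rateLimitKey(req: RateLimitSubject): string {
  return req.user !== undefined ? `user:${req.user.sub}` : `ip:${req.ip}`;
}
```

Two teammates behind the same office NAT then get separate buckets, while unauthenticated requests from that office still share one.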

5b. Phase 8 — former leftover scope (all shipped)

8.1 Output offload for long logs (shipped 2026-06-01, chunk-in-DB)

Decision made: chunk-in-DB as the self-hosted default (no extra infra). A stream >4 MB no longer loses its head: the full ordered log is persisted in JobOutputChunk (cheap appends), Job.stdout/stderr keep a bounded 4 MB inline tail, and Job.stdoutBytes/stderrBytes track the true totals. GET /api/jobs/:id/output/raw?stream=… streams the full log; the job detail page links to it when truncated. lib/jobOutput.ts is the seam — an S3/MinIO backend is a future opt-in (reimplement appendJobOutput / readJobOutput, routes untouched). Deviation from the original sketch: we keep a 4 MB inline tail (not first+last 100 KB) — simpler, and the full log is one click away.
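
The bounded-tail bookkeeping can be sketched as follows. This is not the code in lib/jobOutput.ts: the state shape and function names are hypothetical, the chunk array stands in for JobOutputChunk rows, and lengths are counted in string characters rather than bytes to keep the sketch dependency-free.

```typescript
interface JobOutputState {
  chunks: string[]; // stands in for ordered JobOutputChunk rows
  inlineTail: string; // stands in for Job.stdout (bounded)
  totalLength: number; // stands in for Job.stdoutBytes (true total)
}

// Appends a chunk, keeps the full log in `chunks`, and trims the inline copy to its tail.
function appendOutput(state: JobOutputState, chunk: string, tailLimit: number): JobOutputState {
  const combined = state.inlineTail + chunk;
  return {
    chunks: [...state.chunks, chunk],
    inlineTail: combined.length > tailLimit ? combined.slice(combined.length - tailLimit) : combined,
    totalLength: state.totalLength + chunk.length,
  };
}

// True when the inline copy no longer holds the whole stream.
function isTruncated(state: JobOutputState): boolean {
  return state.totalLength > state.inlineTail.length;
}
```

When `isTruncated` is true the job detail page would link to the raw endpoint, which streams the chunks back in order.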

8.2 Account self-service — email reset (shipped 2026-06-01)

Change-password (0.9.0) + forgot-password reset (0.9.3) both shipped. Reset = single-use SHA-256-hashed tokens (1h TTL, burned on use + siblings), anti-enumeration (forgot-password always 200), rate-limited, audited (password.reset.{request,success,failure}). Email delivery goes through the lib/mailer.ts seam — default logs the link (dev/self-hosted); SMTP/SES/Postmark is a transport swap, callers untouched. The same transport could later power 8.2bis: email notifications on backup failures (opt-in per user) — still open.
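
The reset-token lifecycle (hashed at rest, 1h TTL, single use, siblings burned) can be sketched in memory. Everything here is illustrative: the row shape and function names are hypothetical, and the hash is injected as a parameter so the sketch stays dependency-free; the shipped flow uses SHA-256.

```typescript
interface ResetTokenRow {
  userId: string;
  tokenHash: string; // only the hash is stored, never the raw token
  expiresAt: number; // epoch milliseconds
  usedAt: number | null;
}

type HashFn = (raw: string) => string; // stand-in for SHA-256

const RESET_TTL_MS = 60 * 60 * 1000; // 1h TTL

function issueToken(rows: ResetTokenRow[], userId: string, rawToken: string, now: number, hash: HashFn): void {
  rows.push({ userId, tokenHash: hash(rawToken), expiresAt: now + RESET_TTL_MS, usedAt: null });
}

// Returns the user id on success and burns the token plus its siblings; null on any failure.
function redeemToken(rows: ResetTokenRow[], rawToken: string, now: number, hash: HashFn): string | null {
  const wanted = hash(rawToken);
  const match = rows.find((r) => r.tokenHash === wanted);
  if (match === undefined || match.usedAt !== null || match.expiresAt <= now) return null;
  for (const r of rows) {
    // Burn every outstanding token of the same user, including this one.
    if (r.userId === match.userId && r.usedAt === null) r.usedAt = now;
  }
  return match.userId;
}
```

Returning a bare `null` for unknown, used, and expired tokens alike keeps the failure path uniform, which fits the anti-enumeration stance of the forgot-password endpoint.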

8.5 E2E tests with Playwright (shipped 2026-06-01)

Playwright wired at the repo root (playwright.config.ts + tests/e2e/, pnpm test:e2e). Auth is established once in auth.setup.ts and reused via storageState — keeps the suite under the login route's 10/min rate limit. Three non-destructive specs cover the "I broke the auth cookie" regression class that inject() can't see: login → edge-proxy gate → dashboard lists a node; Health page surfaces a finding with its severity; opening a CONFIRM finding shows the recommendation + Apply button; a MANUAL finding shows no Apply button.

Deliberately read-only — the specs never click Apply, because queuing a preset would be executed for real by a live agent on the host. The "queue a job → wait for SUCCEEDED → see it in the list" tail from the original spec is left for CI, where a sacrificial target + a stack-orchestration step (docker compose up web+api+db, seed a node/findings) can run it safely. Today the specs assume the dev stack is already up with an agent reporting ≥1 node + findings.

Estimated complexity per item. 0.5–1.5 sessions each. The other Phase 8 items (8.2 change-password, 8.3, 8.4, 8.6) shipped together in the main pass; the three above (8.1, 8.2 email reset, 8.5) were the explicit follow-up and have since shipped.

Parked — not scheduled

Anonymized + encrypted dev-DB snapshot (sanitize PII/secrets → encrypt → devs restore locally; app/login must still work). Proposed 2026-06-01, parked: not in any phase, no audit finding requires it, low value while single-operator. Revisit when the team grows or a finding demands it. If done, keep it tiny: 2 scripts (scripts/dev-data-{export,import}.sh) + 3 surgical SQL ops (truncate risky blobs → 200B; wipe secret columns; TRUNCATE AuditEvent/Heartbeat), smoke = boot + login, ≤2h. Do NOT build a field-by-field transform pipeline.

6. Phase 9 — Commercial edition (open-core, only after one paying customer)

This is the commercial edition, separate from the Apache-2.0 OSS core (see LICENSE/NOTICE). Open-core: the core stays Apache-2.0; these ship under a commercial license. Locked behind a real "I'll pay for this" conversation — don't build speculatively.

SSO via OIDC (Google Workspace, Microsoft Entra ID).
2FA / WebAuthn for admin accounts.
Multi-org / multi-tenant (orgs own nodes, users belong to orgs).
Per-client PDF reports ("here's what HyprBox did for you this month").
White-label branding (logo, colours, domain).

If a customer demands one of these but won't write a PO, say no. The first PO is expected to come from a design-partner MSP who says "I'd pay for multi-tenant" — that's the trigger to start this edition. We don't build commercial features speculatively.

7. Anti-goals (things we WILL NOT build)

When asked "can we…":

Ask	Answer	Why
Infrastructure graph view	Not before 10+ entity types + a paying customer	Sexy to build, marginal until we have meat
Numeric health score (62/100)	No	Every customer asks "why not 64" — we can't answer. Buckets only.
Mobile app	No	Operator persona is at a laptop
Built-in alerting (page on call)	No	Customers keep PagerDuty/OpsGenie; we publish events for them to consume
Custom query language	No	If you need to query, use Postgres directly
Plugin system	No	API is the extension point
Real-time collab features	No	Single-operator-at-a-time is fine
Chat/Slack bot	No	The dashboard IS the interaction model
AI-generated recommendations	No (until a clear win)	Hand-curated recommendations are still the bar; LLMs are at suggestion quality, not "I trust this to run as root" quality
Windows server support	No	Linux only. Period.
Be a Terraform / IaC platform	No — `command:` runner at most	Different control plane (cloud APIs + state/plan/drift); Spacelift/env0/Atlantis own it; off the SMB wedge
Be a Kubernetes / Helm platform	No — `command:` runner at most	Cluster API ≠ host-bash agent; Rancher/Portainer/ArgoCD own it; SMBs rarely self-run k8s

This list is not ordered by likelihood of someone asking; it's ordered by how hard "no" is to enforce when the ask gets cute.

8. Decisions locked in

Things we keep re-deciding. Stop re-deciding. The answer is below.

Question	Decision	Source
Polling vs WebSocket for agent	Polling until Phase 6.3; WebSocket push with polling fallback since	`ARCHITECTURE.md`
Job script: render now or render at apply?	At queue time (audit trail)	`JOBS.md`
Should we offer "reads" RBAC?	No, reads stay open to authenticated users for now	This doc, Phase 6.2
Per-finding-type vs per-finding recommendations	Per-type (one Recommendation row, many Findings link to it)	`FINDINGS.md`
Health score: number or bucket?	Three buckets, no number	`AUTOPILOT.md`
Should `verify:` failure trigger `rollback:` automatically?	No (Phase 4b). Operator decides. `rollback:` lands when a fix needs it.	`PRESETS.md`
Default `risk_level` when not declared?	`CONFIRM` (fail-safe)	`PRESETS.md`
Where does Restic's password live?	NOT in the DB. `passwordRef` pointer only.	`HYPRVAULT.md`
Multi-org from day one?	No — single workspace. Multi-tenant is Phase 9.	This doc

If something feels under-decided, add a row here in a follow-up commit.

9. Risks

Risk	Likelihood	Impact	Mitigation
Scope creep on Phase 5 (operator asks for 10 new scanners)	High	Slows the loop	Ship 3 at most; defer the rest to Phase 7 (HyprGuard)
Phase 6.3 WebSocket adds reliability bugs	Medium	Job loss / double-runs	Fallback to polling; atomic claim still serialises both transports
Migrations cutover breaks existing prod stacks	Medium	Customer downtime	Baseline migration tested on a snapshot of the prod DB before cutover
A scanner false-positives at scale	High	Trust collapse	Treat every false positive as a Sev-2 bug; tighten the `key` first, the detection second
We try to do Phase 9 before a real customer	Medium	Wasted months on enterprise plumbing nobody asks for	Hard rule: no Phase 9 work without a signed PO

10. Living document

Update this file when:

A phase ships → flip its status in §1.
The next phase's scope changes (added or dropped a chunk) → edit the phase block AND note it in CHANGELOG.md.
Anti-goal #N gets challenged seriously (real customer money) → don't edit the anti-goal silently; open a discussion thread, link it here, and only then move the row.

The last-frozen date at the top is the only thing that should drift between PRs. Everything else is intentional.
