
Architecture

State as of Phase 3 + Solidification. Each subsystem is the smallest thing that delivers the user-facing feature it owns; we deliberately avoid premature abstractions (no event sourcing, no message broker, no service mesh).

Bird's-eye view

                          ┌──────────────────────┐
                          │   Browser (operator) │
                          └──────────┬───────────┘
                                     │ HTTPS
                                     ▼
                       ┌────────────────────────────┐
                       │  apps/web — Next.js 16     │
                       │  Studio dark dashboard     │
                       │  EventSource → /api/stream │
                       └─────────────┬──────────────┘
                                     │ Bearer JWT / cookie
                                     ▼
   ┌─────────────────┐    ┌────────────────────────────┐    ┌──────────┐
   │  cli/hyprbox    │───▶│  apps/api — Fastify 5      │───▶│ Postgres │
   │  (Go + Cobra)   │    │  /auth /nodes /tokens      │    │ 16       │
   └─────────────────┘    │  /presets /jobs /stream    │    └──────────┘
                          └───┬──────────────────▲─────┘
                              │ rendered bash    │ Bearer node token
                              │   + status flips │ heartbeat + poll
                              ▼                  │
                       ┌─────────────────────────┴──┐
                       │  agent/hyprnode (Go)       │
                       │  collector + reporter +    │
                       │  jobrunner (bash -s)       │
                       └────────────┬───────────────┘
                                    │
                                    ▼
                          ┌──────────────────────┐
                          │ Linux server (target)│
                          └──────────────────────┘

Components

apps/web — operator dashboard

Next.js 16 App Router, React 19, Tailwind v4. Pages:

  • /login — credentials → POST /api/auth/login → stores JWT in localStorage AND the hb_token cookie (the cookie is what the proxy/middleware checks).
  • /dashboard — fleet overview: hero chart, KPI cards, node grid with per-node sparklines fed by /api/nodes/sparks.
  • /dashboard/nodes/[nodeId] — detail page: 3 BigMetric cards with /history sparklines, details panel, activity timeline.
  • /dashboard/deploys — preset catalogue. Click Preview → modal with rendered bash + variables form. Click Apply → node-picker → POST /api/jobs.
  • /dashboard/jobs — job list (polled every 3s).
  • /dashboard/jobs/[id] — live tail via SSE on /api/jobs/:id/stream.
  • /dashboard/logs — live feed of heartbeats + activity events via SSE.
  • /dashboard/settings — agent token management (create/revoke + one-shot reveal).
  • /dashboard/alerts — Coming Soon stub.

The edge proxy (apps/web/src/proxy.ts, formerly middleware.ts before the Next.js 16 rename) gates every /dashboard/* route on the presence of the hb_token cookie. Without it, the browser is redirected to /login?next=….
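
The gating rule can be pictured as a pure function. This is a minimal sketch, not the contents of proxy.ts: the function name `gate` and the choice to put the original path into `next` are assumptions made for illustration. Like the proxy described above, it only checks that the cookie is present, not that the JWT inside it is valid.

```typescript
const COOKIE_NAME = "hb_token";

// Hypothetical helper: returns a redirect target, or null when the request may proceed.
function gate(
  pathname: string,
  search: string,
  cookies: Record<string, string | undefined>,
): string | null {
  const isDashboard = pathname === "/dashboard" || pathname.startsWith("/dashboard/");
  if (!isDashboard) return null; // public route, nothing to check
  if (cookies[COOKIE_NAME]) return null; // cookie present, let the request through
  // Assumption: `next` carries the originally requested path and query string.
  return `/login?next=${encodeURIComponent(pathname + search)}`;
}

console.log(gate("/dashboard/jobs", "", {})); // /login?next=%2Fdashboard%2Fjobs
```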

apps/api — control plane

Fastify 5 + Prisma 6 + Zod. Single process, listens on :4000. Plugins in order: helmet, rate-limit, cors (credentials enabled), cookie, jwt (reads both Authorization: Bearer and hb_token cookie), swagger (only when HYPRBOX_ENABLE_DOCS=true in prod). The global rate-limit bucket is per authenticated user for Bearer/cookie JWTs, per IP otherwise.

Routes are split by domain in src/routes/*.ts:

  • auth.ts — /login, /register, /me, /logout. Bcrypt password hashing, rate-limit 10/min per IP on /login, 5/min on /register.
  • nodes.ts — /heartbeat (POST, agent token), / (GET, user JWT), /:nodeId/status|history|activity, /sparks batch.
  • tokens.ts — node-token CRUD. Plaintext returned exactly once at creation; the rest of the API only sees the SHA-256 hash.
  • presets.ts — / list, /:name detail, /:name/render (POST → bash script).
  • jobs.ts — user side (POST /, GET /, GET /:id, POST /:id/cancel, GET /:id/stream SSE) + agent side (GET /pending, POST /:id/output, POST /:id/complete).
  • stream.ts — /api/stream/nodes, backed by a process-local EventEmitter bus.
  • backups.ts — HyprVault: BackupPolicy CRUD + agent-side run lifecycle.
  • lib/audit.ts — append-only log helper; called from auth, tokens, jobs, backups after each sensitive mutation.

Middleware:

  • requireAuth extracts a token from header OR cookie and calls app.jwt.verify synchronously. We bypass req.jwtVerify() because @fastify/jwt v9 with cookie support refuses to fall back to the header when the cookie is absent (see docs/SECURITY.md).
  • requireNodeToken looks up the token by 8-char prefix index, then timingSafeEqual against the SHA-256 hash. Optional in dev unless HYPRBOX_REQUIRE_NODE_TOKEN=true.
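
The requireNodeToken lookup described above can be sketched with Node's built-in crypto module. This is an illustration under stated assumptions, not the project's code: the Map stands in for the indexed NodeToken table, `issueToken`, `verifyToken` and `prefixOf` are hypothetical names, and the source does not say whether the hbnt_ marker counts toward the 8-character prefix (here it does not).

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical row shape; the real model lives in schema.prisma.
interface TokenRow {
  prefix: string;
  hash: Buffer; // SHA-256 of the full plaintext token
  revoked: boolean;
}

const MARKER = "hbnt_";
const PREFIX_LEN = 8;

const sha256 = (value: string): Buffer =>
  createHash("sha256").update(value, "utf8").digest();

// Assumption: the prefix is the 8 characters that follow the marker.
const prefixOf = (token: string): string =>
  token.slice(MARKER.length, MARKER.length + PREFIX_LEN);

// In-memory stand-in for the prefix-indexed table.
const tokensByPrefix = new Map<string, TokenRow>();

function issueToken(): string {
  const token = MARKER + randomBytes(24).toString("hex");
  tokensByPrefix.set(prefixOf(token), {
    prefix: prefixOf(token),
    hash: sha256(token),
    revoked: false,
  });
  return token; // the plaintext is returned once and never stored
}

function verifyToken(presented: string): boolean {
  if (!presented.startsWith(MARKER)) return false;
  const row = tokensByPrefix.get(prefixOf(presented)); // single indexed lookup
  if (!row || row.revoked) return false;
  // Both digests are 32 bytes, so timingSafeEqual cannot throw on length mismatch.
  return timingSafeEqual(sha256(presented), row.hash);
}
```

Comparing digests rather than plaintext means the stored value is useless to an attacker who reads the table, and the constant-time comparison avoids leaking how many leading bytes matched.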

agent/hyprnode — server-side agent

Go 1.23, statically linked, ~9 MB. Two responsibilities, one tick loop:

  1. Heartbeat — collect host metrics (gopsutil/v4) → POST to /api/nodes/heartbeat with the optional inventory fields (ip, region, kernel, architecture) sent on the first heartbeat or whenever they change.
  2. Job runner — after the heartbeat, drain queued jobs by polling GET /api/jobs/pending?nodeId=…. Each claimed job is executed via bash -s with stdin = rendered script; stdout/stderr are buffered and flushed to /output every 1.5s, then /complete reports the final exit code.

Graceful shutdown via signal-aware context — a running bash gets killed if the parent receives SIGINT/SIGTERM.

cli/hyprbox — operator CLI

Go + Cobra. Single command tree:

  • hyprbox status [--api-url] [--token] [--json] — fleet table.
  • hyprbox preset list|show|preview|apply — preset operations. apply is scoped to --target=local (Phase 3.5 — remote will route via the agent).

Both the API URL and the token are resolved in order of precedence: the command-line flag first, then the environment variable ($HYPRBOX_API_URL / $HYPRBOX_API_TOKEN), then the defaults.
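
The precedence rule is language-agnostic; the CLI itself is Go, but the logic can be sketched in a few lines of TypeScript. The helper name `resolveSetting`, the default URL, and the decision to treat an empty string as unset are all assumptions for illustration.

```typescript
// Returns the first non-empty value in precedence order: flag, environment, default.
function resolveSetting(
  flag: string | undefined,
  envValue: string | undefined,
  fallback: string,
): string {
  for (const candidate of [flag, envValue]) {
    // Assumption: an empty string counts as "not provided".
    if (candidate !== undefined && candidate !== "") return candidate;
  }
  return fallback;
}

// Hypothetical default; the source only says the API listens on :4000.
const DEFAULT_API_URL = "http://localhost:4000";

const exampleEnv: Record<string, string | undefined> = {
  HYPRBOX_API_URL: "https://api.example.test",
};

// No flag given, so the environment variable wins over the default.
console.log(resolveSetting(undefined, exampleEnv["HYPRBOX_API_URL"], DEFAULT_API_URL));
```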

Data model

User 1 ── n NodeToken
User 1 ── n Job
Node 1 ── n Heartbeat
Node 1 ── n Job
  • User — operators with bcrypt password hash.
  • NodeToken — agent shared secret: prefix (8 chars, indexed), SHA-256 hash, optional nodeId for pinning.
  • Node — registered hosts. Status enum (ONLINE/OFFLINE/WARNING/CRITICAL/UNKNOWN); effective status is computed at query time from lastSeenAt (>10 min → OFFLINE).
  • Heartbeat — append-only metric history. At a 1-minute interval, 24 h of history is about 1440 rows per node.
  • Job — preset application: script frozen at queue time, status flips through QUEUED → RUNNING → (SUCCEEDED|FAILED|CANCELLED). Output capped at 4 MB per stream (tail kept on overflow).
  • BackupPolicy / BackupRun — HyprVault. Policy is the recipe, run is one execution. passwordRef is a pointer (env:NAME / file:/path) — the password itself is never stored.
  • AuditEvent — append-only trail of sensitive mutations (login, token CRUD, job CRUD, backup CRUD). See docs/AUDIT.md.
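
The query-time status rule for Node (more than 10 minutes since lastSeenAt means OFFLINE) could look like the sketch below. The function name `effectiveStatus` is hypothetical, and returning UNKNOWN for a node that has never reported is an assumption not stated in the source.

```typescript
type NodeStatus = "ONLINE" | "OFFLINE" | "WARNING" | "CRITICAL" | "UNKNOWN";

const OFFLINE_AFTER_MS = 10 * 60 * 1000; // 10 minutes, from the rule above

function effectiveStatus(
  stored: NodeStatus,
  lastSeenAt: Date | null,
  now: Date,
): NodeStatus {
  // Assumption: a node with no heartbeat yet is reported as UNKNOWN.
  if (lastSeenAt === null) return "UNKNOWN";
  // Strictly more than 10 minutes of silence overrides whatever is stored.
  if (now.getTime() - lastSeenAt.getTime() > OFFLINE_AFTER_MS) return "OFFLINE";
  return stored;
}
```

Computing this on read means no background job has to flip rows to OFFLINE; a node that stops reporting simply ages out the next time it is queried.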

Full schema: apps/api/prisma/schema.prisma.
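
The per-stream output cap on Job (tail kept on overflow) can be illustrated as follows. This is a simplified sketch: `appendCapped` is a hypothetical name, and it measures length in UTF-16 code units rather than bytes, which the real implementation may not do.

```typescript
const CAP = 4 * 1024 * 1024; // 4 MB, counted here in string length for simplicity

// Appends a chunk and, on overflow, drops the oldest output so the tail survives.
function appendCapped(existing: string, chunk: string, cap: number = CAP): string {
  const combined = existing + chunk;
  return combined.length > cap
    ? combined.slice(combined.length - cap)
    : combined;
}

console.log(appendCapped("abcdef", "ghij", 8)); // cdefghij
```

Keeping the tail rather than the head favors the lines closest to the exit code, which is usually where a failing script explains itself.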

Flows

Heartbeat (every 5 min in prod, 10 s in dev)

agent.tick()
  → collector.Collect()              # gopsutil host metrics
  → POST /api/nodes/heartbeat        # Bearer agent token (optional in dev)
      → requireNodeToken             # prefix lookup + hash compare
      → prisma.node.upsert           # create or refresh metadata
      → prisma.heartbeat.create
      → publish({type: 'heartbeat'}) # fans out to /api/stream/nodes
  → 202 Accepted
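
The publish step at the end of this flow fans out through a process-local EventEmitter. A minimal sketch of such a bus, assuming a hypothetical event shape and hypothetical `publish` / `subscribe` helpers (the real signatures are not shown in the source):

```typescript
import { EventEmitter } from "node:events";

// Hypothetical event shapes, modeled on the publish() calls in the flows.
type BusEvent =
  | { type: "heartbeat"; nodeId: string }
  | { type: "activity"; kind: string; msg: string };

const bus = new EventEmitter();

function publish(evt: BusEvent): void {
  bus.emit("event", evt); // delivered synchronously to every current listener
}

// Registers one SSE client; `send` writes a frame to that client's response.
// Returns an unsubscribe function to call when the connection closes.
function subscribe(send: (frame: string) => void): () => void {
  const listener = (evt: BusEvent): void => {
    send(`event: ${evt.type}\ndata: ${JSON.stringify(evt)}\n\n`);
  };
  bus.on("event", listener);
  return () => {
    bus.off("event", listener);
  };
}
```

Because the emitter lives in process memory, a client only sees events published by the API process it is connected to.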

Queue + run a preset

Operator → POST /api/jobs
  → requireAuth → AuthedUser
  → getPreset(name) + renderPreset(preset, vars)  # bash frozen here
  → prisma.job.create({ status: QUEUED, script })
  → publish({type: 'activity', kind: 'deploy'})
  → 201

agent.tick() (next heartbeat)
  → GET /api/jobs/pending?nodeId=X
      → SELECT WHERE status=QUEUED ORDER BY createdAt LIMIT 1
      → UPDATE ... WHERE id=<selected job id> AND status=QUEUED   # atomic claim
      → if count=0 → return null (another poller claimed it first)
      → publish({kind: 'info', msg: 'Agent picked up...'})
  → exec /bin/bash -s < script
  → every 1.5s: POST /:id/output {stdout, stderr}
  → on exit: POST /:id/complete {exitCode, ...}
      → prisma.job.update({ status, finishedAt, ... })
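
The claim step in this flow is a conditional update whose affected-row count decides the winner. The sketch below models that with an in-memory Map instead of Postgres, so it shows the logic only, not the database's locking. The names `updateIfQueued` and `selectOldestQueued` are hypothetical, and filtering candidates by nodeId is an assumption based on the query parameter.

```typescript
type JobStatus = "QUEUED" | "RUNNING" | "SUCCEEDED" | "FAILED" | "CANCELLED";

interface Job {
  id: string;
  nodeId: string;
  status: JobStatus;
  createdAt: number;
}

// Stand-in for: UPDATE ... WHERE id = ? AND status = 'QUEUED'. Returns affected rows.
function updateIfQueued(jobs: Map<string, Job>, id: string): number {
  const job = jobs.get(id);
  if (!job || job.status !== "QUEUED") return 0;
  job.status = "RUNNING";
  return 1;
}

// Stand-in for: SELECT ... WHERE status = 'QUEUED' ORDER BY createdAt LIMIT 1.
function selectOldestQueued(jobs: Map<string, Job>, nodeId: string): Job | undefined {
  return Array.from(jobs.values())
    .filter((j) => j.nodeId === nodeId && j.status === "QUEUED")
    .sort((a, b) => a.createdAt - b.createdAt)[0];
}

const jobs = new Map<string, Job>([
  ["j1", { id: "j1", nodeId: "n1", status: "QUEUED", createdAt: 1 }],
]);

// Two pollers read the same candidate before either one updates it.
const seenByA = selectOldestQueued(jobs, "n1");
const seenByB = selectOldestQueued(jobs, "n1");

const countA = seenByA ? updateIfQueued(jobs, seenByA.id) : 0; // wins the claim
const countB = seenByB ? updateIfQueued(jobs, seenByB.id) : 0; // status no longer QUEUED
console.log(countA, countB); // 1 0
```

The loser learns it lost from the zero row count and returns null, which is why no separate lock or queue library is needed.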

Live tail in the dashboard

Browser /dashboard/jobs/[id]
  → GET /api/jobs/:id     (initial state, includes stdout so far)
  → EventSource /api/jobs/:id/stream
      → server polls Job row every 500ms
      → emits `stdout` / `stderr` events for the delta since last tick
      → emits `status` on flip + `done` on terminal state, then closes
  → React patches the console buffer; auto-scroll unless the user
    scrolled up (detected via scrollTop reflow).
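
The server side of this flow boils down to comparing each polled Job row with what was already sent. A minimal sketch of one polling tick, assuming hypothetical names (`tick`, `Cursor`, `sseFrame`) and assuming output only ever grows between ticks, which ignores the 4 MB tail truncation:

```typescript
interface JobSnapshot {
  status: string;
  stdout: string;
  stderr: string;
}

// What has already been sent to this client.
interface Cursor {
  status: string;
  stdoutLen: number;
  stderrLen: number;
}

const TERMINAL = new Set(["SUCCEEDED", "FAILED", "CANCELLED"]);

// Formats one Server-Sent Events frame; JSON keeps newlines inside a single data line.
const sseFrame = (event: string, data: unknown): string =>
  `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;

function tick(
  cursor: Cursor,
  job: JobSnapshot,
): { frames: string[]; cursor: Cursor; done: boolean } {
  const frames: string[] = [];
  if (job.stdout.length > cursor.stdoutLen) {
    frames.push(sseFrame("stdout", job.stdout.slice(cursor.stdoutLen)));
  }
  if (job.stderr.length > cursor.stderrLen) {
    frames.push(sseFrame("stderr", job.stderr.slice(cursor.stderrLen)));
  }
  if (job.status !== cursor.status) frames.push(sseFrame("status", job.status));
  const done = TERMINAL.has(job.status);
  if (done) frames.push(sseFrame("done", job.status)); // caller closes the stream after this
  return {
    frames,
    cursor: { status: job.status, stdoutLen: job.stdout.length, stderrLen: job.stderr.length },
    done,
  };
}
```

A caller would run `tick` on a 500 ms timer, write the returned frames to the response, and stop once `done` is true.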

Why these choices

  • Polling, not WebSocket, for agent → API: agents already call the API on every heartbeat tick, so adding a job pull to the same tick required no new infrastructure. Worst-case pickup latency equals HYPRBOX_INTERVAL (5 min default), which is fine for ops jobs. WebSocket lands in Phase 4 if customers need sub-minute apply latency.
  • SSE, not WebSocket, for API → browser: one-way push fits the use case (heartbeats, job output). SSE works through reverse proxies without sticky sessions, EventSource reconnects for free, and the protocol is text — easy to debug with curl -N.
  • Cookie + header dual auth: EventSource cannot set custom headers, so SSE endpoints depend on the hb_token cookie. The same cookie is also what the Next.js proxy reads to gate /dashboard/* without round-tripping the API. For everything else (CLI, agent, API consumers), Authorization: Bearer is the canonical transport.
  • Render-at-queue, not render-at-run: storing the bash script on the Job row means a later change to the preset cannot retroactively alter what a job actually ran. The audit trail is exact and reproducible.
  • Atomic claim via updateMany: PostgreSQL row locking + WHERE status='QUEUED' guarantees exactly one winner across parallel pollers. No advisory locks, no job queue library; the schema is the queue.
  • Prefix-indexed token lookup: the first 8 chars of hbnt_… are unique and indexed, so the /heartbeat hot path is a single index scan + one hash compare. The full token never appears in a query.
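
The cookie + header dual auth choice above implies a small extraction step before JWT verification. The sketch below shows one way to do it; `extractToken` is a hypothetical name, and trying the Authorization header before the cookie is an assumption about ordering that the source does not spell out.

```typescript
// Returns the raw JWT from the Authorization header or the hb_token cookie, else null.
function extractToken(headers: {
  authorization?: string;
  cookie?: string;
}): string | null {
  const auth = headers.authorization;
  if (auth && /^Bearer\s+/i.test(auth)) {
    const token = auth.replace(/^Bearer\s+/i, "").trim();
    if (token) return token;
  }
  // Fall back to the cookie, which is the only option for EventSource requests.
  for (const part of (headers.cookie ?? "").split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === "hb_token") {
      const value = part.slice(eq + 1).trim();
      return value || null;
    }
  }
  return null;
}

console.log(extractToken({ cookie: "theme=dark; hb_token=abc.def.ghi" })); // abc.def.ghi
```

The returned string would then be passed to the JWT verifier; this helper performs no validation of its own.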

Operational properties

| Property | Current |
|---|---|
| API process model | Single Fastify process. Horizontal scale needs sticky sessions for SSE OR Redis pub/sub. |
| DB | PostgreSQL 16. Migrations via prisma db push (no migration history yet; landing in Phase 4). |
| Logs | API: pino JSON in prod, pino-pretty in dev. Agent: stdlib log to stderr. |
| Metrics | Application metrics: none yet. Host metrics: collected by HyprNode for the host it runs on, not the API/web. Phase 4 will surface API-level metrics via Prometheus. |
| Backups | None automated. Postgres dump = manual. HyprVault module (Phase 4) will own this. |
| Auth lifetime | JWT TTL = 7 days, no refresh token, no revocation list. Node tokens are revocable per-row. |