API Reference
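Base URL in dev: http://localhost:4000. Request and response bodies are JSON unless noted otherwise (SSE streams, the raw log download, and the CSV audit export).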
Auth model
| Header / cookie | Used by |
|---|---|
| Authorization: Bearer <jwt> | Operators (CLI, dashboard XHR, raw curl) |
| Cookie: hb_token=<jwt> | Browser proxy + EventSource (set by login) |
| Authorization: Bearer hbnt_<token> | Agents posting heartbeats / pulling jobs |
requireAuth accepts both transports (Bearer first, then cookie) and verifies
the JWT's tv claim against User.tokenVersion on every request — a role
change, password change, password reset, or user delete bumps tokenVersion
and revokes outstanding tokens immediately. The browser cookie is set
server-side as HttpOnly; SameSite=Lax; Secure (prod).
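The transport order and the tokenVersion check can be sketched as follows. This is a minimal illustration, not the server's code: the function names and the plain-dict shapes for headers, cookies and claims are assumptions, and JWT signature verification is deliberately left out.

```python
from typing import Mapping, Optional


def extract_jwt(headers: Mapping[str, str], cookies: Mapping[str, str]) -> Optional[str]:
    """Return the raw JWT, preferring the Bearer header over the hb_token cookie."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        token = auth[len("Bearer "):].strip()
        if token:
            return token
    return cookies.get("hb_token") or None


def is_token_current(claims: Mapping[str, object], user_token_version: int) -> bool:
    """A token is current only if its tv claim equals the user's tokenVersion."""
    return claims.get("tv") == user_token_version
```

Because the comparison runs on every request, bumping the stored version invalidates every token minted before the bump without keeping a revocation list.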
requireNodeToken checks the hbnt_ prefix index + SHA-256 hash compare; agent
write/claim endpoints additionally require a token pinned to the target
node. In dev, anonymous heartbeats are accepted; production sets
HYPRBOX_REQUIRE_NODE_TOKEN=true and rejects them.
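A prefix lookup followed by a hash compare might look like the sketch below. The prefix length, the record fields and the function names are assumptions made for illustration; only the hbnt_ prefix, the SHA-256 hash and the pinning rule come from the description above.

```python
import hashlib
import hmac
from typing import Dict, Optional

PREFIX_LEN = 12  # assumption: number of leading characters used as the index key


def hash_token(plaintext: str) -> str:
    """SHA-256 hex digest of the plaintext token."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def verify_node_token(plaintext: str, index: Dict[str, dict]) -> Optional[dict]:
    """Find the token record by prefix, then compare hashes in constant time."""
    if not plaintext.startswith("hbnt_"):
        return None
    record = index.get(plaintext[:PREFIX_LEN])
    if record is None or record.get("revokedAt") is not None:
        return None
    if not hmac.compare_digest(record["hash"], hash_token(plaintext)):
        return None
    return record


def pin_allows(record: dict, node_id: str, require_pin: bool = False) -> bool:
    """Pinned tokens must match the node; unpinned ones pass unless a pin is required."""
    pinned_to = record.get("nodeId")
    if pinned_to is None:
        return not require_pin
    return pinned_to == node_id
```

Under this sketch, a failed `verify_node_token` would map to 401 and a failed `pin_allows` to 403, matching the status codes listed below.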
Status codes used consistently:
- 200/201/202/204: success
- 400: validation failure (Zod issues or out-of-range fields)
- 401: missing/invalid token
- 403: token bound to a different resource
- 404: resource not found (also returned to hide existence cross-user)
- 409: state conflict (e.g. cancel a non-queued job)
- 429: rate-limited (login 10/min, register 5/min, global 200/min)
Auth — /api/auth
POST /api/auth/register
Bootstraps the first user as ADMIN (User.count()===0). After that, public
self-registration is closed by default in production (403) — set
HYPRBOX_ALLOW_PUBLIC_REGISTRATION=true to reopen, or have an admin create
accounts via POST /api/users. Default is open outside production (dev/test).
curl -X POST http://localhost:4000/api/auth/register \
-H 'Content-Type: application/json' \
-d '{"email":"ops@example.com","password":"correcthorsebattery","name":"Ops"}'
→ 201 { token, user: { id, email, name, role, createdAt } }. Duplicate email
→ 409. Closed registration → 403.
POST /api/auth/login
curl -X POST http://localhost:4000/api/auth/login \
-H 'Content-Type: application/json' \
-d '{"email":"admin@hyprbox.local","password":"hyprbox-admin"}'
→ 200 { token, user }. Wrong password OR unknown email → 401 (same
response to defeat user enumeration). Rate-limited at 10/min/IP.
GET /api/auth/me
curl http://localhost:4000/api/auth/me -H "Authorization: Bearer $JWT"
→ 200 { user } or 401 if the token is missing/invalid/expired.
POST /api/auth/logout
Clears the hb_token cookie (Max-Age=0). JWTs stay valid until expiry; for
hard revocation bump tokenVersion (see Auth model).
POST /api/auth/change-password (user-auth)
Body { currentPassword, newPassword }. Verifies the current password, refuses
re-use, bumps tokenVersion (logs other sessions out). Rate-limited 10/min.
→ 200 { ok: true }; wrong current → 401; same as current → 400.
POST /api/auth/forgot-password
Body { email }. Always 200 (anti-enumeration). If the email maps to a
user, mints a single-use, SHA-256-hashed reset token (1h TTL) and emails the
link (dev: logged via the mailer seam). Rate-limited 5/min.
POST /api/auth/reset-password
Body { token, newPassword }. Atomically consumes the token (invalid/expired/
used all → 400, identically), sets the new hash, burns sibling tokens, bumps
tokenVersion. Rate-limited 10/min.
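The reset-token lifecycle (mint, store only the hash, consume once, burn siblings) can be sketched as below. An in-memory dict stands in for the database, so the consume step is not actually atomic here; the function names and row fields are hypothetical.

```python
import hashlib
import secrets
from datetime import datetime, timedelta
from typing import Dict, Optional

RESET_TTL = timedelta(hours=1)  # 1h TTL, as documented above


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def mint_reset_token(store: Dict[str, dict], user_id: str, now: datetime) -> str:
    """Store only the hash; the plaintext goes into the emailed link."""
    token = secrets.token_urlsafe(32)
    store[_digest(token)] = {"userId": user_id, "expiresAt": now + RESET_TTL, "usedAt": None}
    return token


def consume_reset_token(store: Dict[str, dict], token: str, now: datetime) -> Optional[str]:
    """Return the user id once; invalid, expired and used tokens all yield None."""
    row = store.get(_digest(token))
    if row is None or row["usedAt"] is not None or now >= row["expiresAt"]:
        return None
    row["usedAt"] = now
    # Burn sibling tokens issued to the same user.
    for other in store.values():
        if other["userId"] == row["userId"] and other["usedAt"] is None:
            other["usedAt"] = now
    return row["userId"]
```

Returning the same `None` for every failure mode is what lets the handler answer 400 identically in all three cases.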
Users — /api/users (admin-auth)
GET /api/users
→ { users: [{ id, email, name, role, createdAt }] }.
POST /api/users
Admin-provisioned teammate — the path for new accounts once public registration
is closed. Body { email, name?, role, password } → 201 { user }; duplicate
email → 409. Audit user.create.
PATCH /api/users/:id/role
Body { role }. Bumps the target's tokenVersion. Refuses to demote the last
admin → 409.
DELETE /api/users/:id
Refuses self-delete and last-admin delete (409). → 204.
Nodes — /api/nodes
POST /api/nodes/heartbeat (node-auth)
Agents call this every HYPRBOX_INTERVAL. Inventory fields are optional but
will overwrite previous values when sent — only send them on first registration
or when they actually change.
curl -X POST http://localhost:4000/api/nodes/heartbeat \
-H "Authorization: Bearer $NODE_TOKEN" \
-H 'Content-Type: application/json' \
-d '{
"node_id": "prod-eu-1",
"hostname": "prod-eu-1",
"os": "Debian 12",
"cpu_percent": 23.5,
"ram_percent": 48.2,
"disk_percent": 61.0,
"status": "online",
"agent_version": "0.1.0",
"ip": "10.42.1.21",
"region": "eu-west-3",
"kernel": "6.1.0-18-amd64",
"architecture": "x86_64"
}'
→ 202 { received: true, nodeId }. If the token is pinned to a different
nodeId, → 403.
GET /api/nodes (user-auth)
→ { nodes: [...] }. Status is computed at query time: any node whose
lastSeenAt is older than 10 minutes flips to OFFLINE.
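Computing the status at query time amounts to a comparison against the cutoff. A minimal sketch, with a hypothetical function name and the stored status passed through unchanged when the node is fresh:

```python
from datetime import datetime, timedelta

OFFLINE_AFTER = timedelta(minutes=10)


def effective_status(stored_status: str, last_seen_at: datetime, now: datetime) -> str:
    """Return OFFLINE when the last heartbeat is older than the cutoff."""
    if now - last_seen_at > OFFLINE_AFTER:
        return "OFFLINE"
    return stored_status
```

No background job is needed with this approach: a node that stops reporting simply reads as OFFLINE on the next list call.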
GET /api/nodes/sparks?limit=24 (user-auth)
Batch endpoint feeding the dashboard. Returns the last N CPU values per node
in one round-trip — avoids N parallel /history requests.
→ { sparks: { nodeId: [25, 30, ...] } }.
GET /api/nodes/:nodeId/status (user-auth)
Single node + last heartbeat values + metadata.
GET /api/nodes/:nodeId/history (user-auth)
Last 50 heartbeats in chronological order (oldest first), suited for charting.
GET /api/nodes/:nodeId/activity (user-auth)
Synthesised events: threshold crossings (CPU≥80%, RAM≥90%), gap-reconnects (detected only for gaps >5× the median spacing — avoids false positives on seed/sparse data), plus 3 mocked rows (deploy/snap/info) until those subsystems are real. Capped at 8 events, newest first.
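The gap-reconnect rule can be illustrated with a few lines of Python. This is a sketch under stated assumptions: timestamps are plain seconds sorted oldest first, and the function name and the minimum sample size are hypothetical.

```python
from statistics import median
from typing import List, Tuple

GAP_FACTOR = 5  # a gap must exceed 5x the median spacing


def find_gap_reconnects(timestamps: List[float]) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_end) pairs for unusually long silences."""
    if len(timestamps) < 3:
        return []  # too few points for a meaningful median
    pairs = list(zip(timestamps, timestamps[1:]))
    typical = median(b - a for a, b in pairs)
    if typical <= 0:
        return []
    return [(a, b) for a, b in pairs if (b - a) > GAP_FACTOR * typical]
```

Using the median rather than a fixed interval is what keeps evenly sparse seed data from producing reconnect events: if every heartbeat is an hour apart, no gap exceeds five times the typical spacing.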
POST /api/nodes/inventory (node-auth, pinned)
The agent ships its Docker container inventory here each tick. The API
cross-references it against BackupPolicy rows to emit/resolve
postgres.no-backup findings.
GET /api/nodes/me/config (node-auth, pinned)
Per-node agent config the agent pulls before each scan →
{ config: { tls_paths?, scan_interval? } }. Unpinned token → 400.
PATCH /api/nodes/:id/config (operator-auth)
Merge-patch the node's agentConfig JSON; null on a key clears it. Audited.
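The merge-patch semantics can be sketched as below. Whether the server merges nested objects recursively is not stated, so this sketch assumes a shallow merge; the function name is hypothetical.

```python
from typing import Any, Dict


def merge_patch_config(current: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge: None (JSON null) removes a key, anything else overwrites it."""
    merged = dict(current)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged
```

For example, patching `{"scan_interval": 300, "tls_paths": ["/etc/ssl"]}` with `{"scan_interval": None}` leaves only `tls_paths` in place.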
Tokens — /api/tokens
All endpoints require a user JWT.
POST /api/tokens
Mint a new agent token. The plaintext is returned exactly once at create time; subsequent reads only return metadata.
curl -X POST http://localhost:4000/api/tokens \
-H "Authorization: Bearer $JWT" \
-H 'Content-Type: application/json' \
-d '{"name":"prod-eu-1","nodeId":"prod-eu-1"}'
→ 201 { token: "hbnt_…", meta: { id, prefix, name, nodeId, createdAt } }.
Set nodeId to pin the token: any heartbeat or job pull with a different
node_id is rejected with 403. Skip it for a fleet-wide token (e.g.
templated AMIs).
GET /api/tokens
→ { tokens: [{ id, prefix, name, nodeId, createdAt, lastUsedAt, revokedAt }] }.
Never contains the plaintext or hash.
DELETE /api/tokens/:id
Soft-revoke (sets revokedAt). Any agent still using it gets 401 on next
call. Other users' tokens → 404 (no existence leak).
Presets — /api/presets
All endpoints require a user JWT. Presets are loaded once at API boot from
the repo's presets/ directory; restart the API to pick up YAML edits.
GET /api/presets
→ { presets: [...] } — light shape (name, version, description, tags,
targets, stepCount, variables).
GET /api/presets/:name
→ { preset: <full YAML object> } for audit/inspection.
POST /api/presets/:name/render
Render-to-bash. Variables are substituted, defaults filled in, the result is deterministic.
curl -X POST http://localhost:4000/api/presets/server-light/render \
-H "Authorization: Bearer $JWT" \
-H 'Content-Type: application/json' \
-d '{"variables":{"hostname":"prod-eu-1","ssh_port":2222}}'
→ { name, script, resolvedVariables, warnings, bytes }. Missing required
variable → 400. Unknown variable → ignored with a warning.
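The variable-resolution rules (defaults filled in, missing required variable rejected, unknown variable ignored with a warning) can be sketched as follows. The `{{name}}` placeholder syntax, the spec layout and all names here are assumptions for illustration; the real preset format may differ.

```python
import re
from typing import Any, Dict, List, Tuple


class MissingVariable(ValueError):
    """Stands in for the 400 returned when a required variable is absent."""


def resolve_variables(spec: Dict[str, dict], supplied: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Apply supplied values and defaults; collect warnings for unknown names."""
    warnings = [f"unknown variable ignored: {name}" for name in sorted(supplied) if name not in spec]
    resolved: Dict[str, Any] = {}
    for name, rules in spec.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif "default" in rules:
            resolved[name] = rules["default"]
        elif rules.get("required"):
            raise MissingVariable(name)
    return resolved, warnings


def render(template: str, resolved: Dict[str, Any]) -> str:
    """Substitute hypothetical {{name}} placeholders; unknown ones are left as-is."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(resolved.get(m.group(1), m.group(0))), template)
```

The same inputs always produce the same output, which is the property the endpoint relies on. A production renderer would also need to shell-quote values before substitution, which this sketch does not attempt.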
Jobs — /api/jobs
POST /api/jobs (user-auth)
Queue a preset application. The bash script is rendered NOW and stored on the job row — later edits to the preset don't retroactively change what runs.
curl -X POST http://localhost:4000/api/jobs \
-H "Authorization: Bearer $JWT" \
-H 'Content-Type: application/json' \
-d '{
"nodeId": "prod-eu-1",
"presetName": "server-light",
"variables": {"hostname":"prod-eu-1","ssh_port":2222}
}'
→ 201 { job }. Node missing → 404. Preset missing → 404. Render
failure (e.g. required variable absent) → 400.
GET /api/jobs?nodeId=…&status=…&limit=… (user-auth)
Lists jobs the caller created. nodeId and status filters optional; limit
defaults to 50, max 200.
GET /api/jobs/:id (user-auth)
Full detail including script and the accumulated stdout/stderr. Other
users' jobs → 404.
POST /api/jobs/:id/cancel (user-auth)
→ 200 { job }. Only QUEUED jobs can be cancelled cleanly; RUNNING → 409
(needs the agent reverse channel — Phase 4).
GET /api/jobs/:id/stream (user-auth, SSE)
Tails one job until terminal state, then closes. Event types:
stdout (delta), stderr (delta), status (on flip), done.
curl -N "http://localhost:4000/api/jobs/$JOB_ID/stream" \
-H "Cookie: hb_token=$JWT"
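A client consuming this stream has to split it into events. The sketch below follows the standard SSE framing (event and data fields, blank line as terminator, lines starting with a colon as comments); the payload format inside data is not specified here, so it is treated as opaque text. The function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"event", "data"} dicts from an iterable of SSE lines."""
    event = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the event; dispatch only if data was seen.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, e.g. a keep-alive ping
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
```

A caller would typically feed it the decoded lines of the HTTP response body and stop reading once an event named done arrives.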
GET /api/jobs/pending?nodeId=… (node-auth)
The agent's "what should I do?" endpoint. Atomically claims the oldest
QUEUED job for that node (single UPDATE … WHERE status='QUEUED') and
flips it to RUNNING. Two concurrent pulls see exactly one winner.
→ { job: { id, presetName, script, variables } } or { job: null }.
If the token is pinned to a different nodeId → 403.
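The claim step is a compare-and-set on the status column. The sketch below shows the idea with SQLite and a hypothetical jobs table; it selects a candidate first and then runs the guarded UPDATE, which is a two-step variant of the single statement described above rather than the server's actual query.

```python
import sqlite3
from typing import Optional


def claim_next_job(conn: sqlite3.Connection, node_id: str) -> Optional[int]:
    """Claim the oldest QUEUED job for node_id; return its id or None."""
    while True:
        row = conn.execute(
            "SELECT id FROM jobs WHERE node_id = ? AND status = 'QUEUED' "
            "ORDER BY created_at, id LIMIT 1",
            (node_id,),
        ).fetchone()
        if row is None:
            return None
        # The status guard makes the flip a compare-and-set: if another
        # puller got there first, zero rows change and we look again.
        cur = conn.execute(
            "UPDATE jobs SET status = 'RUNNING' WHERE id = ? AND status = 'QUEUED'",
            (row[0],),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]
```

The winner is whichever caller's UPDATE reports one affected row; every other caller sees zero and moves on, so a job is never handed out twice.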
POST /api/jobs/:id/output (node-auth, pinned)
Append stdout/stderr. The Job row keeps a bounded 4 MB tail per stream for
fast display; the full log is offloaded to JobOutputChunk with a per-job byte
cap (HYPRBOX_JOB_OUTPUT_MAX_BYTES, default 64 MiB).
curl -X POST http://localhost:4000/api/jobs/$JOB_ID/output \
-H "Authorization: Bearer $NODE_TOKEN" \
-H 'Content-Type: application/json' \
-d '{"stdout":"[12:34:56] Installing ufw...\n","stderr":""}'
→ 204. Non-RUNNING job → 409.
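Keeping a bounded tail means appending the new chunk and dropping the oldest bytes once the limit is exceeded. A minimal sketch, assuming the 4 MB figure means 4 × 1024 × 1024 bytes and using a hypothetical function name:

```python
TAIL_MAX_BYTES = 4 * 1024 * 1024  # assumption: "4 MB" read as 4 MiB


def append_tail(tail: bytes, chunk: bytes, limit: int = TAIL_MAX_BYTES) -> bytes:
    """Append chunk and keep only the newest `limit` bytes."""
    combined = tail + chunk
    if len(combined) > limit:
        return combined[len(combined) - limit:]
    return combined
```

Slicing on bytes can cut a multi-byte UTF-8 character in half at the start of the tail, so a display layer built on this would need to decode leniently.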
POST /api/jobs/:id/complete (node-auth)
Final flush + status flip. exitCode === 0 → SUCCEEDED, otherwise FAILED.
curl -X POST http://localhost:4000/api/jobs/$JOB_ID/complete \
-H "Authorization: Bearer $NODE_TOKEN" \
-H 'Content-Type: application/json' \
-d '{"exitCode":0,"stdout":"[12:35:10] Done.\n"}'
Calling complete twice on the same job → 409.
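The status flip and the double-complete guard can be sketched as below. The Conflict class stands in for a 409 response, and treating every non-RUNNING job as a conflict is an assumption that extends the rule stated for the output endpoint.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 response."""


def complete_job(job: dict, exit_code: int) -> dict:
    """Flip a RUNNING job to its terminal status based on the exit code."""
    if job["status"] != "RUNNING":
        raise Conflict(f"job is {job['status']}, not RUNNING")
    job["status"] = "SUCCEEDED" if exit_code == 0 else "FAILED"
    job["exitCode"] = exit_code
    return job
```

Because the first call moves the job out of RUNNING, a second call fails the guard, which is how the duplicate completion ends up as a conflict.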
GET /api/jobs/:id/output/raw?stream=stdout|stderr (user-auth)
Streams the full reconstructed log for one stream (text/plain, as an
attachment) from the chunks — backs the "download full log" link when the
inline tail is truncated. Other users' jobs → 404.
Backups (HyprVault) — /api/backups
Restic backup policies + runs. Policy CRUD is operator-auth and owner-scoped
(other users' policies → 404); the agent run endpoints are node-auth and
require a token pinned to the policy's node.
POST /api/backups (operator)
Create a policy. Body { name, nodeId, paths[], schedule?, repoUrl, passwordRef?, keepDaily?, keepWeekly?, keepMonthly? } → 201 { policy }.
GET /api/backups[?nodeId] (user) / GET /api/backups/:id (user)
List your policies / one policy + its 20 most recent runs.
PATCH /api/backups/:id (operator) / DELETE /api/backups/:id (operator)
Update fields / delete. Owner-scoped.
POST /api/backups/:id/trigger (operator)
Queues a one-shot hypervault-restic job for the policy's node → 202 { jobId, policyId }. Render failure → 400.
POST /api/backups/:id/runs (node-auth, pinned) — agent reports a run start
POST /api/backups/runs/:runId/complete (node-auth, pinned) — agent reports the run finish (status, snapshotId, sizeBytes, filesCount, durationMs)
Findings — /api/findings
The autopilot loop's left half (the discovery side). Scanners on agents post here; the dashboard reads from here.
POST /api/findings (node-auth)
Batch upsert. Dedup happens on (nodeId, type, key) — the same row is
updated, not re-inserted, so the operator-visible detectedAt and any
snooze/manual-resolve state survive across scans.
curl -X POST http://localhost:4000/api/findings \
-H "Authorization: Bearer $NODE_TOKEN" \
-H 'Content-Type: application/json' \
-d '{
"nodeId": "prod-eu-1",
"findings": [{
"type": "tls.expiring",
"key": "cert:app.client.fr:/etc/letsencrypt/live/app.client.fr/fullchain.pem",
"severity": "WARN",
"title": "TLS certificate for app.client.fr expires in 18 days",
"message": "Certificate at /etc/letsencrypt/live/.../fullchain.pem ...",
"relatedEntities": ["domain:app.client.fr", "path:/etc/letsencrypt/live/app.client.fr"],
"metadata": { "daysLeft": "18", "notAfter": "2026-06-14T00:00:00Z" }
}]
}'
→ 202 { received: true, upserted: N }.
The handler looks up matching recommendations by findingTypes and links
them automatically. A previously RESOLVED finding seen again gets
reopened (status flips back to OPEN, resolvedAt cleared).
Pinned node tokens are enforced — nodeId in the body must match the
token's pin if set.
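The dedup and reopen behaviour can be sketched with a dict keyed on (nodeId, type, key). This is an illustration only: the in-memory table, the function name and the lastSeenAt field are assumptions, and recommendation linking is left out.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (nodeId, type, key)


def upsert_finding(table: Dict[Key, dict], node_id: str, finding: dict, now: str) -> dict:
    """Insert a new finding or refresh the existing row in place."""
    ident = (node_id, finding["type"], finding["key"])
    row = table.get(ident)
    if row is None:
        row = {"status": "OPEN", "detectedAt": now, "resolvedAt": None}
        table[ident] = row
    elif row["status"] == "RESOLVED":
        # Seen again after being resolved: reopen it.
        row["status"] = "OPEN"
        row["resolvedAt"] = None
    for field in ("severity", "title", "message"):
        if field in finding:
            row[field] = finding[field]
    row["lastSeenAt"] = now  # hypothetical bookkeeping field
    return row
```

Note that detectedAt is written only on insert and a SNOOZED status is never touched, so both survive repeated scans.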
GET /api/findings[?nodeId&severity&include=all] (user-auth)
Lists OPEN and snoozed-past-deadline findings by default. Pass
include=all to see RESOLVED history. Ordered CRITICAL → WARN → INFO,
then newest first.
GET /api/findings/:id (user-auth)
Detail + the linked recommendation (preset name, risk level, description) so the modal can render Apply without a second round-trip.
POST /api/findings/:id/snooze (user-auth)
Body: { hours: number } (1 to 720, default 24). Sets status=SNOOZED
and snoozedUntil. The list endpoint hides snoozed rows until that
deadline passes.
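The snooze bounds and the default list filter can be sketched together. The function names and the dict-shaped finding are hypothetical; the limits and the default come from the description above.

```python
from datetime import datetime, timedelta

MIN_HOURS, MAX_HOURS, DEFAULT_HOURS = 1, 720, 24


def snooze_until(now: datetime, hours: float = DEFAULT_HOURS) -> datetime:
    """Validate the requested duration and return the snooze deadline."""
    if not MIN_HOURS <= hours <= MAX_HOURS:
        raise ValueError(f"hours must be between {MIN_HOURS} and {MAX_HOURS}")
    return now + timedelta(hours=hours)


def is_listed_by_default(finding: dict, now: datetime) -> bool:
    """OPEN rows are listed; SNOOZED rows reappear once the deadline has passed."""
    if finding["status"] == "OPEN":
        return True
    return finding["status"] == "SNOOZED" and finding["snoozedUntil"] <= now
```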
POST /api/findings/:id/resolve (user-auth)
Manual mark — status=RESOLVED, resolvedByUserId set. The next scan
that still detects the problem will reopen it.
GET /api/findings/health/summary (user-auth)
Per-node rollup:
{
"summary": [
{ "nodeId": "prod-eu-1", "bucket": "AT_RISK", "counts": { "critical": 1, "warn": 2, "info": 0 } },
{ "nodeId": "prod-eu-2", "bucket": "ATTENTION", "counts": { "critical": 0, "warn": 1, "info": 1 } },
{ "nodeId": "staging-1", "bucket": "HEALTHY", "counts": { "critical": 0, "warn": 0, "info": 0 } }
]
}
Buckets are computed on the fly: any CRITICAL > 0 → AT_RISK; else any
WARN > 0 → ATTENTION; else HEALTHY. No numeric score by design —
see docs/AUTOPILOT.md.
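The bucket rule is a two-step precedence check. A minimal sketch, with a hypothetical function name and the counts object shaped as in the response above:

```python
from typing import Mapping


def health_bucket(counts: Mapping[str, int]) -> str:
    """Map per-severity counts to AT_RISK, ATTENTION or HEALTHY."""
    if counts.get("critical", 0) > 0:
        return "AT_RISK"
    if counts.get("warn", 0) > 0:
        return "ATTENTION"
    return "HEALTHY"
```

INFO findings never change the bucket in this rule, which matches the sample response where a node with one INFO and one WARN lands in ATTENTION because of the WARN alone.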
Stream — /api/stream/nodes (user-auth, SSE)
Fleet-wide event bus. Heartbeats + activity events (warnings, deploys) get
published here. Used by /dashboard (live status patch) and /dashboard/logs
(live feed).
curl -N "http://localhost:4000/api/stream/nodes" \
-H "Cookie: hb_token=$JWT"
Event types: heartbeat, activity. Server emits a : ping comment every
25s to keep proxies from timing out idle connections.
Audit — /api/audit (admin-auth)
GET /api/audit[?user&action&resourceType&resourceId&page&format=csv]
Append-only audit log — logins, register, token/job/backup/user mutations, and
password events. Paginated 50/page; format=csv exports (capped 10k rows, with
spreadsheet-formula cells neutralised). → { events: [...], page, total }.
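One common way to neutralise spreadsheet-formula cells is to prefix any value that starts with a formula trigger character with a single quote. The sketch below shows that approach; it is an assumption about the technique, not a description of the server's exporter, and the column names in the usage note are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Mapping

FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def neutralise_cell(value: object) -> str:
    """Prefix cells that a spreadsheet would otherwise evaluate as formulas."""
    text = "" if value is None else str(value)
    if text.startswith(FORMULA_TRIGGERS):
        return "'" + text
    return text


def events_to_csv(events: Iterable[Mapping[str, object]], columns: List[str]) -> str:
    """Serialise events to CSV with a header row and neutralised cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for event in events:
        writer.writerow([neutralise_cell(event.get(col)) for col in columns])
    return buf.getvalue()
```

Calling `events_to_csv(rows, ["action", "user"])` would export two columns. A side effect of this approach is that legitimate values starting with a minus sign, such as negative numbers, also get the quote prefix.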
Agent WebSocket — /api/agent/ws (node-auth, pinned)
Reverse channel. The agent connects with a pinned node token in the
Authorization header (?nodeId= is a consistency check only; ?token= is
ignored). The server pushes {type:"wakeup"} on job create (sub-second pickup
vs the poll tick) and {type:"cancel",jobId} on operator cancel of a RUNNING
job. The agent falls back to polling if the socket is down.
/health
Public, no auth. Returns { status: "ok", version: "0.1.0" }. Used by the
Docker healthcheck and load-balancer probes.
/docs (Swagger UI)
Enabled in dev. In prod only when HYPRBOX_ENABLE_DOCS=true is set
explicitly — exposing the full API surface to anonymous traffic is a
recon gift.
Rate-limit headers
Every response carries:
- X-RateLimit-Limit: bucket size
- X-RateLimit-Remaining: calls left in the current window
- X-RateLimit-Reset: seconds until the window rolls over
The SSE stream (/api/stream/*) is exempt from the global bucket, since long-lived connections shouldn't count.
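A client can use these headers to pace itself. The sketch below is one possible policy, not something the API mandates: the function name is hypothetical, and headers are assumed to arrive in a plain mapping with the exact casing shown above.

```python
from typing import Mapping


def seconds_to_wait(status: int, headers: Mapping[str, str]) -> float:
    """Return how long a client should pause before its next call."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if status != 429 and remaining > 0:
        return 0.0
    # Out of budget (or already rejected): wait for the window to roll.
    return max(0.0, float(headers.get("X-RateLimit-Reset", "1")))
```

Waiting as soon as the remaining count reaches zero, rather than after the first 429, avoids spending a request just to learn that the bucket is empty.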
Errors
All error responses share a shape:
{ "error": "Validation failed", "message": "ssh_port must be <= 65535" }
error is a short tag suitable for switching on; message is a human
sentence safe to surface in a UI toast.