HyprBox docs

HyprVault — backups

Restic-backed backups managed from the dashboard. Each policy describes what to back up on which node, when, and how long to keep snapshots. Each run is one execution — started either by the cron job the preset installs, or by a manual trigger from the dashboard / API.

Data model

BackupPolicy 1 ── n BackupRun
            ↑
            └── nodeId → Node, createdBy → User
  • BackupPolicy: name, nodeId, paths (JSON array), schedule (cron), repoUrl (Restic), passwordRef (where to source RESTIC_PASSWORD), keepDaily / keepWeekly / keepMonthly, status (ENABLED/PAUSED).
  • BackupRun: policyId, status (RUNNING/SUCCEEDED/FAILED), snapshotId, sizeBytes, filesCount, durationMs, errorMsg, startedAt, finishedAt.

passwordRef is intentionally a pointer, not the secret. We never store the password in the DB. Two supported forms:

  • env:RESTIC_PASSWORD (default) — reads the agent's environment.
  • file:/etc/hyprbox/restic.pass — reads a file on the host.
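
A minimal sketch of how a wrapper could resolve the two forms, assuming a POSIX shell. The function name resolve_password_ref is hypothetical and is not taken from the shipped script.

```shell
# Hypothetical helper: resolve a passwordRef ("env:NAME" or "file:/path")
# and print the secret on stdout. Returns non-zero for anything else.
resolve_password_ref() (
  ref="${1:-env:RESTIC_PASSWORD}"   # default form named in the docs
  case "$ref" in
    env:*)
      name="${ref#env:}"
      # Accept only plain variable names so the eval below is safe.
      case "$name" in
        ''|[0-9]*|*[!A-Za-z0-9_]*) echo "invalid env name: $name" >&2; exit 1 ;;
      esac
      eval "value=\${$name-}"
      [ -n "$value" ] || { echo "$name is not set" >&2; exit 1; }
      printf '%s\n' "$value"
      ;;
    file:*)
      path="${ref#file:}"
      [ -r "$path" ] || { echo "cannot read $path" >&2; exit 1; }
      # First line only, so a trailing newline is not part of the secret.
      head -n 1 "$path"
      ;;
    *)
      echo "unsupported passwordRef: $ref" >&2
      exit 1
      ;;
  esac
)
```

A caller would typically export the result, for example RESTIC_PASSWORD="$(resolve_password_ref "$password_ref")". For the file: form, Restic's own RESTIC_PASSWORD_FILE variable is an alternative that avoids reading the secret into the shell at all.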

Lifecycle

Operator → POST /api/backups            → BackupPolicy created (status ENABLED)
         → POST /api/backups/:id/trigger → Job queued with the hypervault-restic preset
                                              ↓
                                         Agent picks up the job (next heartbeat)
                                              ↓
                                         hyprbox-backup wrapper runs:
                                           1. POST /api/backups/:id/runs → BackupRun created (RUNNING)
                                           2. restic backup
                                           3. POST /api/backups/runs/:runId/complete
                                              with status + sizeBytes + snapshotId

For recurring runs, the same preset drops /etc/cron.d/hyprbox-backup-<id>, which calls the wrapper script directly with no API trigger. The script starts the run itself via POST /api/backups/:id/runs, authenticating with its agent token.
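
The three wrapper steps can be sketched as below. This is an illustration, not the shipped hyprbox-backup script: HYPRBOX_API_URL, HYPRBOX_NODE_TOKEN and the endpoints come from this page, while the Bearer header, the JSON body shapes, the "id" field in the response, and the function names are assumptions.

```shell
# POST a JSON body to the API. $1 = path under the API root, $2 = JSON body.
# Bearer authentication is an assumption about how the node token is sent.
api_post() {
  curl -fsS -X POST "${HYPRBOX_API_URL}$1" \
    -H "Authorization: Bearer ${HYPRBOX_NODE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$2"
}

# run_policy <policy-id> <path>...
run_policy() (
  policy_id="$1"; shift            # remaining arguments: paths to back up

  # 1. Register the run. Assumes the response contains "id":"<runId>".
  run_id=$(api_post "/api/backups/${policy_id}/runs" '{}' |
    sed -n 's/.*"id" *: *"\([^"]*\)".*/\1/p')
  [ -n "$run_id" ] || { echo "could not start run" >&2; exit 1; }

  # 2. Back up, tagging the snapshot so retention can be scoped per policy.
  if restic backup --tag "policy=${policy_id}" "$@"; then
    status=SUCCEEDED
  else
    status=FAILED
  fi

  # 3. Report the outcome (sizeBytes and snapshotId omitted for brevity).
  api_post "/api/backups/runs/${run_id}/complete" "{\"status\":\"${status}\"}"
  [ "$status" = SUCCEEDED ]
)
```

Reporting FAILED explicitly, rather than exiting on the first error, keeps the BackupRun row from staying in RUNNING when Restic itself fails.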

Setting up a node for backups

  1. Provision an agent token bound to the node (Settings → Agent tokens in the dashboard). Set HYPRBOX_NODE_TOKEN on the agent.

  2. Provide RESTIC_PASSWORD to the agent. With systemd:

    # /etc/systemd/system/hyprnode.service.d/restic.conf
    [Service]
    Environment="RESTIC_PASSWORD=<your-strong-password>"
    

    Or write it to a file (mode 0600) and reference file:/etc/hyprbox/restic.pass in the policy.

  3. Decide where snapshots live. Restic supports local paths, SFTP, S3, Azure Blob, Backblaze B2, GCS, REST server, and rclone. Examples:

    • s3:s3.amazonaws.com/my-bucket/restic (with AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY in the agent's env)
    • b2:bucket-name:/restic
    • sftp:backup@backup-host:/srv/restic
    • /srv/restic (local — only useful for very small fleets with a shared NFS)
  4. Create the policy from the dashboard: /dashboard/backups → "+ New policy". Pick the node, list the paths to include, paste the repo URL, set retention.

  5. Click "Backup now →" to verify everything works end-to-end. The first run takes longer because Restic initialises the repo and ingests the full data set.
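
For the file-based option in step 2, one way to create the password file with mode 0600 is sketched below. The helper name write_password_file is hypothetical; the secret is read from stdin so it does not appear in the process list or shell history.

```shell
# write_password_file <path>   (secret is read from stdin)
write_password_file() (
  path="$1"
  umask 077                       # new files and directories are owner-only
  mkdir -p "$(dirname "$path")"
  cat > "$path"
  chmod 600 "$path"               # enforce 0600 even if the file pre-existed
)
```

Example: printf '%s\n' "$secret" | write_password_file /etc/hyprbox/restic.pass, followed by a policy whose passwordRef is file:/etc/hyprbox/restic.pass.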

API surface

Action                   Endpoint                                Auth
Create policy            POST /api/backups                       user
List my policies         GET /api/backups[?nodeId]               user
Detail + last 20 runs    GET /api/backups/:id                    user
Update / pause / resume  PATCH /api/backups/:id                  user
Delete                   DELETE /api/backups/:id                 user
Trigger one-shot         POST /api/backups/:id/trigger           user
Start a run              POST /api/backups/:id/runs              node token
Complete a run           POST /api/backups/runs/:runId/complete  node token

Both agent endpoints honour the token's optional nodeId pinning — a token bound to node A cannot start runs for node B's policies.

The trigger endpoint queues a hypervault-restic job; you can watch it stream in the Jobs page like any other deployment.

Variables fed into the preset

When /trigger renders the preset, these variables come from the policy row:

Variable                                 Source                          Used for
policy_id                                policy.id                       run correlation
backup_paths                             policy.paths joined with :      restic backup args
repo_url                                 policy.repoUrl                  RESTIC_REPOSITORY env
password_ref                             policy.passwordRef              resolved by the wrapper script
keep_daily / keep_weekly / keep_monthly  policy.keep*                    restic forget --keep-*
api_url                                  HYPRBOX_API_URL env on the API  wrapper's POST target
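
Because backup_paths arrives as a single colon-joined string, the wrapper has to split it back into separate arguments. A minimal sketch, assuming a POSIX shell; the function name is hypothetical, and the --tag on the backup is inferred from the policy=<id> tag that forget and snapshots filter on elsewhere on this page.

```shell
# restic_backup_joined <policy-id> <colon-joined-paths>
restic_backup_joined() (
  policy_id="$1"
  joined="$2"
  IFS=:            # split on ':' only, so spaces inside a path survive
  set -f           # disable globbing while the unquoted expansion is split
  set -- $joined   # the paths become the positional parameters
  restic backup --tag "policy=${policy_id}" "$@"
)
```

One consequence of this encoding is that a path containing a literal colon cannot be represented.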

Retention semantics

We pass --keep-daily, --keep-weekly and --keep-monthly (each with the policy's own count) to Restic's forget --prune. For each rule, Restic looks at the last N days, weeks or months that contain at least one snapshot and keeps the most recent snapshot from each of those periods. A setting of daily=7, weekly=4, monthly=6 therefore keeps at most 7 + 4 + 6 = 17 snapshots in steady state, and usually fewer, because one snapshot can satisfy several rules at once.

The wrapper runs forget --tag policy=<id> so each policy's retention is independent. Two policies sharing a repository don't accidentally prune each other's snapshots.
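
As an illustration, the retention call for one policy could be assembled like this. The function names are hypothetical; the flags are standard Restic forget options.

```shell
# forget_command <policy-id> <keep-daily> <keep-weekly> <keep-monthly>
# Prints the restic invocation instead of running it.
forget_command() (
  printf 'restic forget --tag policy=%s --keep-daily %s --keep-weekly %s --keep-monthly %s --prune\n' \
    "$1" "$2" "$3" "$4"
)

# Upper bound on snapshots kept in steady state. The real count is usually
# lower, because one snapshot can satisfy several rules at once.
max_snapshots() {
  echo $(( $1 + $2 + $3 ))
}
```

For daily=7, weekly=4, monthly=6, max_snapshots 7 4 6 prints 17.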

Troubleshooting

Repo init fails on first run

The wrapper script runs restic init if restic snapshots errors. For S3-compatible repos, that needs the bucket to exist AND the credentials to have write access. Check the wrapper's stdout in the Job detail page.
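
The check-then-init behaviour described above amounts to something like the following sketch. The function name ensure_repo is hypothetical; it assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are already exported.

```shell
ensure_repo() {
  # restic snapshots succeeds only if the repo exists and can be opened.
  if restic snapshots >/dev/null; then
    return 0
  fi
  echo "restic snapshots failed; attempting restic init" >&2
  restic init
}
```

Note that restic snapshots also fails for reasons other than a missing repository, such as a wrong password or unreachable storage. In those cases the init attempt fails as well, and the first error in the job output is the one to read.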

bash: restic: command not found

The preset's package step installs restic, but apt may have updated to a newer version since the repo was set up. Re-trigger the policy — the install step is idempotent and will pull the current version.

Snapshots succeed but sizeBytes is 0

The wrapper reads the size from the summary in Restic's --json output. Very old Restic versions (<0.12) don't emit that structure. Upgrade Restic or live without the size metric.
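
To check by hand what a given Restic version emits, the summary line can be pulled out of the --json stream as sketched below. The helper name is hypothetical, the sed expression assumes the compact JSON Restic prints, and mapping total_bytes_processed to sizeBytes is an assumption about the wrapper.

```shell
# summary_field <field>   (reads restic backup --json output on stdin)
# Prints the value of one field from the final "summary" message.
summary_field() (
  field="$1"
  grep '"message_type":"summary"' | tail -n 1 |
    sed -n "s/.*\"${field}\":\"\{0,1\}\([^\",}]*\).*/\1/p"
)
```

Example: restic backup --json /etc | summary_field total_bytes_processed. An empty result means the running version did not produce a summary message with that field.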

Run is RUNNING forever

The agent crashed or lost network mid-run. There is no automatic sweeper yet (planned for Phase 4.5). Until then, mark the run as failed manually:

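-- Column names assume they match the BackupRun fields under "Data model";
-- adjust them if your schema maps the fields to snake_case.
UPDATE "BackupRun" SET status='FAILED', "errorMsg"='agent disappeared',
  "finishedAt"=now() WHERE id='<id>' AND status='RUNNING';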

How to restore from a snapshot

Out of scope for the API right now — Phase 4.5 will add a guided restore flow. For now, SSH to the node and use Restic directly:

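# Point Restic at the policy's repository and supply its password.
export RESTIC_REPOSITORY=s3:...
export RESTIC_PASSWORD=...
# List this policy's snapshots, then restore one into a scratch directory.
restic snapshots --tag policy=<id>
restic restore <snapshot-id> --target /tmp/restore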

Limits

  • 4 MB cap on wrapper stdout per job (same as any other Job's output).
  • One run at a time per policy is recommended; the schema doesn't enforce it but Restic's lock file does at the repo level.
  • No alerting on backup failures yet. The activity event published on completion (snap for SUCCEEDED, warn for FAILED) shows up in the Logs page and on the node detail timeline.