# Findings & Recommendations
The data model behind the autopilot loop. Read `docs/AUTOPILOT.md`
first for the why; this doc covers the how.
## Data model
```
Recommendation 1 ──n── Finding
                         │
                         ├── nodeId           → Node
                         ├── resolvedByUserId → User?  (manual resolve)
                         └── resolvedByJobId  → Job?   (auto-resolve via job)
```
### Finding
A discrete risk detected by a scanner. Every Finding row answers "what's wrong, on which node, and how do I know it's the same thing I saw last scan?"
```prisma
model Finding {
  id               String          @id @default(cuid())
  nodeId           String          // FK Node.nodeId
  type             String          // "tls.expiring", "postgres.no-backup", ...
  key              String          // discriminator inside (nodeId, type)
  severity         FindingSeverity // INFO | WARN | CRITICAL
  title            String          // one-line headline
  message          String          @db.Text // operator-facing explanation
  recommendationId String?         // nullable: not every finding has a fix yet
  recommendation   Recommendation? @relation(fields: [recommendationId], references: [id]) // back-relation for Recommendation.findings
  relatedEntities  String          @default("[]") // JSON array of "kind:id" strings
  metadata         String          @default("{}") // scanner output, curated to small JSON
  status           FindingStatus   // OPEN | SNOOZED | RESOLVED
  detectedAt       DateTime        @default(now())
  lastSeenAt       DateTime        @default(now())
  resolvedAt       DateTime?
  resolvedByJobId  String?
  resolvedByUserId String?
  snoozedUntil     DateTime?

  @@unique([nodeId, type, key]) // dedup contract
  @@index([nodeId, status, severity])
  @@index([recommendationId])
}
```
The `(nodeId, type, key)` unique constraint is the dedup contract.
When a scanner reports the same problem again, the API upserts on that
tuple: the row's `lastSeenAt` advances, while the `id`, `detectedAt`,
and any user actions (snooze, manual resolve) survive.
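A minimal in-memory sketch of that upsert, in TypeScript since the API side of the repo appears to be TypeScript (`seed-recommendations.ts`). The `Map` standing in for the table, the `newId` generator standing in for `cuid()`, and the choice to refresh `severity`, `title` and `message` on every sighting are assumptions for illustration, not the actual API code. Re-detecting a RESOLVED finding follows a separate path (see Lifecycle) that this sketch does not model.

```typescript
// Illustrative sketch only: an in-memory stand-in for the Finding table.
type FindingStatus = "OPEN" | "SNOOZED" | "RESOLVED";
type FindingSeverity = "INFO" | "WARN" | "CRITICAL";

interface ReportedFinding {
  nodeId: string;
  type: string;
  key: string;
  severity: FindingSeverity;
  title: string;
  message: string;
}

interface FindingRow extends ReportedFinding {
  id: string;
  status: FindingStatus;
  detectedAt: Date;
  lastSeenAt: Date;
  snoozedUntil: Date | null;
}

// The (nodeId, type, key) tuple, serialized so it can key a Map.
function dedupKey(f: ReportedFinding): string {
  return JSON.stringify([f.nodeId, f.type, f.key]);
}

function upsertFinding(
  table: Map<string, FindingRow>,
  reported: ReportedFinding,
  now: Date,
  newId: () => string, // hypothetical id generator standing in for cuid()
): FindingRow {
  const k = dedupKey(reported);
  const existing = table.get(k);
  if (existing === undefined) {
    // First sighting: create an OPEN row.
    const row: FindingRow = {
      ...reported,
      id: newId(),
      status: "OPEN",
      detectedAt: now,
      lastSeenAt: now,
      snoozedUntil: null,
    };
    table.set(k, row);
    return row;
  }
  // Re-sighting: refresh scanner-owned fields and advance lastSeenAt.
  // id, detectedAt, status and snoozedUntil are carried over untouched.
  const row: FindingRow = {
    ...existing,
    severity: reported.severity,
    title: reported.title,
    message: reported.message,
    lastSeenAt: now,
  };
  table.set(k, row);
  return row;
}
```

Keying the `Map` on the serialized tuple mirrors what `@@unique([nodeId, type, key])` enforces in the database: two reports with the same tuple can only ever touch one row.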
### Recommendation
A vetted mapping from a finding type to a preset to run. Curated server-side, seeded into the DB; not user-editable in MVP.
```prisma
model Recommendation {
  id           String             @id // "tls.caddy-renew"
  title        String
  description  String             @db.Text
  presetName   String             // FK by name into the YAML catalogue
  riskLevel    RecommendationRisk // SAFE | CONFIRM | DANGEROUS | MANUAL
  findingTypes String             @default("[]") // JSON array of finding `type`s
  createdAt    DateTime           @default(now())
  updatedAt    DateTime           @updatedAt
  findings     Finding[]
}
```
One recommendation can serve multiple finding types; that's why
`findingTypes` is a JSON array, not a single field. It's also why we
don't need a join table: the row count stays in the dozens, so parsing
the JSON on read is cheap.
Each new recommendation lands in `apps/api/prisma/seed-recommendations.ts`
AND ships with a matching preset YAML. The seed script is idempotent:
re-running it upserts on the stable string `id`.
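Because `findingTypes` is stored as a JSON string, finding the recommendation(s) for a given finding type is a parse-and-filter over a few dozen rows. A sketch, assuming a trimmed-down row shape (`RecommendationRow` and `recommendationsFor` are hypothetical names):

```typescript
// Hypothetical trimmed-down view of a Recommendation row.
interface RecommendationRow {
  id: string;
  presetName: string;
  findingTypes: string; // JSON array, as stored in the Recommendation model
}

// Returns every recommendation whose findingTypes array contains `type`.
function recommendationsFor(type: string, rows: RecommendationRow[]): RecommendationRow[] {
  return rows.filter((row) => {
    const parsed: unknown = JSON.parse(row.findingTypes);
    return Array.isArray(parsed) && parsed.includes(type);
  });
}
```

With the catalogue below, `audit.server-light-baseline` would list `audit.ufw-inactive`, `audit.fail2ban-down` and `audit.no-automatic-updates`, so a lookup for any of the three returns that single row.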
## Severity model
| Severity | What it means | Health bucket impact |
|---|---|---|
| INFO | FYI; no immediate action needed | Doesn't move the node |
| WARN | Should be addressed soon; not on fire | Bumps the node to Attention |
| CRITICAL | On fire OR will be very soon (cert expiring < 7d, no DB backup) | Bumps the node to At risk |
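The dashboard computes per-node health as: `CRITICAL > 0 ⇒ At risk`,
else `WARN > 0 ⇒ Attention`, else `Healthy`, where the counts cover OPEN
findings plus SNOOZED findings whose `snoozedUntil` has passed. No numeric
score; see `AUTOPILOT.md` for the why.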
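The bucket rule is small enough to write as a pure function. This sketch also folds in the snooze rule from the Lifecycle section (SNOOZED findings past their `snoozedUntil` count like OPEN ones); the `FindingLite` shape and the function names are illustrative, not the dashboard's actual code.

```typescript
type Severity = "INFO" | "WARN" | "CRITICAL";
type Status = "OPEN" | "SNOOZED" | "RESOLVED";
type HealthBucket = "Healthy" | "Attention" | "At risk";

// Illustrative subset of the Finding fields the rule needs.
interface FindingLite {
  severity: Severity;
  status: Status;
  snoozedUntil: Date | null;
}

// OPEN findings count, and so do SNOOZED ones whose deadline has passed.
function countsTowardHealth(f: FindingLite, now: Date): boolean {
  if (f.status === "OPEN") return true;
  return f.status === "SNOOZED" && f.snoozedUntil !== null && f.snoozedUntil.getTime() <= now.getTime();
}

// CRITICAL > 0 => At risk, else WARN > 0 => Attention, else Healthy.
function healthBucket(findings: FindingLite[], now: Date): HealthBucket {
  const live = findings.filter((f) => countsTowardHealth(f, now));
  if (live.some((f) => f.severity === "CRITICAL")) return "At risk";
  if (live.some((f) => f.severity === "WARN")) return "Attention";
  return "Healthy";
}
```

INFO findings never appear in a branch, which is exactly the "Doesn't move the node" row of the table.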
## Lifecycle
```
                ┌──────────┐
                │  (seed)  │  scanner detects something for the first time
                └────┬─────┘
                     │
                     ▼
                ┌──────────┐
       ┌────────│   OPEN   │────────┐
       │        └────┬─────┘        │
       │             │              │  next scan: still there?
       │             │              │    → upsert, lastSeenAt advances, stays OPEN
  user clicks   agent re-scan       │    → still there? AND scan misses for 3 ticks? → RESOLVED
  Snooze        after job           │    → not there? → RESOLVED, resolvedByJobId set
  for N hours   success
       │             │
       ▼             ▼
  ┌─────────┐   ┌──────────┐
  │ SNOOZED │   │ RESOLVED │  terminal, but a future re-detect creates a NEW row
  └────┬────┘   └──────────┘  (the old one stays for history)
       │
       │ snoozedUntil elapses
       ▼
  ┌─────────┐
  │  OPEN   │
  └─────────┘
```
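The user- and job-driven transitions in the diagram can be sketched as a small reducer. The event names, the `LifecycleState` shape, and two details the diagram leaves open are assumptions: a manual resolve is accepted from SNOOZED as well as OPEN, and an elapsed snooze is flipped back to OPEN eagerly on a `tick` (the API may equally just treat it as OPEN when reading). Scan-driven resolution is not modeled.

```typescript
type Status = "OPEN" | "SNOOZED" | "RESOLVED";

interface LifecycleState {
  status: Status;
  snoozedUntil: Date | null;
  resolvedAt: Date | null;
  resolvedByJobId: string | null;
  resolvedByUserId: string | null;
}

// Hypothetical event vocabulary for the arrows in the diagram.
type LifecycleEvent =
  | { kind: "snooze"; hours: number; now: Date }
  | { kind: "tick"; now: Date } // periodic check for elapsed snoozes
  | { kind: "jobSucceeded"; jobId: string; now: Date }
  | { kind: "manualResolve"; userId: string; now: Date };

const HOUR_MS = 60 * 60 * 1000;

function transition(s: LifecycleState, e: LifecycleEvent): LifecycleState {
  // RESOLVED is terminal: a re-detection creates a new row instead.
  if (s.status === "RESOLVED") return s;
  switch (e.kind) {
    case "snooze":
      if (s.status !== "OPEN") return s;
      return { ...s, status: "SNOOZED", snoozedUntil: new Date(e.now.getTime() + e.hours * HOUR_MS) };
    case "tick":
      if (s.status === "SNOOZED" && s.snoozedUntil !== null && s.snoozedUntil.getTime() <= e.now.getTime()) {
        return { ...s, status: "OPEN", snoozedUntil: null };
      }
      return s;
    case "jobSucceeded":
      // The auto-resolve path only touches OPEN findings.
      if (s.status !== "OPEN") return s;
      return { ...s, status: "RESOLVED", resolvedAt: e.now, resolvedByJobId: e.jobId };
    case "manualResolve":
      // Assumption: allowed from both OPEN and SNOOZED.
      return { ...s, status: "RESOLVED", resolvedAt: e.now, resolvedByUserId: e.userId };
  }
}
```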
A few subtleties:
- Auto-resolve via job: when a Job finishes `SUCCEEDED`, the API looks
  at findings whose recommendation (via `recommendationId`) has
  `presetName == job.presetName` AND `nodeId == job.nodeId` AND
  `status == OPEN`, and marks them `RESOLVED` with
  `resolvedByJobId = job.id`. The next scan that confirms the fix worked
  just leaves them resolved; the next scan that re-detects the problem
  creates a NEW finding with a different `id`. Reusing the same dedup
  tuple would conflict with the resolved row, so a new `key` is generated
  (e.g. with a date suffix) to make the new occurrence visible.
- Snooze with a deadline: `snoozedUntil` is a timestamp. The UI hides
  snoozed findings by default; the API treats `OPEN` and
  `SNOOZED`-but-past-the-deadline identically when computing the health
  bucket.
- History stays: `RESOLVED` rows are kept forever (the table is small).
  Use them to answer "did we already fix this on this node before?".
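The auto-resolve rule from the first bullet, sketched over an in-memory list. The `presetByRecommendation` map (Recommendation `id` → `presetName`) stands in for the join the API would do; it, the function name, and the trimmed `Job` and finding shapes are assumptions.

```typescript
// Trimmed, hypothetical shapes; only the fields the rule reads.
interface Job {
  id: string;
  nodeId: string;
  presetName: string;
  status: string; // only "SUCCEEDED" matters here
}

interface FindingForResolve {
  id: string;
  nodeId: string;
  status: "OPEN" | "SNOOZED" | "RESOLVED";
  recommendationId: string | null;
  resolvedAt: Date | null;
  resolvedByJobId: string | null;
}

// Marks OPEN findings on the job's node whose recommendation points at the
// job's preset as RESOLVED; everything else is returned unchanged.
function autoResolveOnJob(
  job: Job,
  findings: FindingForResolve[],
  presetByRecommendation: Map<string, string>, // Recommendation.id -> presetName
  now: Date,
): FindingForResolve[] {
  if (job.status !== "SUCCEEDED") return findings;
  return findings.map((f): FindingForResolve => {
    const preset = f.recommendationId === null ? undefined : presetByRecommendation.get(f.recommendationId);
    const matches = preset === job.presetName && f.nodeId === job.nodeId && f.status === "OPEN";
    return matches ? { ...f, status: "RESOLVED", resolvedAt: now, resolvedByJobId: job.id } : f;
  });
}
```

Note that a SNOOZED finding is left alone by this rule as stated; it stays snoozed until the next scan or a manual resolve.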
## Adding a new finding type
A scanner emits findings of a type. To wire a new type end-to-end:
- Pick a stable `type` string. Format: `<scanner>.<short-name>`,
  kebab-case, dot-separated. Examples: `tls.expiring`,
  `ssh.password-auth-enabled`, `postgres.no-backup`, `disk.usage-high`.
- Define the `key`. It's the discriminator that says "this finding is
  distinct from another finding of the same type on the same node". For
  TLS expiry, the key is the cert subject + path. For "disk usage high",
  it's the mount point.
- Pick a severity. Be conservative: INFO is the default; promote to
  WARN/CRITICAL only when the operator should drop what they're doing.
- Write the scanner (Go, in `agent/hyprnode/internal/scanner/<name>/`).
  It POSTs `[{type, key, severity, title, message, metadata, relatedEntities}]`
  to `/api/findings` after every tick.
- (Optional) Add a recommendation. Add a row to `seed-recommendations.ts`
  and write the matching preset. A finding type without a recommendation
  is acceptable: it shows in the UI without an Apply button.
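One way to enforce the naming rule and the payload shape when a batch arrives. The regular expression is one reading of `<scanner>.<short-name>` in kebab-case (exactly one dot); the `FindingReport` interface mirrors the POSTed fields listed above, but typing `metadata` as an object and `relatedEntities` as a string array on the wire, and the `validateBatch` helper itself, are assumptions.

```typescript
// <scanner>.<short-name>: lowercase kebab-case on both sides, exactly one dot.
const FINDING_TYPE = /^[a-z0-9]+(?:-[a-z0-9]+)*\.[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidFindingType(type: string): boolean {
  return FINDING_TYPE.test(type);
}

// Assumed wire shape of one element of the batch POSTed to /api/findings.
interface FindingReport {
  type: string;
  key: string;
  severity: "INFO" | "WARN" | "CRITICAL";
  title: string;
  message: string;
  metadata: Record<string, unknown>;
  relatedEntities: string[]; // "kind:id" strings
}

// Hypothetical helper: collects human-readable errors instead of failing fast.
function validateBatch(batch: FindingReport[]): string[] {
  const errors: string[] = [];
  batch.forEach((f, i) => {
    if (!isValidFindingType(f.type)) errors.push(`[${i}] bad type "${f.type}"`);
    if (f.key.trim() === "") errors.push(`[${i}] empty key`);
  });
  return errors;
}
```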
## API surface
| Action | Endpoint | Auth |
|---|---|---|
| Agent: upsert batch | `POST /api/findings` | node token |
| Agent: docker inventory (cross-ref) | `POST /api/nodes/inventory` | node token |
| List for the operator | `GET /api/findings[?nodeId&status]` | user JWT |
| Detail | `GET /api/findings/:id` | user JWT |
| Snooze | `POST /api/findings/:id/snooze` | user JWT |
| Manual resolve | `POST /api/findings/:id/resolve` | user JWT |
| Health rollup per node | `GET /api/findings/health/summary` | user JWT |

The full reference lives in `docs/API.md`.
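For scripts that talk to the operator endpoints, a sketch of building the list URL with its optional filters. The base URL is a placeholder, and sending the user JWT as an `Authorization: Bearer` header is an assumption; the real scheme is whatever `docs/API.md` specifies.

```typescript
type ListFilter = { nodeId?: string; status?: "OPEN" | "SNOOZED" | "RESOLVED" };

// Builds GET /api/findings[?nodeId&status], omitting unset filters.
function listFindingsUrl(baseUrl: string, filter: ListFilter = {}): string {
  const url = new URL("/api/findings", baseUrl);
  if (filter.nodeId !== undefined) url.searchParams.set("nodeId", filter.nodeId);
  if (filter.status !== undefined) url.searchParams.set("status", filter.status);
  return url.toString();
}

// Assumption: the user JWT travels as a standard Bearer token.
function authHeaders(jwt: string): Record<string, string> {
  return { Authorization: `Bearer ${jwt}` };
}
```

Using `URLSearchParams` via `url.searchParams` takes care of escaping node ids that contain reserved characters.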
## Built-in finding types
The catalogue as of Phase 7. New types land here when the matching scanner ships.
| Type | Severity | Scanner | Key shape | Recommendation |
|---|---|---|---|---|
| `tls.expiring` | WARN <30d, CRITICAL <7d | `scanner/tls.go` walks LE + `/etc/ssl/certs/hyprbox/*` | `cert:<subject>:<path>` | `tls.caddy-renew` (SAFE) |
| `tls.unreadable` | INFO | `scanner/tls.go` | `<path>` | none (operator visibility only) |
| `ssh.password-auth-enabled` | CRITICAL | `scanner/ssh.go` parses `sshd_config` (`Include` + `Match` + first-write-wins) | `sshd:<config-path>` | `ssh.disable-password-auth` (DANGEROUS) |
| `ssh.unreadable` | INFO | `scanner/ssh.go` | `sshd:<path>` | none |
| `disk.usage-high` | WARN ≥80%, CRITICAL ≥90% | `scanner/disk.go` via gopsutil, skips pseudo filesystems | `mount:<path>` | `disk.docker-prune` (CONFIRM), only meaningful on Docker hosts |
| `postgres.no-backup` | CRITICAL | API cross-ref of `POST /api/nodes/inventory` × `BackupPolicy` | `container:<name>` | `postgres.add-backup` (CONFIRM) → `hypervault-restic` |
| `audit.kernel-updates-pending` | WARN | `scanner/audit.go` parses `apt list --upgradable` for `-security` source | `audit:kernel-updates` | `audit.apply-security-updates` (CONFIRM) → `apt-security-upgrade` |
| `audit.root-password-set` | WARN | `/etc/shadow` line for `root` has a real hash starting with `$` | `audit:root-password` | none; MANUAL (`passwd -l root`) |
| `audit.sudo-nopasswd` | WARN | greps `/etc/sudoers{,.d/*}` for `NOPASSWD:` | `audit:sudo-nopasswd` | none; MANUAL (per-rule review) |
| `audit.world-writable-etc` | WARN | `find /etc -xdev -type f -perm -o+w` returns ≥1 | `audit:world-writable-etc` | none; MANUAL (per-file `chmod o-w`) |
| `audit.ufw-inactive` | CRITICAL on public-IP host, WARN otherwise | `ufw status verbose` parsed for `Status: active` AND `Default: deny (incoming)` | `audit:ufw` | `audit.server-light-baseline` (CONFIRM) → `server-light` |
| `audit.fail2ban-down` | WARN | `systemctl is-active fail2ban` ≠ `active` | `audit:fail2ban` | `audit.server-light-baseline` (CONFIRM) |
| `audit.no-automatic-updates` | INFO | `/etc/apt/apt.conf.d/20auto-upgrades` missing or both directives ≠ `"1"` | `audit:auto-upgrades` | `audit.server-light-baseline` (CONFIRM) |
| `audit.no-disk-encryption` | INFO | `lsblk -no TYPE` has no row equal to `crypt` | `audit:luks` | none; MANUAL (LUKS can't be added live) |
| `audit.no-auditd` | INFO | `systemctl is-active auditd` ≠ `active` | `audit:auditd` | none (INFO only) |
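The threshold columns for `tls.expiring`, `disk.usage-high` and `audit.ufw-inactive` translate directly into code. The real checks live in the Go scanners; this TypeScript transcription of the table is only illustrative, and returning `null` for "below threshold, emit nothing" is an assumption.

```typescript
type Severity = "INFO" | "WARN" | "CRITICAL";

// tls.expiring: CRITICAL below 7 days (including already expired), WARN below 30.
function tlsExpirySeverity(daysUntilExpiry: number): Severity | null {
  if (daysUntilExpiry < 7) return "CRITICAL";
  if (daysUntilExpiry < 30) return "WARN";
  return null;
}

// disk.usage-high: CRITICAL at or above 90 %, WARN at or above 80 %.
function diskUsageSeverity(usedPercent: number): Severity | null {
  if (usedPercent >= 90) return "CRITICAL";
  if (usedPercent >= 80) return "WARN";
  return null;
}

// audit.ufw-inactive: CRITICAL on a public-IP host, WARN otherwise.
function ufwInactiveSeverity(hasPublicIp: boolean): Severity {
  return hasPublicIp ? "CRITICAL" : "WARN";
}
```

Checking the stricter threshold first keeps each function a simple cascade and makes the boundary behavior (strict `<` for days, inclusive `≥` for percent) match the table.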
## Reconciliation rules per type
Not all findings are emit-only. Some get auto-resolved by the API when the underlying state changes, without waiting for a Job to run:
- `postgres.no-backup`: the `POST /api/nodes/inventory` endpoint walks
  existing OPEN findings on the node. If the container in the key is no
  longer present in the live inventory, OR a `BackupPolicy` now targets
  the node, the finding flips to RESOLVED on that tick. The next
  inventory that has the container back AND no policy re-emits it.
- All other types: resolved either by a successful Job whose preset
  matches the recommendation, OR by a manual
  `POST /api/findings/:id/resolve`. They are re-emitted on the next scan
  if the scanner sees the problem again.
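The `postgres.no-backup` rule, sketched as a pure function over the node's OPEN findings, the live container names from the inventory POST, and whether a `BackupPolicy` targets the node. The shapes and the function name are hypothetical; the `container:` key prefix comes from the catalogue table.

```typescript
interface OpenFinding {
  id: string;
  type: string;
  key: string; // "container:<name>" for postgres.no-backup
}

const CONTAINER_PREFIX = "container:";

// Returns the ids of postgres.no-backup findings that should flip to RESOLVED.
function reconcileNoBackup(
  openFindings: OpenFinding[],
  liveContainers: Set<string>,
  nodeHasBackupPolicy: boolean,
): string[] {
  return openFindings
    .filter((f) => f.type === "postgres.no-backup" && f.key.startsWith(CONTAINER_PREFIX))
    .filter((f) => {
      const name = f.key.slice(CONTAINER_PREFIX.length);
      // Resolved if a policy now covers the node OR the container is gone.
      return nodeHasBackupPolicy || !liveContainers.has(name);
    })
    .map((f) => f.id);
}
```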
## Writing dedup-safe scanners
Two rules:
- Never invent a new `key` for the same underlying problem. A scanner
  that emits `key="cert-app.client.fr-2026-08-25"` (with a date) will
  produce a new finding row every scan and drown the operator. Use a
  stable identifier:
  `key="cert:app.client.fr:/etc/letsencrypt/live/app.client.fr/fullchain.pem"`.
- Don't include free-form messages in `key`. Phrasing changes; the key
  shouldn't. Anything dynamic (the actual days-until-expiry, current
  usage percent) goes in `message` and `metadata`, not in `key`.
The unique index does the heavy lifting; the upsert does the right thing as long as the key is stable.
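Both rules in one sketch: the key is built only from the cert subject and path, while the days-until-expiry goes into `message` and `metadata`, so two scans a day apart upsert the same row. The function names and the `metadata` fields are illustrative, and severity is left out (it is computed from the thresholds in the catalogue table).

```typescript
interface TlsReport {
  type: "tls.expiring";
  key: string;
  message: string;
  metadata: { daysUntilExpiry: number; notAfter: string };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Stable identity: subject + path only, nothing that changes between scans.
function tlsKey(subject: string, path: string): string {
  return `cert:${subject}:${path}`;
}

function tlsReport(subject: string, path: string, notAfter: Date, now: Date): TlsReport {
  const daysUntilExpiry = Math.floor((notAfter.getTime() - now.getTime()) / DAY_MS);
  return {
    type: "tls.expiring",
    key: tlsKey(subject, path), // identical on every scan of this cert
    message: `Certificate for ${subject} expires in ${daysUntilExpiry} days`, // dynamic
    metadata: { daysUntilExpiry, notAfter: notAfter.toISOString() }, // dynamic
  };
}
```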
## Querying
```sql
-- Findings per node, grouped by severity
-- (Prisma keeps field names as column names unless @map is used, hence the quoting)
SELECT "nodeId", severity, count(*)
FROM "Finding"
WHERE status = 'OPEN'
GROUP BY "nodeId", severity
ORDER BY "nodeId";

-- Health bucket for one node
-- (OPEN findings plus SNOOZED ones whose deadline has passed)
SELECT
  CASE
    WHEN sum(CASE WHEN severity = 'CRITICAL' THEN 1 ELSE 0 END) > 0 THEN 'At risk'
    WHEN sum(CASE WHEN severity = 'WARN' THEN 1 ELSE 0 END) > 0 THEN 'Attention'
    ELSE 'Healthy'
  END AS health
FROM "Finding"
WHERE "nodeId" = 'prod-eu-1'
  AND (status = 'OPEN' OR (status = 'SNOOZED' AND "snoozedUntil" <= now()));

-- "How did this finding get resolved?"
SELECT f.id, f.title, f."resolvedAt", j."presetName", j.status
FROM "Finding" f
LEFT JOIN "Job" j ON j.id = f."resolvedByJobId"
WHERE f.id = 'cmp...';
```
## What this is NOT
- An alerting system. Findings are facts about the present state of a
  node. Alerts (with rules, channels, dedup windows, escalation policies)
  are a separate layer: Phase 5 territory.
- A monitoring buffer. Heartbeats and per-minute metrics live in
  `Heartbeat`. Findings only get created when something crosses a
  threshold OR is structurally wrong (no backup on a DB container).
- A graph database. `relatedEntities` is a flat array of
  `kind:identifier` strings. The "click around the graph" UX is a
  Phase 4.5 feature, once we have enough scanners to make it interesting.