Findings & Recommendations

The data model behind the autopilot loop. Read docs/AUTOPILOT.md first for the why; this doc covers the how.

Data model

Recommendation 1 ──n── Finding
                       │
                       ├── nodeId → Node
                       ├── resolvedByUserId → User?  (manual resolve)
                       └── resolvedByJobId → Job?    (auto-resolve via job)

`Finding`

A discrete risk detected by a scanner. Every Finding row answers "what's wrong, on which node, and how do I know it's the same thing I saw last scan?"

model Finding {
  id               String   @id @default(cuid())
  nodeId           String                       // FK Node.nodeId
  type             String                       // "tls.expiring", "postgres.no-backup", ...
  key              String                       // discriminator inside (nodeId, type)
  severity         FindingSeverity              // INFO | WARN | CRITICAL
  title            String                       // one-line headline
  message          String   @db.Text             // operator-facing explanation
  recommendationId String?                       // nullable: not every finding has a fix yet
  relatedEntities  String   @default("[]")       // JSON array of "kind:id" strings
  metadata         String   @default("{}")       // scanner output, curate to small JSON
  status           FindingStatus                 // OPEN | SNOOZED | RESOLVED
  detectedAt       DateTime @default(now())
  lastSeenAt       DateTime @default(now())
  resolvedAt       DateTime?
  resolvedByJobId  String?
  resolvedByUserId String?
  snoozedUntil     DateTime?

  @@unique([nodeId, type, key])               // dedup contract
  @@index([nodeId, status, severity])
  @@index([recommendationId])
}

The (nodeId, type, key) unique constraint is the dedup contract. When the same scanner re-sees the same problem, it upserts: the row's lastSeenAt advances; the id, detectedAt, and any user actions (snooze, manual resolve) survive.
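
The upsert behaviour can be sketched in a few lines of TypeScript. This is a minimal in-memory illustration, not the API's actual code: the `table` map, `upsertFinding`, and the generated ids are hypothetical stand-ins for the database and its unique index.

```typescript
type FindingStatus = "OPEN" | "SNOOZED" | "RESOLVED";

interface ScanResult {
  nodeId: string;
  type: string;
  key: string;
  title: string;
}

interface FindingRow extends ScanResult {
  id: string;
  status: FindingStatus;
  detectedAt: Date;
  lastSeenAt: Date;
}

// Hypothetical in-memory table, indexed the same way as the unique constraint.
const table = new Map<string, FindingRow>();
let nextId = 1;

function dedupKey(r: { nodeId: string; type: string; key: string }): string {
  return JSON.stringify([r.nodeId, r.type, r.key]);
}

function upsertFinding(scan: ScanResult, now: Date): FindingRow {
  const existing = table.get(dedupKey(scan));
  if (existing) {
    // Re-seen: only lastSeenAt advances. id, detectedAt and status are left alone.
    existing.lastSeenAt = now;
    return existing;
  }
  const row: FindingRow = {
    ...scan,
    id: `f${nextId++}`,
    status: "OPEN",
    detectedAt: now,
    lastSeenAt: now,
  };
  table.set(dedupKey(scan), row);
  return row;
}
```

Calling `upsertFinding` twice with the same (nodeId, type, key) returns the same row with a later `lastSeenAt`, so a snooze applied between the two scans is not overwritten.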

`Recommendation`

A vetted mapping from a finding type to a preset to run. Curated server-side, seeded into the DB; not user-editable in MVP.

model Recommendation {
  id            String              @id   // "tls.caddy-renew"
  title         String
  description   String              @db.Text
  presetName    String                    // FK by name into the YAML catalogue
  riskLevel     RecommendationRisk        // SAFE | CONFIRM | DANGEROUS | MANUAL
  findingTypes  String              @default("[]")  // JSON array of finding `type`s
  createdAt     DateTime            @default(now())
  updatedAt     DateTime            @updatedAt
  findings      Finding[]
}

One recommendation can serve multiple finding types, which is why findingTypes is a JSON array rather than a single field. A join table would be overkill here: the row count stays in the dozens, and parsing the JSON on read is cheap.

Each new recommendation lands in apps/api/prisma/seed-recommendations.ts AND ships with a matching preset YAML. The seed script is idempotent — re-running it upserts on the stable string id.
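
As an illustration of how the JSON-array column might be read, the sketch below finds the recommendation that serves a given finding type. `parseFindingTypes` and `recommendationFor` are hypothetical helper names, and treating malformed JSON as an empty list is an assumption rather than documented behaviour.

```typescript
interface RecommendationRow {
  id: string;
  presetName: string;
  findingTypes: string; // JSON array stored as text, as in the model above
}

function parseFindingTypes(raw: string): string[] {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    return parsed.filter((t): t is string => typeof t === "string");
  } catch {
    return []; // assumption: malformed JSON means "serves no types"
  }
}

function recommendationFor(
  type: string,
  rows: RecommendationRow[],
): RecommendationRow | undefined {
  return rows.find((r) => parseFindingTypes(r.findingTypes).includes(type));
}
```

With only a few dozen rows, a linear scan plus `JSON.parse` per row is negligible, which is the trade-off the paragraph above describes.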

Severity model

Severity	What it means	Health bucket impact
`INFO`	FYI; no immediate action needed	Doesn't move the node
`WARN`	Should be addressed soon; not on fire	Bumps the node to Attention
`CRITICAL`	On fire OR will be very soon (cert expiring < 7d, no DB backup)	Bumps the node to At risk

The dashboard computes per-node health as: CRITICAL > 0 ⇒ At risk else WARN > 0 ⇒ Attention else Healthy. No numeric score — see AUTOPILOT.md for the why.
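
The rollup rule translates directly into code. A minimal sketch, assuming the caller has already filtered down to the findings that count for the node (the function name `healthBucket` is illustrative):

```typescript
type Severity = "INFO" | "WARN" | "CRITICAL";
type HealthBucket = "Healthy" | "Attention" | "At risk";

// Worst severity wins; INFO findings never move the node.
function healthBucket(severities: Severity[]): HealthBucket {
  if (severities.includes("CRITICAL")) return "At risk";
  if (severities.includes("WARN")) return "Attention";
  return "Healthy";
}
```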

Lifecycle

                     ┌──────────┐
                     │  (seed)  │  scanner detects something for the first time
                     └────┬─────┘
                          │
                          ▼
                     ┌──────────┐
            ┌────────│   OPEN   │────────┐
            │        └────┬─────┘        │
            │             │              │ next scan: still there?
            │             │              │ → upsert, lastSeenAt advances, stays OPEN
   user clicks       agent re-scan       │ → not seen for 3 consecutive ticks? → RESOLVED
   Snooze            after job           │ → not there? → RESOLVED, resolvedByJobId set
   for N hours       success
            │             │
            ▼             ▼
       ┌─────────┐   ┌──────────┐
       │ SNOOZED │   │ RESOLVED │  terminal — but a future re-detect creates a NEW row
       └────┬────┘   └──────────┘  (the old one stays for history)
            │
            │ snoozedUntil elapses
            ▼
       ┌─────────┐
       │  OPEN   │
       └─────────┘

A few subtleties:

Auto-resolve via job: when a Job finishes SUCCEEDED, the API selects findings whose recommendation has presetName == job.presetName AND whose nodeId == job.nodeId AND status == OPEN, and marks them RESOLVED with resolvedByJobId = job.id. If the next scan confirms the fix worked, they simply stay resolved. If the next scan re-detects the problem, it creates a NEW finding with a different id. Reusing the same dedup tuple would conflict with the resolved row, so the new occurrence gets a new key (e.g. with a date suffix) to make it visible.
Snooze with a deadline: snoozedUntil is a timestamp. The UI hides snoozed findings by default; the API treats OPEN and SNOOZED-but-past-the-deadline identically when computing the health bucket.
History stays: RESOLVED rows are kept forever (the table is small). Use them to answer "did we already fix this on this node before?".
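
The first two subtleties can be sketched as pure functions. This is an illustration under assumptions, not the API's implementation: the `Finding` and `Job` shapes are trimmed down, `recommendationPresetName` stands in for the join through recommendationId, the `FAILED` job status is a guess, and setting `resolvedAt` alongside `resolvedByJobId` is assumed.

```typescript
type FindingStatus = "OPEN" | "SNOOZED" | "RESOLVED";

interface Finding {
  id: string;
  nodeId: string;
  status: FindingStatus;
  recommendationPresetName: string | null; // preset of the linked recommendation, if any
  snoozedUntil: Date | null;
  resolvedAt: Date | null;
  resolvedByJobId: string | null;
}

interface Job {
  id: string;
  nodeId: string;
  presetName: string;
  status: "SUCCEEDED" | "FAILED"; // "FAILED" is an assumed name
}

// Marks matching OPEN findings as RESOLVED and returns the ones it touched.
function autoResolve(findings: Finding[], job: Job, now: Date): Finding[] {
  if (job.status !== "SUCCEEDED") return [];
  const hits = findings.filter(
    (f) =>
      f.status === "OPEN" &&
      f.nodeId === job.nodeId &&
      f.recommendationPresetName === job.presetName,
  );
  for (const f of hits) {
    f.status = "RESOLVED";
    f.resolvedAt = now;
    f.resolvedByJobId = job.id;
  }
  return hits;
}

// OPEN and SNOOZED-past-deadline are treated the same for the health bucket.
function countsTowardHealth(f: Finding, now: Date): boolean {
  if (f.status === "OPEN") return true;
  return (
    f.status === "SNOOZED" &&
    f.snoozedUntil !== null &&
    f.snoozedUntil.getTime() <= now.getTime()
  );
}
```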

Adding a new finding type

A scanner emits findings of a type. To wire a new type end-to-end:

Pick a stable type string. Format: <scanner>.<short-name>, two kebab-case segments separated by a single dot. Examples: tls.expiring, ssh.password-auth-enabled, postgres.no-backup, disk.usage-high.
Define the key. It's the discriminator that says "this finding is distinct from another finding of the same type on the same node". For TLS expiry, the key is the cert subject + path. For "disk usage high", it's the mount point.
Pick a severity. Be conservative: INFO is the default. Promote to WARN only when the operator should act soon, and to CRITICAL only when they should drop what they're doing.
Write the scanner (Go, in agent/hyprnode/internal/scanner/<name>/). It POSTs [{type, key, severity, title, message, metadata, relatedEntities}] to /api/findings after every tick.
(Optional) Add a recommendation. Add a row to seed-recommendations.ts and write the matching preset. A finding type without a recommendation is acceptable — it shows in the UI without an Apply button.
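
The first three steps can be checked mechanically. The sketch below is written in TypeScript for illustration, although the real scanners are in Go. It validates the type format and builds one payload entry with INFO as the default severity. The field names come from the list above; the value types of `metadata` and `relatedEntities`, the regular expression, and the helper names are assumptions.

```typescript
type FindingSeverity = "INFO" | "WARN" | "CRITICAL";

// Assumed wire shape: field names from the steps above, value types are guesses.
interface FindingPayload {
  type: string;
  key: string;
  severity: FindingSeverity;
  title: string;
  message: string;
  metadata: Record<string, unknown>;
  relatedEntities: string[]; // "kind:id" strings
}

// <scanner>.<short-name>: two lowercase kebab-case segments joined by one dot.
const SEGMENT = "[a-z][a-z0-9]*(?:-[a-z0-9]+)*";
const TYPE_PATTERN = new RegExp(`^${SEGMENT}\\.${SEGMENT}$`);

function isValidFindingType(type: string): boolean {
  return TYPE_PATTERN.test(type);
}

function buildFinding(
  type: string,
  key: string,
  title: string,
  message: string,
  severity: FindingSeverity = "INFO", // conservative default
): FindingPayload {
  if (!isValidFindingType(type)) {
    throw new Error(`invalid finding type: ${type}`);
  }
  return { type, key, severity, title, message, metadata: {}, relatedEntities: [] };
}
```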

API surface

Action	Endpoint	Auth
Agent: upsert batch	`POST /api/findings`	node token
Agent: docker inventory (cross-ref)	`POST /api/nodes/inventory`	node token
List for the operator	`GET /api/findings[?nodeId&status]`	user JWT
Detail	`GET /api/findings/:id`	user JWT
Snooze	`POST /api/findings/:id/snooze`	user JWT
Manual resolve	`POST /api/findings/:id/resolve`	user JWT
Health rollup per node	`GET /api/findings/health/summary`	user JWT

The full reference lives in docs/API.md.
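
As a small client-side illustration, the optional filters on the list endpoint could be assembled like this. `findingsListPath` is a hypothetical helper, and passing the status enum name verbatim as the query value is an assumption; docs/API.md remains the authority on parameter formats.

```typescript
type FindingStatus = "OPEN" | "SNOOZED" | "RESOLVED";

interface ListFilter {
  nodeId?: string;
  status?: FindingStatus;
}

// Builds the path for GET /api/findings with whichever filters are set.
function findingsListPath(filter: ListFilter = {}): string {
  const params: string[] = [];
  if (filter.nodeId !== undefined) {
    params.push(`nodeId=${encodeURIComponent(filter.nodeId)}`);
  }
  if (filter.status !== undefined) {
    params.push(`status=${encodeURIComponent(filter.status)}`);
  }
  return params.length > 0 ? `/api/findings?${params.join("&")}` : "/api/findings";
}
```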

Built-in finding types

The catalogue as of Phase 7. New types land here when the matching scanner ships.

Type	Severity	Scanner	Key shape	Recommendation
`tls.expiring`	WARN <30d, CRITICAL <7d	`scanner/tls.go` walks LE + `/etc/ssl/certs/hyprbox/*`	`cert:<subject>:<path>`	`tls.caddy-renew` (SAFE)
`tls.unreadable`	INFO	`scanner/tls.go`	`<path>`	none — operator visibility only
`ssh.password-auth-enabled`	CRITICAL	`scanner/ssh.go` parses `sshd_config` (Include + Match + first-write-wins)	`sshd:<config-path>`	`ssh.disable-password-auth` (DANGEROUS)
`ssh.unreadable`	INFO	`scanner/ssh.go`	`sshd:<path>`	none
`disk.usage-high`	WARN ≥80%, CRITICAL ≥90%	`scanner/disk.go` via gopsutil, skips pseudo FS	`mount:<path>`	`disk.docker-prune` (CONFIRM) — only meaningful on docker hosts
`postgres.no-backup`	CRITICAL	API cross-ref on `POST /api/nodes/inventory` × `BackupPolicy`	`container:<name>`	`postgres.add-backup` (CONFIRM) → `hypervault-restic`
`audit.kernel-updates-pending`	WARN	`scanner/audit.go` parses `apt list --upgradable` for `-security` source	`audit:kernel-updates`	`audit.apply-security-updates` (CONFIRM) → `apt-security-upgrade`
`audit.root-password-set`	WARN	`/etc/shadow` line for root has a real hash starting with `$`	`audit:root-password`	none — MANUAL (`passwd -l root`)
`audit.sudo-nopasswd`	WARN	greps `/etc/sudoers{,.d/*}` for `NOPASSWD:`	`audit:sudo-nopasswd`	none — MANUAL (per-rule review)
`audit.world-writable-etc`	WARN	`find /etc -xdev -type f -perm -o+w` returns ≥1	`audit:world-writable-etc`	none — MANUAL (per-file `chmod o-w`)
`audit.ufw-inactive`	CRITICAL on public-IP host, WARN otherwise	`ufw status verbose` parsed for `Status: active` AND `Default: deny (incoming)`	`audit:ufw`	`audit.server-light-baseline` (CONFIRM) → `server-light`
`audit.fail2ban-down`	WARN	`systemctl is-active fail2ban` ≠ `active`	`audit:fail2ban`	`audit.server-light-baseline` (CONFIRM)
`audit.no-automatic-updates`	INFO	`/etc/apt/apt.conf.d/20auto-upgrades` missing or both directives ≠ `"1"`	`audit:auto-upgrades`	`audit.server-light-baseline` (CONFIRM)
`audit.no-disk-encryption`	INFO	`lsblk -no TYPE` has no row equal to `crypt`	`audit:luks`	none — MANUAL (LUKS can't be added live)
`audit.no-auditd`	INFO	`systemctl is-active auditd` ≠ `active`	`audit:auditd`	none — INFO only
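
The two threshold-based rows in the table map to simple functions. A sketch written in TypeScript for illustration (the real scanners are Go); returning `null` to mean "below threshold, emit nothing" is an assumption, as are the function names.

```typescript
type FindingSeverity = "INFO" | "WARN" | "CRITICAL";

// tls.expiring: WARN under 30 days, CRITICAL under 7 days.
function tlsExpiringSeverity(daysUntilExpiry: number): FindingSeverity | null {
  if (daysUntilExpiry < 7) return "CRITICAL";
  if (daysUntilExpiry < 30) return "WARN";
  return null; // assumption: no finding is emitted above the threshold
}

// disk.usage-high: WARN at 80% and above, CRITICAL at 90% and above.
function diskUsageSeverity(usedPercent: number): FindingSeverity | null {
  if (usedPercent >= 90) return "CRITICAL";
  if (usedPercent >= 80) return "WARN";
  return null; // assumption: no finding is emitted below the threshold
}
```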

Reconciliation rules per type

Not all findings are emit-only. Some get auto-resolved by the API when the underlying state changes, without waiting for a Job to run:

postgres.no-backup: the POST /api/nodes/inventory endpoint walks existing OPEN findings on the node. If the container in the key is no longer present in the live inventory, OR a BackupPolicy now targets the node, the finding flips to RESOLVED on that tick. If a later inventory shows the container back AND no policy, the finding is re-emitted.
All other types: resolved either by a successful Job whose preset matches the recommendation, OR by a manual POST /api/findings/:id/resolve. They are re-emitted on the next scan if the scanner sees the problem again.
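
The postgres.no-backup rule can be sketched as a pure function over the node's OPEN findings and the incoming inventory. The shapes and names here are hypothetical, and leaving findings with an unparsable key untouched is an assumption.

```typescript
interface OpenFinding {
  id: string;
  type: string;
  key: string; // "container:<name>" for postgres.no-backup
}

function containerFromKey(key: string): string | null {
  const prefix = "container:";
  return key.startsWith(prefix) ? key.slice(prefix.length) : null;
}

// Returns the ids of OPEN postgres.no-backup findings that should flip to RESOLVED.
function findingsToResolve(
  open: OpenFinding[],
  liveContainers: string[],
  nodeHasBackupPolicy: boolean,
): string[] {
  return open
    .filter((f) => f.type === "postgres.no-backup")
    .filter((f) => {
      const name = containerFromKey(f.key);
      if (name === null) return false; // assumption: skip keys we cannot parse
      const containerGone = !liveContainers.includes(name);
      return containerGone || nodeHasBackupPolicy;
    })
    .map((f) => f.id);
}
```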

Writing dedup-safe scanners

Two rules:

Never invent a new key for the same underlying problem. A scanner that emits key="cert-app.client.fr-2026-08-25" (with a date) will produce a new finding row every scan and drown the operator. Use a stable identifier: key="cert:app.client.fr:/etc/letsencrypt/live/app.client.fr/fullchain.pem".
Don't include free-form messages in key. Phrasing changes, the key shouldn't. Anything dynamic (the actual days-until-expiry, current usage percent) goes in message and metadata, not in key.

The unique index does the heavy lifting; the upsert does the right thing as long as the key is stable.
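
Both rules amount to keeping volatile values out of the key. A minimal sketch for tls.expiring, written in TypeScript for illustration; the function name, message wording, and metadata field are hypothetical.

```typescript
interface TlsExpiringFinding {
  type: "tls.expiring";
  key: string;
  message: string;
  metadata: { daysLeft: number };
}

function tlsExpiring(subject: string, path: string, daysLeft: number): TlsExpiringFinding {
  return {
    type: "tls.expiring",
    // Stable: depends only on what the certificate is and where it lives.
    key: `cert:${subject}:${path}`,
    // Dynamic: changes every day, so it stays out of the key.
    message: `Certificate for ${subject} expires in ${daysLeft} days.`,
    metadata: { daysLeft },
  };
}
```

Two scans a day apart produce different `message` and `metadata` values but the same `key`, so they land on the same row.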

Querying

-- Column names follow the Prisma fields above (camelCase, no @map),
-- so PostgreSQL needs them double-quoted.

-- Findings per node, grouped by severity
SELECT "nodeId", severity, count(*)
FROM "Finding"
WHERE status = 'OPEN'
GROUP BY "nodeId", severity
ORDER BY "nodeId";

-- Health bucket for one node
SELECT
  CASE
    WHEN sum(CASE WHEN severity = 'CRITICAL' THEN 1 ELSE 0 END) > 0 THEN 'At risk'
    WHEN sum(CASE WHEN severity = 'WARN'     THEN 1 ELSE 0 END) > 0 THEN 'Attention'
    ELSE 'Healthy'
  END AS health
FROM "Finding"
WHERE "nodeId" = 'prod-eu-1' AND status = 'OPEN';

-- "How did this finding get resolved?"
SELECT f.id, f.title, f."resolvedAt", j."presetName", j.status
FROM "Finding" f
LEFT JOIN "Job" j ON j.id = f."resolvedByJobId"
WHERE f.id = 'cmp...';

What this is NOT

An alerting system. Findings are facts about the present state of a node. Alerts (with rules, channels, dedup windows, escalation policies) are a separate layer — Phase 5 territory.
A monitoring buffer. Heartbeats and per-minute metrics live in Heartbeat. Findings only get created when something crosses a threshold OR is structurally wrong (no backup on a DB container).
A graph database. relatedEntities is a flat array of kind:identifier strings. The "click around the graph" UX is a Phase 4.5 feature once we have enough scanners to make it interesting.
