Operations
This page describes how to run the system safely in production: how builds become releases, how rollouts are controlled, and how recovery works when something goes wrong.
What “Operations” owns
Operations ensures that:
- Only validated outputs become candidates for release (“no compile on fail”).
- Only verified runtime packs activate (“fail closed”).
- Rollouts are controlled (channels/cohorts/pinning) and fully auditable.
- Rollbacks are deterministic and prevent partial activation.
Golden path (build → release → runtime)
```mermaid
flowchart LR
  A["Build (Orgo)<br/>gated pipeline"] --> B["Eligible output<br/>passed validation"]
  B --> C["Release (channels/cohorts)<br/>promotion + monitoring"]
  C --> D["Konnaxion<br/>verify -> activate"]
  D --> E["Runtime (Malkuth)<br/>serve offline"]
  E --> F["Rollback if needed<br/>deterministic target"]
```
Key operational concepts
- Build: a gated run that produces a candidate Runtime Pack and evidence of what happened.
- Release: controlled rollout of a Runtime Pack into an environment via channels/cohorts.
- Channel: a logical track (e.g., canary/stable/lts) with its own promotion and rollback behavior.
- Cohort: a controlled subset of traffic/tenants/devices for progressive rollout.
- Pin: holding a channel/cohort to a specific pack (freeze promotion).
- Last-known-good (LKG): the deterministic fallback target used for rollback when no target is explicitly pinned.
Build operations
What happens in a build
- Inputs are ingested as immutable snapshots with provenance.
- Claim extraction/resolution happens upstream of truth.
- Validation is a hard gate.
- If validation passes, compilation produces canonical outputs + a Runtime Pack for distribution.
What operators care about
- A build is eligible only if all gates pass.
- Every build produces operational evidence (records) so it can be audited and reproduced.
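The eligibility rule above can be illustrated with a minimal sketch. The `GateResult` and `BuildRecord` types, their fields, and the gate names are hypothetical and are not taken from the actual build system; the sketch only shows the "all gates must pass" rule and the idea that a record keeps enough evidence to explain a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of one pipeline gate (hypothetical shape)."""
    name: str          # e.g. "ingest", "validation" (illustrative names)
    passed: bool
    detail: str = ""


@dataclass(frozen=True)
class BuildRecord:
    """Immutable evidence of a build: which gates ran and how they ended."""
    build_id: str
    gates: tuple[GateResult, ...]

    @property
    def eligible(self) -> bool:
        # A build with no recorded gates is treated as ineligible,
        # so missing evidence never counts as a pass.
        return bool(self.gates) and all(g.passed for g in self.gates)

    def failed_gates(self) -> list[str]:
        return [g.name for g in self.gates if not g.passed]


passing = BuildRecord("build-001", (
    GateResult("ingest", True),
    GateResult("validation", True),
))
failing = BuildRecord("build-002", (
    GateResult("ingest", True),
    GateResult("validation", False, "schema mismatch"),
))
```

Here `passing.eligible` is true, while `failing.eligible` is false and `failing.failed_gates()` names the gate an operator would look at first.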
Release operations
Release lifecycle
- Create release from an eligible build.
- Select channel + cohorts for progressive rollout.
- Verify-before-activate: integrity/compatibility checks must pass (fail closed).
- Monitor health signals during rollout.
- Promote or pin once confidence is sufficient.
- Record outcomes and decisions (promotion/pin/rollback) in auditable release records.
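As an illustration of the verify-before-activate step, the sketch below checks integrity with a SHA-256 digest and compatibility with a schema version, then activates only if both checks pass. The function names, the schema-version check, and the use of SHA-256 are assumptions made for this example, not a description of how Konnaxion actually verifies packs.

```python
import hashlib


def verify_pack(pack_bytes: bytes, expected_sha256: str,
                pack_schema: int, supported_schemas: set[int]) -> list[str]:
    """Return verification failures; an empty list means the pack verified."""
    failures = []
    # Integrity: the pack content must match the digest recorded at build time.
    if hashlib.sha256(pack_bytes).hexdigest() != expected_sha256:
        failures.append("integrity: digest mismatch")
    # Compatibility: the runtime must support the pack's schema version.
    if pack_schema not in supported_schemas:
        failures.append("compatibility: unsupported schema version")
    return failures


def activate_if_verified(current_pack_id: str, candidate_pack_id: str,
                         failures: list[str]) -> str:
    """Fail closed: any failure leaves the current pack active."""
    if failures:
        return current_pack_id
    return candidate_pack_id


payload = b"example runtime pack contents"
digest = hashlib.sha256(payload).hexdigest()

good = verify_pack(payload, digest, pack_schema=2, supported_schemas={1, 2})
tampered = verify_pack(payload + b"!", digest, pack_schema=2, supported_schemas={1, 2})
```

With these inputs, `good` is empty and the candidate would activate, while `tampered` contains an integrity failure and the current pack stays in place.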
Operator checklist for a safe rollout
- Start with canary cohort(s)
- Require verification success before activation
- Promote in steps, not all at once
- Pin if signals are uncertain
- Roll back quickly if verification or health regresses
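The checklist can be read as a small decision function. The sketch below is one possible encoding; the step percentages, the three-valued health signal (`True`, `False`, or `None` for "uncertain"), and the action strings are all hypothetical.

```python
from typing import Optional

# Hypothetical cohort sizes in percent, canary first.
ROLLOUT_STEPS = (1, 10, 50, 100)


def next_action(verified: bool, healthy: Optional[bool], current_percent: int) -> str:
    """Decide the next rollout action for a release.

    healthy is True (signals good), False (regression), or None (uncertain).
    """
    # Verification failure or a health regression: roll back quickly.
    if not verified or healthy is False:
        return "rollback"
    # Uncertain signals: pin and wait rather than promote.
    if healthy is None:
        return "pin"
    # Healthy: promote one step at a time, never straight to 100%.
    larger = [p for p in ROLLOUT_STEPS if p > current_percent]
    if not larger:
        return "complete"
    return f"promote:{larger[0]}"
```

For example, `next_action(True, True, 10)` yields `"promote:50"`, and `next_action(True, None, 10)` yields `"pin"`.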
Rollback operations
Rollback is the default recovery mechanism when a release is unsafe.
Rollback triggers (examples)
- Pack verification failures
- Runtime health regressions during rollout
- Detected incompatibility with an environment/channel
- Policy violation or unexpected determinism drift
Rollback invariants
- Target selection is deterministic (pinned target or LKG).
- Activation is atomic (avoid partial activation).
- Downgrade prevention policies may block unsafe rollback targets.
- Evidence is preserved (rollback is recorded; nothing is silently overwritten).
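A minimal sketch of these invariants follows. `ChannelState`, the integer version floor used for downgrade prevention, and the record format are assumptions for illustration only. Target selection happens before any state changes, so a blocked rollback leaves the channel untouched.

```python
from dataclasses import dataclass, field
from typing import Optional


class RollbackBlocked(Exception):
    """Raised when policy forbids the selected rollback target."""


@dataclass
class ChannelState:
    active: str
    last_known_good: str
    pinned: Optional[str] = None
    min_allowed_version: int = 0          # hypothetical downgrade floor
    records: list = field(default_factory=list)


def select_rollback_target(state: ChannelState, versions: dict) -> str:
    # Deterministic: an explicit pin wins, otherwise last-known-good.
    target = state.pinned or state.last_known_good
    if versions[target] < state.min_allowed_version:
        raise RollbackBlocked(f"{target} is below the allowed version floor")
    return target


def roll_back(state: ChannelState, versions: dict, reason: str) -> str:
    # Selection may raise; nothing has been mutated at that point.
    target = select_rollback_target(state, versions)
    previous = state.active
    # A single pointer swap stands in for atomic activation.
    state.active = target
    # Append-only record: the rollback is evidence, not an overwrite.
    state.records.append(
        {"event": "rollback", "from": previous, "to": target, "reason": reason}
    )
    return target


versions = {"pack-1": 1, "pack-2": 2, "pack-3": 3}
channel = ChannelState(active="pack-3", last_known_good="pack-2")
```

Calling `roll_back(channel, versions, "health regression")` moves `channel.active` to `pack-2` and appends one record. With `pinned="pack-1"` and `min_allowed_version=2`, the same call raises `RollbackBlocked` instead.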
Observability
Operations relies on two categories of signals:
1) System health signals (for go/no-go decisions)
- Verification pass/fail rates
- Activation success rates and latency
- Cohort/channel error rates and performance regressions
- Pack fetch/cache integrity signals (offline readiness)
2) Evidence and audit signals (for explainability)
- Build records: what inputs/policies were used and which gates passed/failed
- Release records: which pack was promoted/pinned/rolled back, where, and why
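To show how health signals might feed a go/no-go decision, here is a small sketch. The signal names mirror the list above, but the `CohortSignals` type and every threshold are hypothetical placeholders; real values would come from the environment's own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortSignals:
    verification_pass_rate: float    # fraction in 0.0 .. 1.0
    activation_success_rate: float   # fraction in 0.0 .. 1.0
    error_rate: float                # error rate observed in the cohort
    baseline_error_rate: float       # error rate before the rollout


def go_no_go(signals: CohortSignals,
             min_verification: float = 0.999,
             min_activation: float = 0.99,
             max_error_ratio: float = 1.5) -> tuple[bool, list[str]]:
    """Return (go, reasons); reasons lists every signal that blocks promotion."""
    reasons = []
    if signals.verification_pass_rate < min_verification:
        reasons.append("verification pass rate below threshold")
    if signals.activation_success_rate < min_activation:
        reasons.append("activation success rate below threshold")
    if signals.error_rate > signals.baseline_error_rate * max_error_ratio:
        reasons.append("error rate regressed against baseline")
    return (not reasons, reasons)
```

Returning the reasons alongside the decision keeps the outcome explainable, which fits the audit role of release records.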
Incident response (operator posture)
When an incident is declared:
- Limit the blast radius: pause promotion and/or pin the channel.
- Prefer rollback over patching in place.
- Preserve evidence (records and traces) for root cause.
- Recover deterministically (pinned/LKG targets; atomic activation).
- Document what happened and what changed (post-incident record).