Chokmah (Inputs)

Chokmah is the system’s ingest and provenance boundary. It captures raw external inputs and turns them into immutable, content-addressed snapshots so that downstream stages can be reproduced and audited.

Chokmah does not decide truth. It guarantees only that “what we saw” is captured, traceable, and replayable.


Purpose

  • Provide a safe, repeatable ingest boundary for raw data (files, feeds, APIs, user submissions).
  • Produce stable snapshot references that the rest of the pipeline can pin to a build.

Responsibilities

Chokmah must:

  • Ingest raw materials from configured sources.
  • Produce immutable Input Snapshots stored in a content-addressed store.
  • Capture and persist provenance per snapshot (source identity, retrieval time, auth/context, routing tags, policy tags).
  • Enforce confidentiality and access control (including encryption at rest when required).
  • Provide idempotent ingestion (same bytes → same snapshot reference).
  • Emit a deterministic, machine-readable input set reference for downstream builds.
  • Emit deterministic ingestion results with stable error codes.

Chokmah may:

  • Normalize transport/container formats without changing meaning (e.g., decode/decompress/charset normalization) only if:
      • the transform is deterministic, and
      • the transform is recorded explicitly in the snapshot manifest (tool id/version + steps).
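
As an illustration of the recording rule above, here is a minimal sketch of a gzip normalization step that returns both the decoded bytes and a manifest entry describing the transform. The helper name `normalize_gzip`, the tool id `python-gzip`, and the field names are assumptions made for this example; they are not part of any schema defined in this document.

```python
import gzip
import hashlib
import platform
from typing import Dict, Tuple


def normalize_gzip(raw: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Decompress a gzip container and describe the step for the manifest.

    Hypothetical helper: the field names below are illustrative only.
    """
    # Decompression is deterministic: the same input bytes always decode
    # to the same output bytes.
    decoded = gzip.decompress(raw)
    step = {
        "step": "gzip-decompress",
        "tool_id": "python-gzip",  # assumed identifier
        "tool_version": platform.python_version(),
        "input_sha256": hashlib.sha256(raw).hexdigest(),
        "output_sha256": hashlib.sha256(decoded).hexdigest(),
    }
    return decoded, step


raw = gzip.compress(b'{"k": "v"}')
decoded, step = normalize_gzip(raw)
```

Recording both the input and output hashes lets an auditor re-run the named tool version on the original bytes and confirm that the stored result matches.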

Chokmah must not:

  • Interpret, enrich, or “fix” content in a way that changes meaning without recording it as a derived artifact.
  • Generate or alter canonical truth artifacts.

Interfaces

Inputs to Chokmah

Ingest Request (from Orgo or an ingestion orchestrator), including:

  • Source descriptor (URI / connector type / credentials reference)
  • Expected content type and size (if known)
  • Confidentiality classification / handling constraints
  • Mandate/policy tags
  • Optional retention/quarantine directives
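
The request fields above could be modeled roughly as follows. This is a sketch under stated assumptions: the class names, field names, and example values (including the URI and the secret path) are hypothetical and only mirror the bullet list; the document does not define a concrete request schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceDescriptor:
    uri: str
    connector_type: str
    credentials_ref: str  # a reference to a secret, never the secret itself


@dataclass(frozen=True)
class IngestRequest:
    source: SourceDescriptor
    classification: str                       # confidentiality / handling constraints
    policy_tags: Tuple[str, ...] = ()         # mandate/policy tags
    expected_content_type: Optional[str] = None
    expected_size_bytes: Optional[int] = None
    retention_directive: Optional[str] = None
    quarantine_directive: Optional[str] = None


request = IngestRequest(
    source=SourceDescriptor(
        uri="https://example.com/feeds/latest.json",
        connector_type="https",
        credentials_ref="secrets/feeds/example",
    ),
    classification="restricted",
    policy_tags=("mandate:example",),
    expected_content_type="application/json",
)
```

Freezing the dataclasses keeps a request from being mutated after it is accepted, which makes it easier to log and replay exactly what was asked for.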

Outputs from Chokmah

  • Input Snapshot Set: one or more content-addressed snapshot references
  • Snapshot Manifest:
      • deterministic mapping of snapshot ref → provenance + acquisition metadata
      • ingestion policy version/ref applied
      • transformation steps (if any) with deterministic tooling identifiers
      • integrity metadata (hashes/checksums)
  • Ingest Receipt:
      • snapshot refs created/confirmed
      • errors/partial results (if any)
      • diagnostics pointers
  • Optional Quarantine Report (when content is blocked/sanitized/quarantined per policy)
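
One way to make the manifest mapping deterministic is to serialize it in a canonical form, so that the same set of records always yields the same bytes. The sketch below assumes a JSON manifest with hypothetical field names (`policy_ref`, `snapshots`, `provenance`, `transforms`) and placeholder refs; the actual manifest schema is not defined in this document.

```python
import json
from typing import Any, Dict


def build_manifest(records: Dict[str, Dict[str, Any]], policy_ref: str) -> bytes:
    """Serialize snapshot records into canonical manifest bytes.

    Hypothetical helper: entries are ordered by snapshot ref and keys are
    sorted, so the insertion order of `records` does not affect the output.
    """
    snapshots = [
        {
            "ref": ref,
            "provenance": records[ref]["provenance"],
            "transforms": records[ref].get("transforms", []),
        }
        for ref in sorted(records)
    ]
    manifest = {"policy_ref": policy_ref, "snapshots": snapshots}
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Placeholder refs and provenance values, for illustration only.
records_a = {
    "sha256:bbbb": {"provenance": {"source": "feed-b", "retrieved_at": "2024-01-01T00:00:00Z"}},
    "sha256:aaaa": {"provenance": {"source": "feed-a", "retrieved_at": "2024-01-01T00:00:05Z"}},
}
records_b = dict(reversed(list(records_a.items())))  # same records, different order

manifest_a = build_manifest(records_a, "policy/v1")
manifest_b = build_manifest(records_b, "policy/v1")
```

Because both calls produce identical bytes, the manifest itself could be hashed and pinned to a build alongside the snapshot set.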

Snapshot identity model

  • Snapshot identity is content-addressed: same payload bytes → same snapshot reference.
  • Metadata changes do not change the payload reference; metadata lives in the manifest/records.
  • If the source is mutable (e.g., “latest.json”), Chokmah still stores the retrieved bytes as an immutable snapshot and records retrieval context.
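
The identity rules above can be sketched as follows. SHA-256 and the `sha256:` prefix are assumptions; this document does not specify the hash function or the reference format. The payloads and timestamps are made-up values.

```python
import hashlib
from typing import Dict, List


def snapshot_ref(payload: bytes) -> str:
    """Derive a reference from payload bytes only (assumed scheme)."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


# Two different payloads retrieved from a mutable source such as "latest.json".
first = b'{"version": 1}'
second = b'{"version": 2}'

# Provenance lives beside the payload, keyed by ref, so recording a new
# retrieval never changes the ref itself.
provenance: Dict[str, List[Dict[str, str]]] = {}
for payload, retrieved_at in [
    (first, "2024-01-01T00:00:00Z"),
    (second, "2024-01-02T00:00:00Z"),
    (first, "2024-01-03T00:00:00Z"),  # the source reverted to the earlier bytes
]:
    ref = snapshot_ref(payload)
    provenance.setdefault(ref, []).append(
        {"uri": "latest.json", "retrieved_at": retrieved_at}
    )
```

Three retrievals produce two snapshots: the reverted payload maps back to the first ref and simply gains a second retrieval record.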

Invariants

1) Immutability: once issued, snapshot bytes never change.
2) Content addressing: refs derive from content (and any explicitly defined deterministic packaging rules).
3) Complete provenance: every snapshot records where/when/how it was fetched, under what policy/authority context, and any transformations applied.
4) Confidentiality enforcement: restricted snapshots are encrypted at rest and access-controlled; downstream should receive only refs unless explicitly authorized to fetch bytes.
5) Idempotent ingestion: re-ingesting identical bytes returns the same ref; retries don’t create duplicates.
6) Build reproducibility: Orgo can bind a build to a stable snapshot set; downstream must not depend on live sources.
7) No hidden enrichment: ingest captures and packages input material; it does not introduce new facts.
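
Invariants 1, 2, and 5 can be illustrated with a small in-memory store. This is a sketch only: `SnapshotStore` is a hypothetical class, SHA-256 is an assumed hash, and a real store would also need durable storage, encryption at rest, and access control.

```python
import hashlib
from typing import Dict, Tuple


class SnapshotStore:
    """Toy content-addressed store with idempotent writes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, payload: bytes) -> Tuple[str, bool]:
        """Return (ref, created). Re-ingesting identical bytes confirms the
        existing snapshot instead of writing a duplicate."""
        ref = "sha256:" + hashlib.sha256(payload).hexdigest()
        created = ref not in self._blobs
        if created:
            # Copy into an immutable bytes object; existing entries are
            # never overwritten.
            self._blobs[ref] = bytes(payload)
        return ref, created

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]


store = SnapshotStore()
ref1, created1 = store.put(b"payload")
ref2, created2 = store.put(b"payload")  # a retry of the same ingest
```

The `created` flag corresponds to the "created/confirmed" distinction in the Ingest Receipt: the first call creates the snapshot, and the retry only confirms it.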


Error handling (fail closed)

Chokmah fails closed when it cannot guarantee correctness. Common failures include:

  • Source unreachable / authentication failure
  • Partial download or truncated payload
  • Hash mismatch during streaming verification
  • Policy violation (disallowed source / classification mismatch)
  • Quarantine triggered (malware, sensitive data, unsafe content)
  • Unsupported format / decoding failure
  • Storage failure (cannot commit snapshot immutably)
  • Malformed content violating declared type constraints (record as ingest error; do not coerce)

Errors should be structured and stable:

  • code (stable)
  • message (human)
  • diagnostics_ref (pointer)
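
A fail-closed check with structured errors might look like the sketch below. The exception class `IngestFailure`, the codes `TRUNCATED_PAYLOAD` and `HASH_MISMATCH`, and the `diag/...` pointer format are hypothetical; this document requires stable codes but does not list them.

```python
import hashlib
from typing import Dict, Iterable, Optional


class IngestFailure(Exception):
    """Structured ingest error: stable code, human message, diagnostics pointer."""

    def __init__(self, code: str, message: str, diagnostics_ref: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.diagnostics_ref = diagnostics_ref

    def to_dict(self) -> Dict[str, str]:
        return {
            "code": self.code,
            "message": self.message,
            "diagnostics_ref": self.diagnostics_ref,
        }


def verify_stream(
    chunks: Iterable[bytes],
    expected_sha256: str,
    expected_size: Optional[int] = None,
    diagnostics_ref: str = "diag/unknown",
) -> bytes:
    """Hash chunks as they arrive; return the payload only if every check passes."""
    digest = hashlib.sha256()
    buffer = bytearray()
    for chunk in chunks:
        digest.update(chunk)
        buffer.extend(chunk)
    if expected_size is not None and len(buffer) != expected_size:
        raise IngestFailure(
            "TRUNCATED_PAYLOAD",
            f"expected {expected_size} bytes, received {len(buffer)}",
            diagnostics_ref,
        )
    if digest.hexdigest() != expected_sha256:
        raise IngestFailure(
            "HASH_MISMATCH", "payload hash differs from expected", diagnostics_ref
        )
    return bytes(buffer)


good = hashlib.sha256(b"hello world").hexdigest()
payload = verify_stream([b"hello ", b"world"], good, expected_size=11)

try:
    verify_stream([b"hello "], good, expected_size=11, diagnostics_ref="diag/req-001")
    failure = None
except IngestFailure as exc:
    failure = exc.to_dict()
```

Nothing is returned, and so nothing can be committed, unless both checks pass; callers branch on `code` rather than on the message text.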

Observability

Chokmah should emit:

Metrics

  • ingest requests, successes/failures
  • bytes ingested, throughput
  • latency per connector/source type
  • retry counts and idempotency hits
  • rejection/quarantine rates and reasons

Structured logs

  • request id, source descriptor hash, snapshot refs
  • policy/classification decisions
  • failure codes and diagnostics pointers
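
A structured log line carrying these fields might be produced as in the sketch below. Hashing a canonical form of the source descriptor is one plausible reading of "source descriptor hash": it would let logs correlate requests to the same source without repeating the URI. The helper names, field names, and example values are assumptions.

```python
import hashlib
import json
from typing import Dict, List, Optional


def source_descriptor_hash(descriptor: Dict[str, str]) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def log_line(
    request_id: str,
    descriptor: Dict[str, str],
    snapshot_refs: List[str],
    failure_code: Optional[str] = None,
    diagnostics_ref: Optional[str] = None,
) -> str:
    """Build one JSON log line (hypothetical field names)."""
    return json.dumps(
        {
            "request_id": request_id,
            "source_descriptor_hash": source_descriptor_hash(descriptor),
            "snapshot_refs": sorted(snapshot_refs),
            "failure_code": failure_code,
            "diagnostics_ref": diagnostics_ref,
        },
        sort_keys=True,
    )


descriptor = {"uri": "https://example.com/latest.json", "connector_type": "https"}
line = log_line("req-001", descriptor, ["sha256:aaaa"])  # placeholder ref
```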

Audit events

  • snapshot created/confirmed
  • access grants/denials
  • retention/expiration actions

Security and trust boundary notes

  • Chokmah touches untrusted external data; treat all inputs as untrusted until committed immutably.
  • Connector credentials are handled via a secure secret mechanism (never embedded in artifacts).
  • Confidentiality policy is enforced consistently with mandate and operational policy.

  • Components-Orgo.md
  • Lifecycle.md
  • Operations-Builds.md
  • Artifacts-Operational.md