Core concepts
This is the model you need once you are past quickstart. Other pages link back here for the details.
Desired state, live state, reconciliation
Two states, kept apart on purpose.
- Desired state is your intent: what you asked RunOS to run. It lives in the control plane, separate from your cluster.
- Live state is the running Kubernetes objects on your cluster right now.
The RunOS API is the control plane. It reconciles live state toward desired state, continuously. You change desired state; RunOS makes the cluster match.
Reads are fast and synchronous. Slow writes (creating, updating, or deleting a service, or running a deploy) run as a background job and return a jobId. Watch the job with runos follow <jobId> or runos jobs show <jobId>.
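The reconciliation idea can be sketched as a toy model. This is not RunOS's implementation: the Action type and the plan and reconcile functions are hypothetical names, and both states are reduced to plain dictionaries keyed by service name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update", or "delete"
    name: str


def plan(desired: dict, live: dict) -> list:
    """Return the actions that would move live state to match desired state."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(Action("create", name))
        elif live[name] != spec:
            actions.append(Action("update", name))
    for name in sorted(live):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def reconcile(desired: dict, live: dict) -> list:
    """Run one reconciliation pass, mutating live in place; return the actions taken."""
    actions = plan(desired, live)
    for action in actions:
        if action.verb == "delete":
            del live[action.name]
        else:
            live[action.name] = dict(desired[action.name])
    return actions
```

After one pass, plan returns an empty list until desired state changes again, which is the property a continuous control loop relies on.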
"Manifest" means three things
The word is overloaded. Keep them straight.
- The CLI command catalog. runos manifest update downloads the latest command and endpoint definitions so your CLI knows what verbs exist. runos manifest list and runos manifest show inspect it.
- Your IaC file, runos.yaml. This is the manifest in the infrastructure-as-code sense: it describes one app or service. runos apps diff and runos apps sync both read it.
- The rendered Kubernetes manifest RunOS applies to the cluster. You normally never see it. You can patch it with files under overrides/.
Infrastructure as code (pull, diff, sync)
Three verbs round-trip config between the cluster and local files.
- runos pull downloads a running app's config into runos.<cid>.<id>/ (the runos.yaml, .env, secret files, and overrides).
- runos apps diff compares your local files against server state and writes nothing. Exit codes are CI-gate friendly: 0 = clean, 2 = drift you should reconcile, 1 = the diff itself errored.
- runos apps sync pushes local files back to the cluster. It plans first (prints what would change), then applies.
So a CI pipeline can gate on runos apps diff (fail on exit 2) and apply with runos apps sync.
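A gate along these lines could be sketched as follows. The function names classify_diff and ci_gate are hypothetical. The sketch assumes runos apps diff is run from the directory holding the pulled files with no extra arguments, and it treats any exit code other than 0 or 2 as an error, which goes slightly beyond the three codes documented above.

```python
import subprocess


def classify_diff(exit_code: int) -> str:
    """Map a `runos apps diff` exit code to a status label."""
    if exit_code == 0:
        return "clean"
    if exit_code == 2:
        return "drift"
    return "error"  # 1 is documented; other values are assumed to be errors too


def ci_gate(run=subprocess.run) -> int:
    """Run the diff and return the exit code the CI step should finish with.

    `run` is injectable so the gate can be exercised without the CLI installed.
    """
    result = run(["runos", "apps", "diff"])
    status = classify_diff(result.returncode)
    if status == "clean":
        return 0
    print(f"runos apps diff reported: {status}")
    return 1
```

A pipeline step would call ci_gate() and exit with its return value; a later step, run only on the main branch, would invoke runos apps sync.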
Identity (OSID, namespaces, cluster domain, wildcard cert)
Every instance has an OSID: <type>-<id>, where <id> is a 5-char suffix. Live examples: postgresql-bcv4l, cert-manager-h3xwl, traefik-i7r7v.
- The Kubernetes namespace name equals the OSID. One namespace per instance.
- The cluster domain is <cid>.<aid>.<root>. cid is 3 to 16 lowercase alphanumeric chars (live: ky3); aid is your account id (rjwrn); root is fixed per cluster at provision (for example example.com). So cluster ky3's domain is ky3.rjwrn.<root>.
- An instance host is <osid>.<clusterDomain>. An exposed port gets its own auto-route, <osid>-<port>.<clusterDomain>. Example: web-3f9k2-8080.ky3.rjwrn.<root>.
- One wildcard cert (*.<clusterDomain>) covers every host on the cluster. There is no per-service certificate.
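The naming rules above compose mechanically, which a short sketch can show. The helper names are hypothetical, and the OSID pattern assumes the 5-char suffix is lowercase alphanumeric, which is inferred from the live examples rather than stated.

```python
import re

# Assumption: <type> is dash-separated lowercase words, <id> is 5 lowercase alphanumerics.
OSID_RE = re.compile(r"^(?P<type>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<id>[a-z0-9]{5})$")
CID_RE = re.compile(r"^[a-z0-9]{3,16}$")


def parse_osid(osid: str) -> tuple:
    """Split an OSID into (type, id)."""
    match = OSID_RE.match(osid)
    if match is None:
        raise ValueError(f"not an OSID: {osid!r}")
    return match["type"], match["id"]


def cluster_domain(cid: str, aid: str, root: str) -> str:
    """Build <cid>.<aid>.<root>, checking the documented cid rule."""
    if CID_RE.match(cid) is None:
        raise ValueError(f"cid must be 3 to 16 lowercase alphanumeric chars: {cid!r}")
    return f"{cid}.{aid}.{root}"


def instance_host(osid: str, domain: str) -> str:
    return f"{osid}.{domain}"


def port_route(osid: str, port: int, domain: str) -> str:
    return f"{osid}-{port}.{domain}"


def wildcard_name(domain: str) -> str:
    """The single certificate name that covers every host on the cluster."""
    return f"*.{domain}"
```

Because instance hosts and port routes are both single labels directly under the cluster domain, one wildcard name matches all of them.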
Sizing (resource requirement classes)
A resource requirement class sets CPU, memory, replicas, and storage from one id. The grammar is service.tier.size. Live examples: postgresql.c0.beff, traefik.c0.beff; apps use app.sl1.beff.
- The tier encodes replica count and storage type. c0 is one replica on node-local storage. c1 is network-replicated (distributed) storage. c2 is two replicas on local storage.
- The size encodes resources: beff (best-effort), small, medium, large.
- The default is best-effort (beff): CPU and memory limits only, zero requests. In Kubernetes terms that is Burstable QoS. There is no autoscaling (no HPA).
- Setting cpu, memory, or replicas yourself flips the class to custom.
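The service.tier.size grammar can be split with a few lines of code. This sketch uses hypothetical names (ResourceClass, parse_class, tier_note) and only knows the tiers and sizes listed above. Tiers it does not recognize, such as the sl1 used by apps, are passed through without a description, and the custom class is not modeled.

```python
from typing import NamedTuple, Optional

# Only the tiers and sizes described in the text; anything else is left undescribed.
TIER_NOTES = {
    "c0": "one replica, node-local storage",
    "c1": "network-replicated (distributed) storage",
    "c2": "two replicas, local storage",
}
SIZES = ("beff", "small", "medium", "large")


class ResourceClass(NamedTuple):
    service: str
    tier: str
    size: str


def parse_class(class_id: str) -> ResourceClass:
    """Split a class id of the form service.tier.size into its three parts."""
    parts = class_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected service.tier.size, got {class_id!r}")
    return ResourceClass(*parts)


def tier_note(rc: ResourceClass) -> Optional[str]:
    """Return the documented meaning of the tier, or None if it is not listed."""
    return TIER_NOTES.get(rc.tier)
```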
Config sets
A config set is account-scoped, versioned, shared advanced config you can reuse across instances (for example shared_buffers=256MB).
- Immutable: an edit creates the next version (v1, v2, v3). setId is the stable lineage handle; isLatest marks the current version.
- Consumed by postgresql and vllm.
- Manage them with runos account config-sets add|list|show|update.
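The versioning behavior can be modeled in memory to make the immutability rule concrete. This is an illustrative sketch, not the RunOS data model: the ConfigSetVersion class, the add and update functions, and the setId value used in any call are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConfigSetVersion:
    set_id: str      # stable lineage handle, shared by every version
    version: int     # 1, 2, 3, ... (rendered as v1, v2, v3)
    values: dict
    is_latest: bool


def add(set_id: str, values: dict) -> list:
    """Start a new lineage at v1."""
    return [ConfigSetVersion(set_id, 1, dict(values), True)]


def update(history: list, values: dict) -> list:
    """Return a new history with the next version appended.

    Earlier versions keep their values; only their is_latest flag is cleared.
    """
    latest = history[-1]
    older = [replace(v, is_latest=False) for v in history]
    newest = ConfigSetVersion(latest.set_id, latest.version + 1, dict(values), True)
    return older + [newest]
```

An instance pinned to v1 keeps seeing the v1 values after an update, because the edit produced v2 instead of rewriting v1.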
The control loop (agents dial out, heartbeat = readiness)
Every node runs a node agent; every cluster runs a cluster agent. Each one dials out to the control plane and holds a single long-lived outbound connection open.
The control plane pushes instructions down that stream. It never polls and never connects in. That is why a node behind NAT works with no inbound ports.
Each agent sends a heartbeat on that connection. A fresh heartbeat means the node is ready. If the heartbeat goes stale (none for roughly 30 seconds) the node drops out of ready.
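The readiness rule amounts to a freshness check on the last heartbeat. A minimal sketch follows, with a hypothetical HeartbeatTracker class. The text says "roughly 30 seconds", so the exact threshold here is an assumption, and timestamps are passed in explicitly so that the logic does not depend on a real clock.

```python
STALE_AFTER_SECONDS = 30.0  # assumption: the text only says "roughly 30 seconds"


class HeartbeatTracker:
    """Tracks the last heartbeat per node and derives readiness from its age."""

    def __init__(self, stale_after: float = STALE_AFTER_SECONDS):
        self.stale_after = stale_after
        self._last_seen = {}

    def beat(self, node: str, now: float) -> None:
        """Record a heartbeat received from `node` at time `now` (seconds)."""
        self._last_seen[node] = now

    def is_ready(self, node: str, now: float) -> bool:
        """A node is ready only if it has a heartbeat newer than the threshold."""
        last = self._last_seen.get(node)
        return last is not None and (now - last) < self.stale_after
```

In a real service the caller would pass time.monotonic() for now, so that wall-clock adjustments cannot make a heartbeat look fresh or stale.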
Security (trust tiers and credentials)
The control plane uses two trust tiers.
- Bootstrap. A node registers with a short-lived registration token over one-way TLS, and trades it for its own mTLS client certificate.
- Steady state. Every agent-to-control-plane stream is mutual TLS (mTLS). The node or cluster agent authenticates with its own certificate.
Three credential types you use directly.
- Web session token. Your Console login session.
- Personal access token (PAT). Account-scoped and expiring. Format: runos_pat_<keyId>.<secret>. The CLI sends it as Authorization: Bearer runos_pat_... and the RUNOS_API_KEY env var overrides the stored key.
- Notify API keys. Account-scoped keys for the Notify service.
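The PAT format and the env-var override can be sketched in a few lines. The function names are hypothetical. The sketch assumes keyId contains no dot, so the first dot separates it from the secret, and it treats an empty RUNOS_API_KEY as unset. Neither detail is confirmed by the text above.

```python
import os

PAT_PREFIX = "runos_pat_"


def split_pat(token: str) -> tuple:
    """Split runos_pat_<keyId>.<secret> into (keyId, secret)."""
    if not token.startswith(PAT_PREFIX):
        raise ValueError("token does not start with runos_pat_")
    # Assumption: keyId has no dot, so the first dot is the separator.
    key_id, sep, secret = token[len(PAT_PREFIX):].partition(".")
    if not sep or not key_id or not secret:
        raise ValueError("expected runos_pat_<keyId>.<secret>")
    return key_id, secret


def resolve_key(stored_key: str, env=os.environ) -> str:
    """Prefer RUNOS_API_KEY from the environment over the stored key."""
    return env.get("RUNOS_API_KEY") or stored_key


def auth_header(token: str) -> dict:
    """Build the bearer header the CLI is described as sending."""
    return {"Authorization": f"Bearer {token}"}
```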