Managed services

A managed service is a stateful workload the RunOS control plane provisions and reconciles for you: a database, a cache, object storage, a message broker, an AI model server. You declare what you want; the RunOS API builds it, owns the Kubernetes objects, and keeps them matching your spec.

The model

Each service runs in its own namespace. The control plane provisions it one of three ways depending on the type:

  • An operator plus a custom resource (PostgreSQL via CloudNativePG, MinIO via its tenant operator).
  • A plain Deployment.
  • A Helm release (Langfuse, for example, takes a chart version).

You rarely care which: the verbs are the same across types (add, update, delete, logs, show, status).

Every instance has an OSID of the form type-id (for example postgresql-bcv4l); its Kubernetes namespace name equals the OSID, one namespace per instance. See Core concepts for the full identity model (cluster domain, host routing, wildcard cert).
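
Because the namespace name equals the OSID, a script can recover the type and id from either one. A minimal shell sketch, assuming the id part never contains a hyphen (osid_type and osid_id are illustrative helper names, not part of the runos CLI):

```shell
# Split an OSID (type-id) at its last hyphen.
# Assumption: the id itself contains no hyphen.
osid_type() { printf '%s\n' "${1%-*}"; }
osid_id()   { printf '%s\n' "${1##*-}"; }

osid="postgresql-bcv4l"
echo "type=$(osid_type "$osid") id=$(osid_id "$osid") namespace=$osid"
```

Splitting at the last hyphen rather than the first keeps the sketch usable if a type name itself contains a hyphen.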

Delete is guarded. runos services <type> delete is refused while anything still depends on the service (an app or another service that names it in requires). The error reads "service has dependents" and lists them. There is no cascade and no force flag: detach each dependent first, then delete. -y skips the confirmation prompt; it does not bypass the guard.
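
The guard suggests a two-step routine: list the dependents, then delete. A sketch of that routine as a shell function. delete_service is an illustrative name, and passing the instance id positionally to delete is an assumption modeled on the show verb rather than something this page confirms:

```shell
# Hypothetical helper: show what depends on a service, then attempt the delete.
# Assumption: delete takes the instance id positionally, like show does.
delete_service() {
  svc_type="$1"; svc_id="$2"; cid="$3"
  # List apps and services that still name this instance in requires.
  runos services dependents "$svc_type" "$svc_id" --cid "$cid"
  # The API refuses this while dependents remain; there is no force flag.
  runos services "$svc_type" delete "$svc_id" --cid "$cid"
}

# Usage: delete_service postgresql bcv4l ky3
```

Leaving -y out keeps the confirmation prompt in place, which is probably what you want for an interactive delete.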

The catalog

Twelve user-facing types, verified live (CLI manifest v31.14.0). Versions shown are the add defaults; types without a --version flag are pinned by the platform.

| Type | Category | Default version | Notes |
|---|---|---|---|
| postgresql | Database | 17.6 | CloudNativePG. Read-write and read-only endpoints. clone-database, adopt-user, grant-database, create-database verbs. Extensions: --vector, --apache-age, --document-db-extension. |
| mysql | Database | 8.4.5 | create-database, grant-database, managed users. |
| valkey | Cache | 9.0.0 | In-cluster only, ephemeral (no persistent volume). --secured (default on) gates TLS/auth. |
| minio | Object storage | pinned | S3 API. Buckets and users (create-bucket, create-user, grant-bucket). --expose-s3on443 for public S3. |
| harbor | Registry | pinned | The cluster's system image registry (--system-instance). Composite: needs --postgres-osid, --valkey-osid, --minio-osid. |
| kafka | Messaging | pinned | --enable-ui (default on) ships a web UI for topics, schemas, consumer groups. --storage-mb immutable after create. |
| rabbitmq | Messaging | pinned | --storage-mb persistent volume (default 5120 MiB). |
| clickhouse | Analytics | pinned | --shards, --replicas per shard. |
| vllm | AI serving | 0.21.0 | GPU (CUDA): --gpu-count. --model from HuggingFace, S3, or MinIO. KV-cache tiering via --lmcache-mode. |
| ollama | AI serving | 0.9.0 | --gpu-count (0 for CPU-only). |
| litellm | AI gateway | main-v1.81.12-stable | LLM gateway. Master key is sk-litellm-.... Needs --postgres-osid; optional --valkey-osid cache. |
| langfuse | AI observability | 1.5.31 | Helm chart. Wires --postgres-osid, --clickhouse-osid, --minio-osid, --valkey-osid. |

Other manifest types (cert-manager, traefik, vector, prometheus, grafana, linstor, wireguard, netbird-client, netbird-server) are platform and infrastructure services, part of cluster config rather than app-facing managed services.

Provision a service

Three ways, same result.

Imperative CLI. One command, prints a job:

runos services postgresql add --cid ky3 --name orders-db --version 17.6
runos services valkey add --cid ky3 --name sessions
runos services kafka add --cid ky3 --name events --enable-ui

Declarative IaC. Pull a service to a local YAML file, edit it, and sync it back. Pull is the reverse of add; it writes an existing service's spec to disk:

runos services pull --type postgresql --id bcv4l --cid ky3   # writes runos.service.ky3.bcv4l.yaml
# edit the yaml
runos services sync runos.service.ky3.bcv4l.yaml --dry-run # see the plan
runos services sync runos.service.ky3.bcv4l.yaml # apply

A YAML with no id field provisions a new service on sync (POST); the id is written back on success. A YAML with an id patches the existing one. The schema is derived from the RunOS API's manifest at runtime, so new types and fields flow through without a CLI upgrade (run runos manifest update when the platform is upgraded).
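
The create-or-patch rule can be checked before you sync. A small sketch that reports which path a file would take. It assumes id is a top-level key starting in column one, which this page does not spell out, and sync_mode is an illustrative name:

```shell
# Report whether a service YAML would create (no id) or patch (has id).
# Assumption: id is a top-level key, so it starts in column one.
sync_mode() {
  if grep -Eq '^id:[[:space:]]*[^[:space:]]' "$1"; then
    echo "patch"
  else
    echo "create"
  fi
}

# Usage: sync_mode runos.service.ky3.bcv4l.yaml
```

The --dry-run plan remains the authoritative answer; this is only a quick local check.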

Console. The same operations through the web UI.

List and inspect:

runos services list --cid ky3                 # every type in the cluster
runos services postgresql show bcv4l --cid ky3
runos services dependents postgresql bcv4l --cid ky3

Connect an app (requires)

Apps name the services they need in a requires block in runos.yaml, keyed by an alias:

requires:
  db:
    type: postgresql
    id: bcv4l # link an existing instance
  cache:
    type: valkey
    class: valkey.c0.beff # class shorthand: provision a fresh instance at deploy time

On deploy, the platform resolves each entry, provisions any class-shorthand service, injects connection env vars into your app (DATABASE_URL, REDIS_URL, and so on), and writes a runos.service.<cid>.<sid>.yaml for anything it created so future edits go through sync.
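
An app can fail fast when the injected variables are missing. A minimal startup check, assuming the db and cache entries above yield DATABASE_URL and REDIS_URL (the exact variable per alias is not specified here, and require_env is an illustrative name):

```shell
# Fail fast when an expected connection variable is unset or empty.
require_env() {
  for name in "$@"; do
    # Indirectly read the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing env var: $name" >&2
      return 1
    fi
  done
}

# Usage at app startup: require_env DATABASE_URL REDIS_URL || exit 1
```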

Auto-wiring covers postgresql, valkey, and mysql only. Those three have credential handlers that mint a database or user and hand your app the connection string. Other services you connect by hand: for MinIO, pass its OSID to the consuming service (--minio-osid) or read its credentials and set your own env vars. The tier (class) of a required service is immutable after creation.

Size, version, and configure

Sizing uses a resource class per type, written service.tier.size. The default is best-effort (*.c0.beff): limits only, zero requests, Burstable QoS. See Core concepts for what the tiers and sizes mean. Override --cpu-limit-mc, --memory-limit-mb, or --replicas and the class flips to custom. There is no autoscaling and no HPA.
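
The class grammar splits cleanly on dots. A sketch that breaks a class string into its three fields (parse_class is an illustrative helper, not a runos command):

```shell
# Split a resource class of the form service.tier.size into its fields.
parse_class() {
  IFS=. read -r svc tier size extra <<EOF
$1
EOF
  # Require three non-empty fields and nothing after them.
  [ -n "$svc" ] && [ -n "$tier" ] && [ -n "$size" ] && [ -z "$extra" ] || return 1
  printf 'service=%s tier=%s size=%s\n' "$svc" "$tier" "$size"
}

parse_class valkey.c0.beff   # prints: service=valkey tier=c0 size=beff
```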

Versions are pinned at create time from the --version default (where the type has one). Upgrades are an explicit update, not automatic.

Config sets apply to postgresql and vllm only. Use --config-type config_set plus --config-set-id to inherit an account-level config set (its current version is pinned at create), or --config-type custom (the default) to edit configs directly. PostgreSQL also has get-advanced-configs / set-advanced-configs verbs.

Storage and durability

Two backends.

openebs-local is the default: a node-local volume with no cross-node replication. Local volumes cannot expand in place, so a storageMb change on a local-storage service is refused (recreate larger, or move to a distributed tier).

LINSTOR is the published opt-in distributed backend (DRBD-replicated). It is install-gated: distributed storage requires LINSTOR installed and set as the cluster system service first. Distributed volumes can resize in place. The storage backend is fixed at create.

Note the per-type storage rules: for kafka, vllm, and ollama, --storage-mb is immutable after creation. For PostgreSQL, --storage-mb is increase-only via update, and only on a backend that can grow.
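
These rules can be collected into one pre-flight check. A sketch that models them locally; it is not the API's own validation, the backend label linstor is an assumed identifier, and can_resize is an illustrative name. It leaves other types on distributed storage unrestricted because this page states an increase-only rule for PostgreSQL alone:

```shell
# Return 0 if a --storage-mb change looks permitted under the rules above.
# Args: type, backend (openebs-local | linstor), old MiB, new MiB.
can_resize() {
  svc_type="$1"; backend="$2"; old_mb="$3"; new_mb="$4"
  # Immutable after creation for these types.
  case "$svc_type" in
    kafka|vllm|ollama) return 1 ;;
  esac
  # Only the distributed backend resizes in place.
  [ "$backend" = "linstor" ] || return 1
  # PostgreSQL may grow but never shrink.
  if [ "$svc_type" = "postgresql" ] && [ "$new_mb" -le "$old_mb" ]; then
    return 1
  fi
  return 0
}

can_resize postgresql linstor 5120 10240 && echo "resize allowed"
```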

Defaults and hardening

Managed services ship with development defaults and keep the production options one setting away. Choose these deliberately when you provision.

  • HA is opt-in. A service starts at a single replica (no failover at one replica). Turn on high availability with a multi-replica tier or --replicas: for PostgreSQL that runs a primary plus streaming standbys, with anti-affinity and split read-write/read-only endpoints. You need at least as many nodes as replicas.
  • Backups are opt-in, not automatic. A new database is not backed up until you turn backups on; there is no silent daily job. Give PostgreSQL an S3-style destination and configure-backup wires up WAL archiving, a scheduled backup with a retention window, one-shot backups, and restore (a fresh instance can even bootstrap from a backup).
  • Durable storage is opt-in. The default is fast node-local storage; pick a distributed tier for LINSTOR-replicated volumes that survive a node loss.
  • Valkey is a cache, not a store. It is ephemeral by default (no persistent volume) and in-cluster only. TLS and auth are on by default (--secured); turn them off only if you have a reason to.
  • MinIO defaults to a single drive (volumesPerServer: 1). Add drives to get erasure coding.

Need a type that is not in the catalog? Contact support, or self-deploy it as a normal RunOS app with runos deploy.