System Architecture

RunOS is designed to abstract away Kubernetes complexity while leveraging its reliability and power. This document explains how the system works, how components communicate, and what makes RunOS different from traditional Kubernetes deployments.

What RunOS Provides

RunOS gives you all the benefits of Kubernetes without the operational burden:

  • One-click service deployments - PostgreSQL, Redis, Kafka, and 20+ services deploy in seconds
  • Git-based application deployment - Push code, RunOS handles builds, deployments, domains, and SSL
  • Infrastructure flexibility - Run on any cloud provider, bare-metal, or on-premises
  • Automatic management - Kubernetes clusters created and managed behind the scenes
  • Service intelligence - Applications automatically discover and connect to services

Core Components

Console (Web Application)

Your primary interface for managing everything in RunOS.

What you do here:

  • Add and manage servers
  • Deploy applications from Git repositories
  • Install services like databases and message queues
  • Monitor cluster health and performance
  • View logs and metrics
  • Configure domains and SSL certificates

Think of the Console as mission control - where you make decisions and see results.

Node Agent (On Your Servers)

Lightweight daemon running on each server in your cluster.

Responsibilities:

  • Prepares servers during initial setup
  • Maintains secure connections to the RunOS platform
  • Manages VPN connectivity between servers
  • Executes deployment commands
  • Reports health status and metrics

When you add a server through the Console, the Node Agent handles all technical setup automatically.

Cluster Agent (In Kubernetes)

Service running inside your cluster in the runos namespace.

Responsibilities:

  • Automatically provisions SSL/TLS certificates
  • Renews certificates before expiration
  • Maintains secure communication with the RunOS platform
  • Handles cluster-level administrative tasks

Works quietly in the background to keep your cluster secure and operational.

RunOS Backend

Central communication hub connecting everything together.

What it does:

  • Routes commands from Console to Node Agents
  • Collects status updates and metrics from servers
  • Manages secure mTLS communication channels
  • Handles authentication and authorization

You don't interact with it directly - it's the invisible messenger making everything work seamlessly.

Templates Service

Provides configurations for all deployable services.

What it provides:

  • Kubernetes configuration templates for supported services
  • Intelligent defaults for service deployments
  • Tiered configuration options (lightweight, HA, enterprise)
  • Best-practice configurations

When you deploy PostgreSQL or any service, Templates provides the battle-tested configuration.

How Components Communicate

Secure Communication

RunOS uses encrypted communication with certificate-based authentication throughout the platform.

Initial Registration

  • Used when new servers join your cluster
  • HTTPS/TLS encrypted communication
  • Token-based authentication
  • One-time setup process

Ongoing Operations

  • Mutual TLS (mTLS) authentication
  • Both client and server verify certificates
  • Covers all Console ↔ Backend ↔ Agent communication
  • Prevents man-in-the-middle attacks
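
To make "both client and server verify certificates" concrete, here is a minimal sketch using Python's standard ssl module. It is a generic illustration of mTLS on the server side, not RunOS's actual implementation, and the file path parameters are hypothetical placeholders.

```python
import ssl

def make_mtls_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS server context that requires a client certificate (mTLS)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA bundle used to validate client certificates (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

With a plain TLS server the client checks the server's certificate only; setting verify_mode to CERT_REQUIRED adds the reverse check, which is what makes the connection mutual.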

VPN Layer

  • WireGuard encryption for all inter-server traffic
  • Modern cryptography (ChaCha20, Poly1305, Curve25519)
  • Leaner than IPsec or OpenVPN, with a much smaller codebase and typically higher throughput
  • Perfect forward secrecy

Communication Flows

When You Deploy an Application:

You (Web Browser)
↓ HTTPS
Console Frontend
↓ REST API
Console Backend
↓ Authentication check
Validated request
↓ mTLS encrypted connection
RunOS Backend
↓ Routes to appropriate server(s)
Node Agent(s)
↓ Executes kubectl commands
Kubernetes creates pods
↓ Status flows back
Console shows success

Timeline: Typically 5-15 seconds from clicking "Deploy" to seeing your app running.

Server Health Monitoring:

Node Agent (every 5 seconds)
↓ Heartbeat via mTLS
RunOS Backend
↓ Updates internal state
Stores latest status
↓ When Console requests
Returns current status
↓ REST API response
Console displays real-time health

SSL Certificate Issuance:

cert-manager (in cluster)
↓ Requests certificate
Let's Encrypt
↓ DNS-01 challenge
Cluster Agent (webhook)
↓ mTLS connection
RunOS Backend
↓ Updates DNS
Let's Encrypt verifies
↓ Issues certificate
Available to your apps

Timeline: Usually 1-2 minutes for initial issuance. Renewals happen automatically in the background.

Server-to-Server Communication

All servers in your cluster are connected via a WireGuard VPN, which creates an encrypted mesh network:

Server A (172.24.1.10)  ←→  WireGuard  ←→  Server B (172.24.1.20)
        ↓                                          ↓
   Pod Network                                Pod Network
 (172.25.1.0/24)                            (172.25.2.0/24)

Why VPN:

  • Works behind NAT without public IPs
  • Encrypts all inter-server traffic
  • Simplifies network configuration
  • Provides stable IP addresses for nodes

Network Architecture:

  • wg0 (172.24.0.0/16) - Kubernetes internal traffic
  • wg1 (172.24.200.0/21) - User access network
  • Pod network (172.25.0.0/16) - Container IPs
  • Service network (10.96.0.0/12) - Kubernetes service IPs
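
When troubleshooting, it can help to map an address back to one of these ranges. The sketch below does that with Python's ipaddress module. The ranges are copied from the list above; the function name and labels are illustrative. Because the wg1 range as listed falls inside the wg0 /16, the lookup picks the most specific (longest-prefix) match.

```python
import ipaddress

# Ranges copied from the Network Architecture list.
NETWORKS = {
    "wg0 (Kubernetes internal)": ipaddress.ip_network("172.24.0.0/16"),
    "wg1 (user access)": ipaddress.ip_network("172.24.200.0/21"),
    "pod network": ipaddress.ip_network("172.25.0.0/16"),
    "service network": ipaddress.ip_network("10.96.0.0/12"),
}

def classify_ip(address):
    """Return the most specific listed network containing address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [(net.prefixlen, name) for name, net in NETWORKS.items() if ip in net]
    # The longest prefix wins, so a wg1 address is not reported as wg0.
    return max(matches)[1] if matches else None

for addr in ("172.24.1.10", "172.24.201.5", "172.25.2.7", "10.96.0.1"):
    print(addr, "->", classify_ip(addr))
```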

Data Flow Examples

Deploying a Service (PostgreSQL)

  1. You click "Deploy PostgreSQL" in Console
  2. Console validates cluster has sufficient resources
  3. Templates provides PostgreSQL Kubernetes manifests
  4. RunOS backend sends deployment instructions via mTLS to Node Agents
  5. Node Agents execute Kubernetes deployment
  6. Kubernetes starts PostgreSQL pods
  7. Cluster Agent provisions SSL certificate
  8. Status flows back: Agents → Backend → Console
  9. You see PostgreSQL running with connection details

All in seconds, fully configured and secure.

Deploying an Application

  1. You point to your Git repository
  2. Console builds container image (GitHub Actions or in-cluster BuildKit)
  3. Image pushed to local Harbor registry
  4. Console sends deployment command via RunOS backend
  5. Node Agents execute Kubernetes deployment
  6. Kubernetes pulls image from Harbor and starts pods
  7. Cluster Agent provisions SSL certificate for your domain
  8. Traefik ingress routes traffic to your application
  9. Console displays application URL and status

Service Discovery

  1. You deploy an application that needs PostgreSQL
  2. During deployment, RunOS lists available PostgreSQL instances
  3. You select a compatible instance
  4. RunOS injects connection credentials as environment variables
  5. Your application automatically connects using those variables
  6. Kubernetes internal DNS routes traffic between services
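
On the application side, steps 4 and 5 usually amount to reading the injected variables at startup. The sketch below shows one way to do that. The variable names (POSTGRES_HOST and so on) are hypothetical and chosen only for illustration; use the names the Console shows for your service.

```python
import os
from urllib.parse import quote

def postgres_dsn(env=os.environ):
    """Build a PostgreSQL connection URL from injected environment variables.

    The variable names are hypothetical placeholders, not confirmed RunOS names.
    """
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "5432")
    user = quote(env["POSTGRES_USER"], safe="")
    # Percent-encode the password so characters like "@" or "/" stay valid in a URL.
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    database = env.get("POSTGRES_DB", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

Reading required values with env["..."] rather than env.get(...) makes a missing variable fail loudly at startup instead of producing a malformed connection string later.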

Understanding Key Concepts

Kubernetes (Hidden But Present)

You don't need to know Kubernetes to use RunOS, but understanding a few concepts helps:

  • Pods - Running instances of your applications or services
  • Services - Network endpoints that route traffic to pods
  • Ingress - Routes external traffic to your services
  • Persistent Volumes - Storage that persists even if pods restart
  • Namespaces - Isolated environments for resources

The Console shows these in user-friendly terms, but knowing the underlying primitives helps with advanced troubleshooting.

OSID (Open Service Identifier)

Every service and application in RunOS has a unique identifier in the format: service-name-xxxxx

Examples:

  • mysql-d6ekr - A MySQL database instance
  • postgres-k9m3w - A PostgreSQL database
  • myapp-t7r4s - A custom application
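
If you script against OSIDs, a small parser can split the name from the suffix. This is a sketch based only on the examples above: it assumes the suffix is always five lowercase letters or digits, which the format description suggests but does not guarantee.

```python
import re

# Assumption: a dash-separated lowercase name followed by a 5-character
# alphanumeric suffix, as in the examples above.
OSID_PATTERN = re.compile(
    r"^(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<suffix>[a-z0-9]{5})$"
)

def parse_osid(osid):
    """Split an OSID into (service_name, suffix); raise ValueError if malformed."""
    match = OSID_PATTERN.match(osid)
    if match is None:
        raise ValueError(f"not a valid OSID: {osid!r}")
    return match.group("name"), match.group("suffix")

print(parse_osid("mysql-d6ekr"))     # ('mysql', 'd6ekr')
print(parse_osid("postgres-k9m3w"))  # ('postgres', 'k9m3w')
```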

Why it matters: Each OSID is also a Kubernetes namespace containing all resources for that service/application. When you need to troubleshoot at the Kubernetes level:

# View all resources for a service
kubectl get all -n mysql-d6ekr

# Check pods for your application
kubectl get pods -n myapp-t7r4s

# View logs
kubectl logs -n postgres-k9m3w <pod-name>

This organization keeps services isolated and makes troubleshooting straightforward.

Your Infrastructure, Your Control

RunOS runs on servers you provide and control:

  • You choose where servers are located (cloud, on-premises, hybrid)
  • You maintain physical or virtual machine access
  • You can access Kubernetes directly if needed (kubectl works)
  • All your data stays on your infrastructure

RunOS manages the Kubernetes control plane and workloads, but you retain root access to your servers.

Network Architecture

IP Addressing

Each server receives multiple IP addresses:

  1. Physical network IP - Your server's actual network interface
  2. wg0 VPN IP - Kubernetes internal (172.24.X.X)
  3. wg1 VPN IP - User access (172.24.200.X)
  4. Pod CIDR - Range for pods on this node (172.25.X.X/24)

DNS Resolution

Application queries database.default.svc.cluster.local
↓
systemd-resolved
↓
dnsmasq (on wg0 VPN IP)
↓
CoreDNS (Kubernetes DNS)
↓
Returns ClusterIP (10.96.X.X)
↓
kube-proxy routes to pod
↓
PostgreSQL Pod (172.25.X.X)

External domains resolve through dnsmasq → Cloudflare/Google DNS.
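
The name queried in the flow above follows the standard Kubernetes pattern service.namespace.svc.cluster-domain. The sketch below builds such a name and resolves it with the standard library. The helper names are illustrative, and the lookup only succeeds from a machine that can reach the cluster DNS.

```python
import socket

def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """Return the in-cluster DNS name for a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"

def resolve(hostname, port=5432):
    """Return the sorted set of IP addresses the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

print(service_dns_name("database", "default"))
# database.default.svc.cluster.local
```

Inside a pod, resolve(service_dns_name("database", "default")) would be expected to return the Service's ClusterIP from the 10.96.0.0/12 range.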

Traffic Routing

Pod-to-Pod (Same Node):

Pod A → Container bridge → Pod B

Pod-to-Pod (Different Nodes):

Pod A → Cilium CNI → wg0 VPN → Encrypted tunnel → Target Node → Pod B

All inter-node traffic flows through encrypted VPN tunnels automatically.

Connection Reliability

Automatic Reconnection

If connections are interrupted, agents automatically reconnect:

  1. Connection loss detected
  2. Wait with exponential backoff (starts at 1 second)
  3. Attempt reconnection
  4. If successful, resume normal operations
  5. If failed, increase wait time (max 60 seconds)

Why exponential backoff:

  • Prevents connection storms
  • Gives network time to recover
  • Reduces load on control plane
  • Handles temporary outages gracefully
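
The reconnection steps can be sketched as a short loop. The 1-second start and 60-second cap come from the steps above; the doubling factor, the attempt limit, and the function names are assumptions made for illustration, since the growth rate is not specified here.

```python
import itertools
import time

def backoff_delays(initial=1.0, maximum=60.0, factor=2.0):
    """Yield wait times: initial, then multiplied by factor, capped at maximum."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)

def reconnect(connect, sleep=time.sleep, max_attempts=10):
    """Wait, then call connect(); repeat until it returns True.

    Returns the number of attempts used, or None if max_attempts is exhausted.
    """
    for attempt, delay in enumerate(backoff_delays(), start=1):
        if attempt > max_attempts:
            return None
        sleep(delay)       # step 2: wait with exponential backoff
        if connect():      # step 3: attempt reconnection
            return attempt # step 4: success, resume normal operation

print(list(itertools.islice(backoff_delays(), 8)))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Production implementations often add random jitter to each delay so that many agents do not retry in lockstep; that is omitted here to keep the sketch close to the steps listed.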

Health Monitoring

  • Heartbeats sent every 5 seconds
  • Healthy: Heartbeats received within 10 seconds
  • Degraded: Heartbeats delayed 10-30 seconds
  • Offline: No heartbeat for 30+ seconds

When network connectivity returns, agents reconnect automatically and resume normal operation.
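
The thresholds above map directly onto a small classification function. This is a sketch rather than the platform's code: the function name is illustrative, and the handling of the exact 10-second boundary (treated as healthy here) is an assumption.

```python
def classify_node(seconds_since_heartbeat):
    """Map the age of the last heartbeat to a health state."""
    if seconds_since_heartbeat < 0:
        raise ValueError("heartbeat age cannot be negative")
    if seconds_since_heartbeat <= 10:
        return "healthy"    # received within 10 seconds
    if seconds_since_heartbeat < 30:
        return "degraded"   # delayed 10-30 seconds
    return "offline"        # no heartbeat for 30+ seconds

for age in (3, 12, 45):
    print(age, classify_node(age))
```

With heartbeats every 5 seconds, a node has to miss roughly two in a row before it is marked degraded and about six before it is marked offline.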

Firewall Requirements

Required Outbound Access

Your servers need outbound access to:

Service               Protocol            Purpose
runos.com             HTTPS (443)         Platform communication
get.runos.com         HTTPS (443)         Component downloads
Other cluster nodes   UDP (51820/51821)   VPN connectivity

Important: No inbound ports are required - all connections are initiated outbound from your servers.
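
Before adding a server, you can confirm that the HTTPS endpoints are reachable with a quick script. This is a minimal sketch: the hostnames and ports come from the table above, while the function name and the injectable connect parameter are illustrative. It checks the TCP endpoints only, because a plain connect cannot verify the WireGuard UDP ports.

```python
import socket

# TCP endpoints from the Required Outbound Access table.
REQUIRED_TCP_ENDPOINTS = (("runos.com", 443), ("get.runos.com", 443))

def check_outbound(endpoints=REQUIRED_TCP_ENDPOINTS, timeout=5.0,
                   connect=socket.create_connection):
    """Return {(host, port): reachable} for each endpoint."""
    results = {}
    for host, port in endpoints:
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            results[(host, port)] = True
        except OSError:
            # Covers DNS failures, timeouts, and refused connections.
            results[(host, port)] = False
    return results

if __name__ == "__main__":
    for (host, port), ok in check_outbound().items():
        print(f"{host}:{port} {'reachable' if ok else 'BLOCKED'}")
```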

Behind NAT

RunOS works behind NAT without special configuration:

  • All connections are outbound-initiated
  • NAT allows return traffic automatically
  • WireGuard uses UDP hole-punching for peer-to-peer
  • Persistent keepalive maintains NAT state

Supported scenarios:

  • Home networks behind residential NAT
  • Corporate networks with firewall
  • Cloud VPCs with private subnets
  • Hybrid setups across multiple networks

Security Architecture

Multiple Layers of Protection

Certificate-based authentication:

  • All agents present client certificates
  • Certificates validated before accepting commands
  • Certificates can be revoked if compromised

Encrypted communication:

  • TLS/mTLS for all platform communication
  • WireGuard VPN for inter-server traffic
  • No plaintext credentials stored

Minimal attack surface:

  • No inbound network ports opened
  • All connections initiated outbound
  • No remote shell access
  • Limited to authorized operations only

Secrets Management

  • All secrets encrypted at rest by default
  • Encryption handled transparently by Kubernetes
  • AES-CBC encryption with random keys
  • Access controlled by namespace
  • Never exposed in logs or UI

What Makes RunOS Different

Traditional Kubernetes:

  • Manual cluster setup and configuration
  • Complex networking and certificate management
  • Manual service deployments with YAML files
  • Requires deep Kubernetes expertise

RunOS:

  • Automatic cluster setup and configuration
  • Automatic networking and certificate management
  • One-click service deployments
  • Kubernetes expertise optional

You get all the benefits of Kubernetes (reliability, scalability, ecosystem) without the operational complexity.

Performance Characteristics

Network Overhead

Node Agent communication:

  • ~1 MB/hour per node for heartbeats and status
  • Command traffic varies, typically less than 10 KB per command
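
For a sense of scale, the arithmetic below converts the ~1 MB/hour figure into a per-heartbeat size and an average bit rate. It assumes the 5-second heartbeat interval from the Health Monitoring section and takes 1 MB as 10^6 bytes; both the constants and the variable names are only for illustration.

```python
HEARTBEAT_INTERVAL_S = 5            # from the Health Monitoring section
TRAFFIC_PER_HOUR_BYTES = 1_000_000  # "~1 MB/hour", assuming 1 MB = 10^6 bytes

heartbeats_per_hour = 3600 // HEARTBEAT_INTERVAL_S
bytes_per_heartbeat = TRAFFIC_PER_HOUR_BYTES / heartbeats_per_hour
average_kbps = TRAFFIC_PER_HOUR_BYTES * 8 / 3600 / 1000

print(heartbeats_per_hour)         # 720
print(round(bytes_per_heartbeat))  # 1389
print(round(average_kbps, 2))      # 2.22
```

An average of roughly 2 Kbps is a small fraction of the 128 Kbps minimum listed under Network requirements, so steady-state agent traffic is unlikely to be the limiting factor on a link.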

VPN overhead:

  • WireGuard adds ~60 bytes per packet
  • Encryption/decryption very fast (negligible CPU)
  • Typical overhead: 1-5%
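
The relative cost of the ~60 bytes depends on packet size. The sketch below works it out as a share of the bytes sent on the wire; the packet sizes are arbitrary examples and the function name is illustrative.

```python
WIREGUARD_OVERHEAD_BYTES = 60  # "~60 bytes per packet" from the list above

def overhead_percent(inner_packet_bytes):
    """Overhead as a percentage of the total bytes sent on the wire."""
    total = inner_packet_bytes + WIREGUARD_OVERHEAD_BYTES
    return 100 * WIREGUARD_OVERHEAD_BYTES / total

for size in (1420, 512, 64):
    print(size, round(overhead_percent(size), 1))
# 1420 4.1
# 512 10.5
# 64 48.4
```

Full-size packets land inside the stated 1-5% range, while workloads dominated by small packets pay proportionally more.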

Network requirements:

  • Minimum: 128 Kbps per node
  • Recommended: 1 Mbps or higher per node
  • Latency tolerance: up to 500 ms is acceptable

Resource Usage

Per Server:

  • Node Agent: ~50-100MB RAM, less than 1% CPU
  • Cluster Agent (one per cluster): ~128MB RAM, ~100m CPU
  • VPN: Minimal overhead
  • Total platform overhead: less than 200MB RAM per server

This leaves the vast majority of server resources available for your applications.