
Tutorial: deploying with PostgreSQL from zero

End-to-end deployment of kneo-serv against PostgreSQL using the bundled Docker Compose stack: rendering env files, starting the service, verifying readiness, and running smoke tests. Budget about 30 minutes from a fresh checkout to a running deployment.

For the reference on deployment shapes and persistence selection, see deployment.md. For environment-variable semantics, see environment.md.

Prerequisites

  • Docker 24+ and docker compose.
  • git, curl, jq, openssl, and a shell that supports $() substitution.
  • python3 ≥ 3.12 for running the env validator and the deployment-smoke script.
  • Network access to pull the postgres:16 and Python base images.

This tutorial uses 127.0.0.1; for a real deployment, substitute your host or load-balancer URL throughout.

1 · Clone and prepare the env file

git clone git@github.com:kneo-agent/kneo-serv.git
cd kneo-serv
cp deploy/production.env.example deploy/production.env
chmod 600 deploy/production.env

deploy/production.env is gitignored and will hold your real secrets. Edit it now and replace every replace-… placeholder. At minimum, change the following before binding to a network:

# deploy/production.env

# Auth — replace each token with a high-entropy value.
KNEO_SERV_AUTH_ENABLED=true
KNEO_SERV_API_KEYS=operator:OP_TOKEN:operator;reviewer:REV_TOKEN:reviewer;viewer:VIEW_TOKEN:viewer
KNEO_SERV_ADMIN_API_KEY=ADMIN_TOKEN
KNEO_SERV_SPEC_SIGNING_KEY=SPEC_SIGNING_HMAC_KEY

# PostgreSQL: use the DB_PASSWORD value generated below.
POSTGRES_DB=kneo_serv
POSTGRES_USER=kneo_serv
POSTGRES_PASSWORD=DB_PASSWORD

Generate strong tokens:

# 32-byte hex tokens
for name in OP_TOKEN REV_TOKEN VIEW_TOKEN ADMIN_TOKEN SPEC_SIGNING_HMAC_KEY DB_PASSWORD; do
  printf '%s=%s\n' "$name" "$(openssl rand -hex 32)"
done

Paste the generated values into deploy/production.env. Do not commit this file.
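
If openssl is not available, Python's standard secrets module can produce values of the same shape. A minimal sketch, reusing the placeholder names from the env file above; generate_tokens is a hypothetical helper, and secrets.token_hex(32) returns 64 hex characters, like openssl rand -hex 32:

```python
import secrets

# Placeholder names used in deploy/production.env above.
NAMES = [
    "OP_TOKEN", "REV_TOKEN", "VIEW_TOKEN",
    "ADMIN_TOKEN", "SPEC_SIGNING_HMAC_KEY", "DB_PASSWORD",
]


def generate_tokens(names=NAMES, nbytes=32):
    """Return {name: hex token}; each token has 2 * nbytes hex characters."""
    return {name: secrets.token_hex(nbytes) for name in names}


for name, value in generate_tokens().items():
    print(f"{name}={value}")
```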

2 · Validate the env file

The bundled validator catches common mistakes (placeholder tokens, incomplete scoped roles, a missing DSN, telemetry payload capture left on by accident):

python3 scripts/validate_staging_env.py --env-file deploy/production.env

Address any errors before continuing. Common findings:

  • replace-… strings still present.
  • Scoped role list missing operator, reviewer, or viewer.
  • KNEO_SERV_OTEL_RECORD_ARGUMENTS=true without an explicit data-classification override.
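
To make the first two findings concrete, here is an illustrative check in Python. It is not the logic of scripts/validate_staging_env.py: the helper names are hypothetical, and the name:token:role layout of KNEO_SERV_API_KEYS is inferred from the example in step 1.

```python
REQUIRED_ROLES = {"operator", "reviewer", "viewer"}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def find_problems(env):
    """Return human-readable findings for a parsed env mapping."""
    problems = []
    for key, value in env.items():
        if "replace-" in value:
            problems.append(f"{key}: placeholder value still present")
    # Assumed entry layout: name:token:role, entries separated by ';'.
    roles = set()
    for entry in env.get("KNEO_SERV_API_KEYS", "").split(";"):
        parts = entry.split(":")
        if len(parts) == 3:
            roles.add(parts[2])
    missing = REQUIRED_ROLES - roles
    if missing:
        problems.append(
            "KNEO_SERV_API_KEYS: missing roles " + ", ".join(sorted(missing))
        )
    return problems
```

Calling find_problems(parse_env(open("deploy/production.env").read())) returns an empty list when neither finding applies.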

3 · Start the Compose stack

The stack runs the API plus PostgreSQL with a persistent volume.

docker compose --env-file deploy/production.env up --build -d

--build rebuilds the API image so that local edits are included. -d runs the stack detached. To watch logs:

docker compose logs -f api

You should see the API come up after PostgreSQL passes its healthcheck (about 10–20 seconds on a cold start). The API logs include a line per migration applied at first startup.

4 · Verify readiness

export BASE=http://127.0.0.1:8000
curl -sf "$BASE/livez"
curl -sf "$BASE/readyz" | jq

/livez returns {"ok": true, "metadata": {}} as soon as the process accepts connections. /readyz returns 200 only after every dependency check passes:

{
  "ok": true,
  "metadata": {
    "ready": true,
    "manager": "PlatformManager",
    "checks": {
      "run_state_store": {"name": "run_state_store", "ok": true},
      "continuation_store": {"name": "continuation_store", "ok": true},
      "queue": {"name": "queue", "ok": true},
      "runtime_registry": {"name": "runtime_registry", "ok": true, "count": 3, "names": ["adapter", "bridge", "native"]},
      "tool_registry": {"name": "tool_registry", "ok": true, "count": 4, "names": ["compress_history", "publish_report", "summarize", "web_search"]},
      "providers": {"name": "providers", "ok": true},
      "mcp": {"name": "mcp", "ok": true}
    }
  }
}

If you get a 503, the body identifies which check failed; see troubleshooting.md § 1.2.
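
In scripts, polling /readyz is more robust than a fixed wait. A minimal sketch using only the standard library; it assumes the 503 body carries the same metadata.checks shape as the 200 body shown above, which this tutorial does not confirm. failed_checks and wait_ready are hypothetical helpers.

```python
import json
import time
import urllib.error
import urllib.request


def failed_checks(body):
    """Return the names of checks whose "ok" field is not true."""
    checks = body.get("metadata", {}).get("checks", {})
    return sorted(name for name, check in checks.items() if not check.get("ok"))


def wait_ready(base_url, timeout=60.0, interval=2.0):
    """Poll /readyz until it returns 200 or the timeout elapses.

    Returns (ready, failing_check_names).
    """
    deadline = time.monotonic() + timeout
    failing = []
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/readyz", timeout=5):
                return True, []
        except urllib.error.HTTPError as err:
            # Non-2xx (for example 503): try to read which checks failed.
            try:
                failing = failed_checks(json.load(err))
            except ValueError:
                failing = []
        except OSError:
            # Connection refused or timed out: API not accepting connections yet.
            failing = []
        if time.monotonic() >= deadline:
            return False, failing
        time.sleep(interval)
```

For example, wait_ready("http://127.0.0.1:8000") returns (True, []) once the service is ready, or (False, [...]) with the last failing check names seen before the timeout.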

5 · Run the deployment smoke

The smoke script exercises the full path: auth, spec validation, run creation, human resume, audit listing, credential inventory, and policy update.

# Quote the values; replace each placeholder with the token from deploy/production.env.
export OP_TOKEN='<your-operator-token>'
export REV_TOKEN='<your-reviewer-token>'
export VIEW_TOKEN='<your-viewer-token>'

python3 scripts/deployment_smoke.py \
  --base-url "$BASE" \
  --operator-token "$OP_TOKEN" \
  --reviewer-token "$REV_TOKEN" \
  --viewer-token "$VIEW_TOKEN"

A clean run prints each step with a PASS and exits 0. If any step fails, the script identifies the failing endpoint and HTTP status. See deployment_smoke.md for the full step list and what each step verifies.

6 · Submit a real run

curl -sf -X POST "$BASE/v1/runs" \
  -H "Authorization: Bearer $OP_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "smoke",
    "spec_path": "examples/smoke_human_workflow.yaml",
    "target": "workflow"
  }' | jq

This spec uses the in-process dummy provider so it runs without provider credentials. You should see a paused response with a continuation_id (the workflow has a human step). Resume it:

curl -sf -X POST "$BASE/v1/human-tasks/cont_…/resume" \
  -H "Authorization: Bearer $REV_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"request_id": "req_…", "decision": "approved"}' | jq

For the full HITL flow, see human_in_the_loop.md.
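
The resume call can also be assembled in Python. A sketch that uses only what the curl example shows: the /v1/human-tasks/{continuation_id}/resume path, a bearer token, and a JSON body with request_id and decision. Where request_id appears in the paused response is not shown here, so it is passed in explicitly; build_resume_request is a hypothetical helper.

```python
import json
import urllib.request


def build_resume_request(base_url, token, continuation_id, request_id,
                         decision="approved"):
    """Build (but do not send) the POST request that resumes a human task."""
    body = json.dumps({"request_id": request_id, "decision": decision})
    return urllib.request.Request(
        url=f"{base_url}/v1/human-tasks/{continuation_id}/resume",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send the result with urllib.request.urlopen(req) and parse the response with json.load.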

7 · Verify persistence survives restart

Restart the API container and confirm that earlier runs are still served from PostgreSQL:

docker compose --env-file deploy/production.env restart api
sleep 5
curl -sf "$BASE/v1/runs?limit=5" \
  -H "Authorization: Bearer $OP_TOKEN" | jq '.runs[].run_id'

You should see the run id from step 6 in the list even after the API container restarts, because run state is stored in PostgreSQL rather than in the API process. PostgreSQL in turn keeps its data in the postgres-data named volume, not the container layer.
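
The same check can be automated by comparing run ids captured before and after the restart. A sketch that assumes the list response has the shape implied by the jq filter above ({"runs": [{"run_id": ...}]}); run_ids and lost_runs are hypothetical helpers.

```python
def run_ids(list_response):
    """Extract run ids from a parsed GET /v1/runs response body."""
    return [run["run_id"] for run in list_response.get("runs", [])]


def lost_runs(before, after):
    """Return run ids present before the restart but absent afterwards."""
    return sorted(set(run_ids(before)) - set(run_ids(after)))
```

An empty result from lost_runs means every run listed before the restart is still listed afterwards. Keep the limit the same for both requests, since new runs created in between can push older ones off a short page.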

8 · Capacity tuning knobs

For a real production deployment, revisit these env vars in deploy/production.env after you have load profile data:

| Variable | Default | Tune when |
|---|---|---|
| KNEO_SERV_PROVIDER_TIMEOUT_SECONDS | 120 | Provider tail latency exceeds the default. |
| KNEO_SERV_PROVIDER_RETRIES | 2 | Provider has a documented transient error rate. |
| KNEO_SERV_MAX_BODY_BYTES | 1 MiB | You receive larger inline specs or override payloads. |
| KNEO_SERV_MAX_INPUT_CHARS | 20000 | Run inputs are larger than the default. |
| KNEO_SERV_RETENTION_RUNS_DAYS | unset | Storage growth requires capping run history. |
| KNEO_SERV_CHECKPOINT_COMPRESS_BYTES | 64 KiB | Many large checkpoints; reduce to compress more. |

Full list and semantics: environment.md.
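
The size defaults above are binary units. Assuming the *_BYTES variables take plain integer byte counts (an assumption; confirm the accepted format in environment.md), the conversions work out as in this sketch, where to_bytes is a hypothetical helper:

```python
KIB = 1024
MIB = 1024 * KIB

# Defaults from the tuning table, expressed in bytes.
DEFAULT_MAX_BODY_BYTES = 1 * MIB               # 1048576
DEFAULT_CHECKPOINT_COMPRESS_BYTES = 64 * KIB   # 65536


def to_bytes(value, unit):
    """Convert a value given in 'KiB' or 'MiB' to an integer byte count."""
    return int(value * {"KiB": KIB, "MiB": MIB}[unit])


# Example: raising the body limit to 4 MiB.
print(f"KNEO_SERV_MAX_BODY_BYTES={to_bytes(4, 'MiB')}")
```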

9 · Back up the database

A seeded backup/restore drill is part of the release checklist. The shape for production:

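# Backup. POSTGRES_USER and POSTGRES_DB must be set in your shell
# (same values as in deploy/production.env). -T disables the pseudo-TTY
# so the piped dump is not altered; --clean --if-exists makes the dump
# drop existing objects when it is restored.
docker compose --env-file deploy/production.env exec -T db \
  pg_dump --clean --if-exists -U "$POSTGRES_USER" "$POSTGRES_DB" \
  | gzip > "kneo_serv-$(date +%Y%m%d-%H%M).sql.gz"

# Restore (DESTRUCTIVE: replaces current data)
gunzip -c kneo_serv-YYYYmmDD-HHMM.sql.gz \
  | docker compose --env-file deploy/production.env exec -T db \
      psql -U "$POSTGRES_USER" "$POSTGRES_DB"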

The full drill — including verifying that runs, checkpoints, audit events, and policy metadata survive a restore — is in release_checklist.md.
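
Before relying on a backup file, a quick sanity check is worthwhile. The sketch below decompresses the archive end to end, which lets gzip verify its checksum, and confirms that the file starts with the banner that plain-format pg_dump writes. looks_like_pg_dump is a hypothetical helper, not part of the release checklist, and it does not replace a real restore drill.

```python
import gzip
import zlib


def looks_like_pg_dump(path):
    """Return True if path is an intact gzip file holding a plain-format pg_dump."""
    try:
        with gzip.open(path, "rb") as fh:
            head = fh.read(4096)
            # Read to the end so gzip verifies the CRC and length trailer.
            while fh.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        # Not gzip, truncated, or corrupt.
        return False
    return b"PostgreSQL database dump" in head
```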

10 · Tear down

To stop the stack but keep data:

docker compose --env-file deploy/production.env down

To stop the stack and delete the PostgreSQL volume (destroys runs, checkpoints, audit events, continuations):

docker compose --env-file deploy/production.env down --volumes

Use the volume-deleting form for clean re-tests; never run it against a production deployment without a verified backup.

Common failure modes

| Symptom | See |
|---|---|
| API container restarts with "KNEO service auth is enabled but no API keys are configured" | troubleshooting.md § 1.1 |
| /readyz returns 503 with run_state_store not ok | troubleshooting.md § 2.2 |
| Service writes to SQLite even with DSN set | troubleshooting.md § 2.1 |
| Smoke script fails on policy write | API key probably missing policies:write; see troubleshooting.md § 4.2 |

Where to go next

  • staging_release_runbook.md — promotion path beyond a single host.
  • deployment.md — reference for deployment shapes, including running without Compose.
  • environment.md — every env var.
  • tutorial_custom_tool.md — extend this deployment with custom tools.