Observability

Operator guide for wiring kneo-serv's structured logs, request tracing, and OpenTelemetry exports into a production observability stack.

This page is the setup view. For symptoms and recovery when observability itself misbehaves, see troubleshooting.md § 7. For the full env-var list, see environment.md § Observability.

Four signals, four surfaces

| Signal | What it is | Where it comes from |
|---|---|---|
| Structured request logs | One JSON record per HTTP request, redacted | `RequestLoggingMiddleware` (on by default, toggled by `KNEO_SERV_REQUEST_LOGS`) |
| Service-side trace events | Per-run trace and checkpoint records, queryable via the API | `TracingMiddleware`, exposed at `/v1/runs/{run_id}/trace` |
| OpenTelemetry spans | Distributed-tracing spans across SDK-driven agent / tool calls and platform-side operations (queue dispatch, worker lease, continuation lock) | `kneo_agent.observability.OpenTelemetryMiddleware` + `kneo_serv.observability.platform_tracer`, opt-in via `KNEO_SERV_OTEL_ENABLED` |
| Prometheus metrics | Run-queue gauges + per-process run & token counters | `GET /metrics` (since 0.5.0), opt-out via `KNEO_SERV_METRICS_ENABLED` |

Prometheus `/metrics`

Since 0.5.0 the service exposes a Prometheus scrape endpoint at GET /metrics (root path only, not under /v1). Like /healthz, it is unauthenticated: it carries operational counts, not run content or secrets. Expose it only on a network your monitoring stack can reach (for example, run the service behind a reverse proxy that does not route /metrics publicly), or disable it with KNEO_SERV_METRICS_ENABLED=false.
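To spot-check the endpoint without a Prometheus server, a minimal stdlib sketch can fetch `/metrics` and parse the text exposition format into a dict. It assumes the standard Prometheus text format and a hypothetical `BASE_URL`; `# HELP` / `# TYPE` comment lines are skipped, and any label block is kept verbatim as part of the series key rather than parsed.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; point at your kneo-serv instance


def parse_exposition(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Comments and blank lines are skipped; a {label="..."} block stays in the
    series key; an optional trailing timestamp is ignored.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The series name (plus labels, if any) comes before the value.
        if "}" in line:
            series, _, rest = line.partition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        parts = rest.split()
        if not parts:
            continue  # malformed line without a value
        samples[series] = float(parts[0])
    return samples


def scrape(base_url: str = BASE_URL, timeout: float = 5.0) -> dict[str, float]:
    # /metrics lives at the root path, not under /v1.
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

For a quick backlog check on one process, read the gauge from the table below, e.g. `scrape()["kneo_runs_queued"]`.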

| Metric | Type | Meaning |
|---|---|---|
| `kneo_runs_started_total` | counter | Runs that began execution. |
| `kneo_runs_completed_total` | counter | Runs that completed successfully. |
| `kneo_runs_failed_total` | counter | Runs that failed / timed out / hit max-iterations. |
| `kneo_runs_dead_lettered_total` | counter | Runs dead-lettered after exceeding `KNEO_SERV_QUEUE_MAX_ATTEMPTS`. |
| `kneo_runs_rejected_total` | counter | Runs rejected by queue-depth backpressure (terminalized `failed{queue_full}` → `503`). The direct load-shed signal. |
| `kneo_tokens_input_total` | counter | Input/prompt tokens consumed across runs (from each run's `metadata["usage"]`, when the runtime reports it). |
| `kneo_tokens_output_total` | counter | Output/completion tokens produced across runs. |
| `kneo_tokens_total` | counter | Total tokens (input + output) across runs. |
| `kneo_runs_queued` | gauge | Runs queued and awaiting a worker (your backlog / backpressure signal). |
| `kneo_runs_running` | gauge | Runs currently leased by a worker. |
| `kneo_worker_count` | gauge | Live worker threads in this process. |

Counters are per-process and reset on restart — use rate() in Prometheus. In a multi-process deployment each instance exposes its own counters; aggregate across instances with sum(). Latency percentiles are not exported here; read them from the OTel spans or your reverse-proxy access logs.
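The reset rule is why rate() beats raw differences. A minimal sketch of the same idea outside Prometheus, assuming you already hold a time-ordered list of raw counter readings per instance: a drop in value is treated as a restart (the post-restart reading counts as fresh increase), and per-instance increases are then summed. It approximates PromQL's `sum(increase(...))` but does not reproduce its extrapolation to the window edges.

```python
def counter_increase(readings: list[float]) -> float:
    """Total increase across time-ordered counter readings, tolerating resets."""
    total = 0.0
    for prev, curr in zip(readings, readings[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Counter went down: the process restarted and counted up from zero.
            total += curr
    return total


def fleet_increase(per_instance: dict[str, list[float]]) -> float:
    """Sum the reset-aware increase over every instance."""
    return sum(counter_increase(r) for r in per_instance.values())
```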

Structured request logs

Shape

Each request emits a single JSON record on the kneo_serv.service logger:

```json
{
  "client_ip": "10.0.0.7",
  "duration_ms": 18.214,
  "event": "http_request",
  "method": "POST",
  "path": "/v1/runs",
  "request_id": "f3b3…",
  "run_id": "run_…",
  "status_code": 200,
  "user_agent": "kneo-serv-client/0.2.2"
}
```

Fields the middleware always emits: event, request_id, method, path, status_code, duration_ms. Optional fields, when available: client_ip, user_agent. Route-derived fields, when the path includes them: run_id, continuation_id. When the request raises: error (the exception class name) and message (the exception message). Redaction is applied to every payload before it reaches the log line (implemented in kneo_serv/observability/structured_logging.py).
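Because each record is one JSON object per line, downstream tooling can filter on the always-present fields directly. A minimal sketch, assuming you have the service's stdout lines (from a file or your runtime's log command) and that non-JSON lines from other loggers may be interleaved: it keeps only `event == "http_request"` records, picks out requests that raised via the optional `error` field, and checks for missing always-emitted fields.

```python
import json
from typing import Iterable, Iterator

# Fields the request middleware always emits, per the list above.
REQUIRED = ("event", "request_id", "method", "path", "status_code", "duration_ms")


def request_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield http_request records, skipping non-JSON and unrelated log lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text line from another logger
        if isinstance(record, dict) and record.get("event") == "http_request":
            yield record


def failed_requests(lines: Iterable[str]) -> list[dict]:
    """Records where the request raised: they carry `error` (exception class name)."""
    return [r for r in request_records(lines) if "error" in r]


def missing_fields(record: dict) -> list[str]:
    """Always-emitted fields absent from a record (expected to be empty)."""
    return [f for f in REQUIRED if f not in record]
```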

Configuration

| Variable | Default | Purpose |
|---|---|---|
| `KNEO_SERV_REQUEST_LOGS` | `true` | Enable the JSON request log middleware. |
| `KNEO_SERV_LOG_LEVEL` | `INFO` | Stack-wide level: `kneo_serv.service` + `kneo_serv.platform` + `kneo_agent` SDK. |

request_id is generated server-side as a UUID unless the client sends X-Request-ID; either way the service echoes it back on the response header.
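Supplying your own ID lets a client-side log line up with the server's record. A minimal stdlib sketch; `BASE_URL` is a hypothetical placeholder and `/healthz` is used only because it needs no authentication. Note that urllib normalizes the header name to `X-request-id` on the request object; HTTP header names are case-insensitive, so the server sees the same header.

```python
import uuid
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # hypothetical; point at your kneo-serv instance


def build_request(path: str, request_id: Optional[str] = None) -> urllib.request.Request:
    """Prepare a GET carrying X-Request-ID (a fresh UUID unless one is given)."""
    rid = request_id or str(uuid.uuid4())
    return urllib.request.Request(f"{BASE_URL}{path}", headers={"X-Request-ID": rid})


def send(req: urllib.request.Request) -> Optional[str]:
    """Send the request and return the X-Request-ID echoed on the response."""
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.headers.get("X-Request-ID")  # lookup is case-insensitive
```

The echoed value is what to search for as request_id in the aggregated logs.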

Production tuning

Keep KNEO_SERV_LOG_LEVEL=INFO in production. DEBUG doubles log volume and can leak diagnostic payloads from middleware that wraps the request logger. Note it is stack-wide: it sets the request logger, the platform worker / lease / drain logger (kneo_serv.platform), and the kneo_agent SDK logger together, so DEBUG turns all three up at once — useful for diagnosing a stuck run end-to-end, noisy as a default.
Configure your container runtime's log rotation (Docker's json-file driver with max-size / max-file, Kubernetes kubelet rotation via containerLogMaxSize / containerLogMaxFiles, or journald). The service writes to stdout and relies on the runtime to rotate.

Log aggregation wiring

ELK / OpenSearch. Ship stdout via Filebeat or Vector. The records are already JSON; map request_id and run_id as indexed fields. Pin service.name=kneo-serv from the shipper for cross-deployment search.
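Loki. A Promtail pipeline with a json stage can extract request_id, run_id, status_code, and duration_ms, but promote only low-cardinality fields such as status_code to labels. Don't turn request_id, run_id, or duration_ms into Loki labels, since every distinct label value creates a new stream; query them as log content instead (for example with the LogQL json parser).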
Cloud-managed (CloudWatch Logs, GCP Logging). Forward stdout; the managed pipeline parses JSON automatically.

The reverse proxy in front of kneo-serv (tls_and_proxy.md) sees the true client IP; the service logs only the immediate TCP peer, which is the proxy. To correlate the two, have the proxy set or forward X-Request-ID to the service and join its access logs to the service records on request_id.
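Joining the two log streams on request_id recovers the real client address for each service record. A minimal sketch, assuming the proxy also logs JSON with the forwarded ID under `request_id` and the client address under `remote_addr`; `remote_addr` and the output field `true_client_ip` are hypothetical names, so substitute whatever your proxy's log format and your pipeline use.

```python
def attach_client_ip(service_records: list[dict], proxy_records: list[dict]) -> list[dict]:
    """Return copies of service records enriched with the proxy-observed client IP.

    `remote_addr` (proxy side) and `true_client_ip` (output) are assumed field
    names. Records without a matching request_id keep only the service's own
    client_ip, i.e. the immediate TCP peer.
    """
    by_id = {p["request_id"]: p.get("remote_addr") for p in proxy_records if "request_id" in p}
    enriched = []
    for rec in service_records:
        out = dict(rec)  # don't mutate the caller's records
        real_ip = by_id.get(rec.get("request_id"))
        if real_ip:
            out["true_client_ip"] = real_ip
        enriched.append(out)
    return enriched
```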

Service-side trace events

Service-side trace events are persisted as part of run state and returned at GET /v1/runs/{run_id}/trace. They cover workflow step transitions, tool calls, checkpoints, and audit boundaries. These events are emitted by TracingMiddleware independent of any OTel exporter, so they are always available even without OpenTelemetry.
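A minimal stdlib client sketch for pulling a run's trace. It assumes the endpoint returns JSON and that any authentication your deployment requires is passed through `headers`; the response shape is defined in service_api.md, so the sketch returns it without interpreting it.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def trace_url(base_url: str, run_id: str) -> str:
    """Build GET /v1/runs/{run_id}/trace, percent-encoding the run ID."""
    return f"{base_url.rstrip('/')}/v1/runs/{urllib.parse.quote(run_id, safe='')}/trace"


def fetch_trace(base_url: str, run_id: str, headers: Optional[dict] = None) -> Any:
    # `headers` is where deployment-specific auth would go (an assumption).
    req = urllib.request.Request(trace_url(base_url, run_id), headers=headers or {})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```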

See service_api.md § Audit events and service_api.md § Replay and checkpoint diff for the contract.

Audit-event export

Audit events are always queryable via GET /v1/audit-events and stored in the run-state store. For compliance retention you can additionally stream them out: set KNEO_SERV_AUDIT_EXPORT_ENABLED=true and each persisted event is emitted as a JSON line on the dedicated kneo_serv.audit logger, from the same record_audit_event chokepoint (so the payload is already redacted).

It is off by default, and enabling it is purely additive: nothing else about the service's behavior changes. The transport is plain stdlib logging, so wire it to your sink the usual way:

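```python
import logging
from logging.handlers import SysLogHandler

# Attach a handler to the dedicated audit logger; each record is one JSON line.
audit = logging.getLogger("kneo_serv.audit")
audit.addHandler(SysLogHandler(address=("siem.internal", 514)))  # UDP syslog by default
audit.setLevel(logging.INFO)
```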

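With no dedicated handler, the events propagate to the root logger and land in your normal JSON logs alongside the request logs. Adding a handler does not stop that propagation; set propagate = False on the kneo_serv.audit logger if you don't want the events duplicated there. This is the v1 sink; a direct SIEM/OTLP-logs exporter may follow.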
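For a local retention file instead of syslog, a minimal sketch using the stdlib rotating file handler. It assumes each audit record's message is already the serialized JSON line described above, so the formatter writes `%(message)s` verbatim; the helper name, file path, and rotation sizes are placeholders, and it also turns off propagation to the root logger.

```python
import logging
from logging.handlers import RotatingFileHandler


def attach_audit_file(path: str, max_bytes: int = 50_000_000, backups: int = 10) -> logging.Handler:
    """Write kneo_serv.audit records, one JSON line each, to a rotating file."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))  # message is already JSON
    audit = logging.getLogger("kneo_serv.audit")
    audit.addHandler(handler)
    audit.setLevel(logging.INFO)
    audit.propagate = False  # keep audit events out of the root logger's output
    return handler
```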

OpenTelemetry spans

When the deployment includes the SDK telemetry support (the [telemetry] or [deploy] extras), set KNEO_SERV_OTEL_ENABLED=true to attach kneo_agent.observability.OpenTelemetryMiddleware. Argument and result capture (KNEO_SERV_OTEL_RECORD_ARGUMENTS, KNEO_SERV_OTEL_RECORD_RESULTS) are off by default because tool inputs and outputs frequently contain user payloads — enable them only after the deployment's data classification has approved payload capture.

See environment.md § Observability for the full env-var reference.

Platform-side spans

The SDK's OpenTelemetryMiddleware covers the agent boundary — runs, tool calls, model calls. The platform also instruments operations that happen outside the agent's execution:

| Span name | Where | Attributes |
|---|---|---|
| `kneo.queue.dispatch` | `PlatformManager.dispatch_run`, when a run is enqueued for an async worker | `kneo.run.id` |
| `kneo.worker.lease` | Async worker loop, one span per lease attempt against the queue | `kneo.worker.id`, `kneo.worker.lease_seconds`, `kneo.worker.claimed` (bool), `kneo.run.id` (if claimed) |
| `kneo.continuation.lock` | `PlatformManager.resume_human_task`, when the per-continuation lock is acquired before resume | `kneo.continuation.id`, `kneo.lock.name`, `kneo.lock.ttl_seconds`, `kneo.lock.acquired` (bool) |

These spans share the same KNEO_SERV_OTEL_ENABLED flag — they're a clean no-op when telemetry is off (no overhead beyond a single env-var check). Span names use the kneo.<area>.<operation> convention so they sort cleanly alongside SDK-owned spans in tracing UIs.

GenAI semantic conventions (gen_ai.*). The OpenTelemetry GenAI attributes (gen_ai.system, gen_ai.operation.name, token counts, …) are emitted by the SDK's OpenTelemetryMiddleware on the agent and model-call spans, because that is where the model provider is known. The platform-side spans above are provider-agnostic infrastructure operations (queue, lease, lock) and deliberately carry no gen_ai.* attributes; to query GenAI telemetry, use the SDK-owned model spans.

Lease spans with kneo.worker.claimed=false indicate an empty queue — useful for measuring how often workers idle. Continuation lock spans with kneo.lock.acquired=false correlate with the LockAcquisitionError shown in troubleshooting.md § 8.1.
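Both ratios can be computed from exported spans. A minimal sketch, assuming spans have already been flattened into plain dicts with `name` and `attributes` keys (that shape is an assumption about your export pipeline, not a kneo-serv format) and that boolean attributes survive export as real booleans.

```python
def lease_idle_ratio(spans: list[dict]) -> float:
    """Fraction of kneo.worker.lease attempts that claimed nothing (empty queue)."""
    leases = [s for s in spans if s.get("name") == "kneo.worker.lease"]
    if not leases:
        return 0.0
    # `is False` assumes booleans were preserved; adapt if your exporter stringifies them.
    idle = sum(1 for s in leases if s.get("attributes", {}).get("kneo.worker.claimed") is False)
    return idle / len(leases)


def failed_lock_continuations(spans: list[dict]) -> set:
    """Continuation IDs whose lock span reports kneo.lock.acquired=false."""
    ids = set()
    for s in spans:
        attrs = s.get("attributes", {})
        if (
            s.get("name") == "kneo.continuation.lock"
            and attrs.get("kneo.lock.acquired") is False
            and "kneo.continuation.id" in attrs
        ):
            ids.add(attrs["kneo.continuation.id"])
    return ids
```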

Exporter configuration

The service uses the OpenTelemetry global tracer provider; exporters are configured with standard OTEL_* environment variables that the OTel SDK reads. Example for OTLP/HTTP to any compatible backend (Honeycomb, Grafana Tempo, Tempo Cloud, Datadog, an OTel Collector):

```shell
export KNEO_SERV_OTEL_ENABLED=true

export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=$HONEYCOMB_API_KEY"
export OTEL_SERVICE_NAME=kneo-serv
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=prod"
```

For a self-hosted OTel Collector running as a sidecar:

```shell
# 4317 is the default OTLP/gRPC port (OTLP/HTTP defaults to 4318)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
```

If OTel does not appear to be exporting, see troubleshooting.md § 7.2.
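A quick preflight over the environment catches the most common mismatches before you dig further. A minimal sketch under stated assumptions: it inspects only the variables shown above, treats `1` / `true` / `yes` / `on` as truthy for `KNEO_SERV_OTEL_ENABLED` (the service's exact parsing may differ), and relies on two OTLP conventions: 4317 and 4318 are the default gRPC and HTTP ports, and with an HTTP protocol the SDK appends `/v1/traces` to `OTEL_EXPORTER_OTLP_ENDPOINT` itself.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

TRUTHY = {"1", "true", "yes", "on"}  # assumption: approximates the service's flag parsing


def otel_preflight(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable warnings about the OTel env configuration."""
    env = os.environ if env is None else env
    warnings = []
    if env.get("KNEO_SERV_OTEL_ENABLED", "").strip().lower() not in TRUTHY:
        warnings.append("KNEO_SERV_OTEL_ENABLED is not true; no spans will be produced")
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "")
    if not endpoint:
        warnings.append("OTEL_EXPORTER_OTLP_ENDPOINT is unset; the exporter uses its default")
        return warnings
    parsed = urlparse(endpoint)
    if protocol == "grpc" and parsed.port == 4318:
        warnings.append("grpc protocol on port 4318, the OTLP/HTTP default; gRPC usually uses 4317")
    if protocol.startswith("http/") and parsed.port == 4317:
        warnings.append("HTTP protocol on port 4317, the OTLP/gRPC default; HTTP usually uses 4318")
    if protocol.startswith("http/") and parsed.path.rstrip("/").endswith("/v1/traces"):
        warnings.append("endpoint already ends in /v1/traces; the SDK appends it, so use the base "
                        "URL or set OTEL_EXPORTER_OTLP_TRACES_ENDPOINT instead")
    return warnings
```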

What to watch in production

A minimal alerting baseline covers the failure modes that page on-call:

| Signal | What it means | Where to read it |
|---|---|---|
| `/readyz` returns `503` for more than one probe interval | A dependency check is failing | Reverse proxy / load balancer health checks |
| Sustained `5xx` rate above baseline | Service-side errors | Proxy access logs; `status_code` from JSON logs |
| `duration_ms` p95 climbing over baseline | Latency regression: provider, queue, or DB pressure | JSON log records |
| Queue depth (`status=queued`) growing unbounded | Workers stuck or backpressured | `kneo_runs_queued` on `/metrics`; `/readyz` `queue` check; `curl "$BASE/v1/runs?status=queued&limit=20"` |
| Spike in `event=http_request` records with `error` | Application-level exceptions | JSON log records |

Wire your alerting against these signals from the proxy, the aggregated logs, and /metrics; the service does not push its own alerts.
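Two of the log-derived signals in the table, the 5xx rate and the duration_ms p95, reduce to simple arithmetic over a window of records. A minimal sketch, assuming you already have the window's `http_request` records as dicts; it uses the nearest-rank percentile, one of several common definitions, so results may differ slightly from your log backend's.

```python
def error_rate_5xx(records: list[dict]) -> float:
    """Share of requests in the window that returned a 5xx status."""
    if not records:
        return 0.0
    errors = sum(1 for r in records if 500 <= int(r["status_code"]) <= 599)
    return errors / len(records)


def p95_duration_ms(records: list[dict]) -> float:
    """Nearest-rank 95th percentile of duration_ms over the window."""
    durations = sorted(float(r["duration_ms"]) for r in records)
    if not durations:
        return 0.0
    rank = (95 * len(durations) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return durations[rank - 1]
```

Compare each value against a baseline from a known-good period rather than a fixed absolute threshold.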

What this page does not cover

Per-IP rate limiting and traffic shaping. The reverse proxy's job (tls_and_proxy.md).
Request-level latency histograms on /metrics. The /metrics endpoint (since 0.5.0) exports run-queue gauges and run counters, not per-request latency percentiles. Derive request latency from OTel spans or the reverse-proxy access logs.
Tracing internals. For the design of the in-process tracer and checkpoint events, see docs/dev/design.md and docs/dev/implementation_map.md.