Agent Middleware Guide¶

This guide explains how middleware works in kneo_agent.

What Middleware Is For¶

Middleware is the SDK's cross-cutting interception layer. It is useful for:

logging and tracing
guardrails and policy enforcement
request or response mutation
tool result rewriting
short-circuiting runs or streams

The design combines two ideas:

shared mutable context objects
phase-specific hooks around the agent loop

Supported Hook Points¶

kneo_agent exposes four middleware hooks:

wrap_run(context, handler) for Agent.run(...)
wrap_stream(context, handler) for Agent.stream(...)
wrap_model_call(context, handler) for one Bridge executor model call
wrap_tool_call(context, handler) for one Bridge executor tool dispatch

wrap_run(...) and wrap_stream(...) apply to every runtime style.

wrap_model_call(...) and wrap_tool_call(...) apply only to Bridge runtimes, because Kneo owns the inner loop there.

Static vs Per-Run Middleware¶

Attach middleware statically at build time:

from kneo_agent import AgentBuilder, BaseAgentMiddleware


class LoggingMiddleware(BaseAgentMiddleware):
    async def wrap_run(self, context, handler):
        print(f"running {context.agent_name}")
        return await handler(context)


agent = (
    AgentBuilder()
    .add_middleware(LoggingMiddleware())
    .use_runtime(runtime)
    .build()
)

Attach middleware for a single invocation:

from kneo_agent import RunConfig

result = await agent.run(
    "hello",
    run_config=RunConfig(middlewares=[LoggingMiddleware()]),
)

Per-run middleware is appended after statically configured middleware, so the per-run middleware executes deeper in the chain.

Shared Mutable Context¶

Each hook receives a context object with the current request state and a handler function that delegates to the next middleware or terminal runtime operation.

Middleware can:

mutate context.messages
replace or augment context.run_config
add values to context.metadata
return early without calling handler(...)

Short-Circuiting¶

Middleware may skip the rest of the pipeline entirely.

That is useful for:

blocked requests
cached answers
synthetic previews
fallback behavior

Relationship To Skills¶

Skills and middleware are complementary but different:

skills package reusable data such as prompts, tools, defaults, and metadata
middleware is executable code that wraps runtime behavior

Skills compile into RunConfig. Middleware remains an explicit object attached through AgentBuilder or RunConfig.

OpenTelemetry¶

kneo_agent.observability.OpenTelemetryMiddleware is a built-in middleware that emits spans following the OpenTelemetry GenAI semantic conventions. Spans are named chat <agent> for runs, chat iter=<n> for Bridge model calls, and execute_tool <name> for tool executions, with attributes including gen_ai.system, gen_ai.operation.name, gen_ai.agent.name, gen_ai.tool.name, and gen_ai.usage.{input,output}_tokens when the runtime surfaces token counts via RunResult.metadata["usage"].

Install the optional extra:

pip install "kneo-agent[telemetry]"

Wire it up like any other middleware:

from kneo_agent import AgentBuilder
from kneo_agent.observability import OpenTelemetryMiddleware

agent = (
    AgentBuilder()
    .add_middleware(OpenTelemetryMiddleware())
    .use_bridge(runtime)
    .build()
)

Pass record_arguments=False to suppress tool-call argument capture when inputs may contain PII. Pass record_results=True to also attach tool results to spans.

Production middleware bundle (v1.2.0)¶

kneo_agent.middleware ships four production-grade middleware classes built on the framework above. Use them as drop-ins:

from kneo_agent.middleware import (
    RetryMiddleware,
    RateLimitMiddleware,
    TokenBudgetMiddleware,
    RedactionMiddleware,
    COMMON_PATTERNS,
)

Middleware	What it does	Pattern support
`RetryMiddleware`	Retries `wrap_model_call` and `wrap_tool_call` with exponential backoff + jitter. Configurable `retry_on`; never retries `asyncio.CancelledError`.	Bridge only. Uses the inner hooks, so under Adapter / Native it is a silent no-op — the provider SDK owns the loop and its own retries.
`RateLimitMiddleware`	Token-bucket throttle. `scope="run"` (default) limits whole runs; `scope="model_call"` limits Bridge model calls.	All patterns with `scope="run"` (default); `scope="model_call"` is Bridge only.
`TokenBudgetMiddleware`	Enforces per-run and / or cumulative token caps from `RunResult.metadata["usage"]`. Raises `TokenBudgetExceeded`.	All patterns — every built-in runtime populates `metadata["usage"]` when the provider reports it (OpenAI native + the three Bridge executors + all three Adapter paths).
`RedactionMiddleware`	Regex scrubbing of secrets in inputs, tool args, tool results, the final result, and streamed chunks. Ships a `COMMON_PATTERNS` starter pack (Bearer / JWT / AWS / OpenAI / connection-string shapes).	Input / final-result / streamed-chunk scrubbing: all patterns. Tool-argument scrubbing: Bridge only (it rides `wrap_tool_call`).

Pattern matters. Because wrap_model_call / wrap_tool_call fire only under the Bridge pattern (see above), middleware that relies on them behaves differently across runtime styles. The clearest footgun: RetryMiddleware does nothing under an Adapter or Native runtime (e.g. any OpenAI agent, which is Native-only) — and does so silently. OpenTelemetryMiddleware is affected too: the outer chat <agent> run span is emitted everywhere, but the per-iteration chat iter=<n> and per-tool execute_tool <name> child spans appear only under Bridge.

Recommended ordering when combined with observability:

RedactionMiddleware — registered first (outer) so input is scrubbed before any inner middleware sees it.
OpenTelemetryMiddleware(record_arguments=True, record_results=False) — inner. record_results=False keeps un-redacted output out of spans.

The framework types (AgentMiddleware, BaseAgentMiddleware, the four context dataclasses) are re-exported from kneo_agent.middleware, so a single import line gives you both the framework and the bundle.

For credential plumbing through tool handlers and MCP transports without leaking values into prompts or trace spans, pair RedactionMiddleware with the SecretProvider Protocol in kneo_agent.utils (see api_stability.md and upgrading_to_1.2.md).