Agent Middleware Guide¶
This guide explains how middleware works in kneo_agent.
What Middleware Is For¶
Middleware is the SDK's cross-cutting interception layer. It is useful for:
- logging and tracing
- guardrails and policy enforcement
- request or response mutation
- tool result rewriting
- short-circuiting runs or streams
The design combines two ideas:
- shared mutable context objects
- phase-specific hooks around the agent loop
Supported Hook Points¶
kneo_agent exposes four middleware hooks:
wrap_run(context, handler)forAgent.run(...)wrap_stream(context, handler)forAgent.stream(...)wrap_model_call(context, handler)for one Bridge executor model callwrap_tool_call(context, handler)for one Bridge executor tool dispatch
wrap_run(...) and wrap_stream(...) apply to every runtime style.
wrap_model_call(...) and wrap_tool_call(...) apply only to Bridge runtimes,
because Kneo owns the inner loop there.
Static vs Per-Run Middleware¶
Attach middleware statically at build time:
from kneo_agent import AgentBuilder, BaseAgentMiddleware
class LoggingMiddleware(BaseAgentMiddleware):
async def wrap_run(self, context, handler):
print(f"running {context.agent_name}")
return await handler(context)
agent = (
AgentBuilder()
.add_middleware(LoggingMiddleware())
.use_runtime(runtime)
.build()
)
Attach middleware for a single invocation:
from kneo_agent import RunConfig
result = await agent.run(
"hello",
run_config=RunConfig(middlewares=[LoggingMiddleware()]),
)
Per-run middleware is appended after statically configured middleware, so the per-run middleware executes deeper in the chain.
Shared Mutable Context¶
Each hook receives a context object with the current request state and a
handler function that delegates to the next middleware or terminal runtime
operation.
Middleware can:
- mutate
context.messages - replace or augment
context.run_config - add values to
context.metadata - return early without calling
handler(...)
Short-Circuiting¶
Middleware may skip the rest of the pipeline entirely.
That is useful for:
- blocked requests
- cached answers
- synthetic previews
- fallback behavior
Relationship To Skills¶
Skills and middleware are complementary but different:
- skills package reusable data such as prompts, tools, defaults, and metadata
- middleware is executable code that wraps runtime behavior
Skills compile into RunConfig. Middleware remains an explicit object attached
through AgentBuilder or RunConfig.
OpenTelemetry¶
kneo_agent.observability.OpenTelemetryMiddleware is a built-in middleware that
emits spans following the OpenTelemetry GenAI semantic conventions. Spans are
named chat <agent> for runs, chat iter=<n> for Bridge model calls, and
execute_tool <name> for tool executions, with attributes including
gen_ai.system, gen_ai.operation.name, gen_ai.agent.name,
gen_ai.tool.name, and gen_ai.usage.{input,output}_tokens when the runtime
surfaces token counts via RunResult.metadata["usage"].
Install the optional extra:
Wire it up like any other middleware:
from kneo_agent import AgentBuilder
from kneo_agent.observability import OpenTelemetryMiddleware
agent = (
AgentBuilder()
.add_middleware(OpenTelemetryMiddleware())
.use_bridge(runtime)
.build()
)
Pass record_arguments=False to suppress tool-call argument capture when
inputs may contain PII. Pass record_results=True to also attach tool results
to spans.
Production middleware bundle (v1.2.0)¶
kneo_agent.middleware ships four production-grade middleware classes
built on the framework above. Use them as drop-ins:
from kneo_agent.middleware import (
RetryMiddleware,
RateLimitMiddleware,
TokenBudgetMiddleware,
RedactionMiddleware,
COMMON_PATTERNS,
)
| Middleware | What it does |
|---|---|
RetryMiddleware |
Retries wrap_model_call and wrap_tool_call with exponential backoff + jitter. Configurable retry_on; never retries asyncio.CancelledError. |
RateLimitMiddleware |
Token-bucket throttle. scope="run" (default) limits whole runs; scope="model_call" limits Bridge model calls. |
TokenBudgetMiddleware |
Enforces per-run and / or cumulative token caps from RunResult.metadata["usage"]. Raises TokenBudgetExceeded. |
RedactionMiddleware |
Regex scrubbing of secrets in inputs, tool args, tool results, the final result, and streamed chunks. Ships a COMMON_PATTERNS starter pack (Bearer / JWT / AWS / OpenAI / connection-string shapes). |
Recommended ordering when combined with observability:
RedactionMiddleware— registered first (outer) so input is scrubbed before any inner middleware sees it.OpenTelemetryMiddleware(record_arguments=True, record_results=False)— inner.record_results=Falsekeeps un-redacted output out of spans.
The framework types (AgentMiddleware, BaseAgentMiddleware, the four
context dataclasses) are re-exported from kneo_agent.middleware, so a
single import line gives you both the framework and the bundle.
For credential plumbing through tool handlers and MCP transports without
leaking values into prompts or trace spans, pair RedactionMiddleware
with the SecretProvider Protocol in kneo_agent.utils (see
api_stability.md and
upgrading_to_1.2.md).