Skip to content

Agent Middleware Guide

This guide explains how middleware works in kneo_agent.

What Middleware Is For

Middleware is the SDK's cross-cutting interception layer. It is useful for:

  • logging and tracing
  • guardrails and policy enforcement
  • request or response mutation
  • tool result rewriting
  • short-circuiting runs or streams

The design combines two ideas:

  • shared mutable context objects
  • phase-specific hooks around the agent loop

Supported Hook Points

kneo_agent exposes four middleware hooks:

  • wrap_run(context, handler) for Agent.run(...)
  • wrap_stream(context, handler) for Agent.stream(...)
  • wrap_model_call(context, handler) for one Bridge executor model call
  • wrap_tool_call(context, handler) for one Bridge executor tool dispatch

wrap_run(...) and wrap_stream(...) apply to every runtime style.

wrap_model_call(...) and wrap_tool_call(...) apply only to Bridge runtimes, because Kneo owns the inner loop there.

Static vs Per-Run Middleware

Attach middleware statically at build time:

from kneo_agent import AgentBuilder, BaseAgentMiddleware


class LoggingMiddleware(BaseAgentMiddleware):
    async def wrap_run(self, context, handler):
        print(f"running {context.agent_name}")
        return await handler(context)


agent = (
    AgentBuilder()
    .add_middleware(LoggingMiddleware())
    .use_runtime(runtime)
    .build()
)

Attach middleware for a single invocation:

from kneo_agent import RunConfig

result = await agent.run(
    "hello",
    run_config=RunConfig(middlewares=[LoggingMiddleware()]),
)

Per-run middleware is appended after statically configured middleware, so the per-run middleware executes deeper in the chain.

Shared Mutable Context

Each hook receives a context object with the current request state and a handler function that delegates to the next middleware or terminal runtime operation.

Middleware can:

  • mutate context.messages
  • replace or augment context.run_config
  • add values to context.metadata
  • return early without calling handler(...)

Short-Circuiting

Middleware may skip the rest of the pipeline entirely.

That is useful for:

  • blocked requests
  • cached answers
  • synthetic previews
  • fallback behavior

Relationship To Skills

Skills and middleware are complementary but different:

  • skills package reusable data such as prompts, tools, defaults, and metadata
  • middleware is executable code that wraps runtime behavior

Skills compile into RunConfig. Middleware remains an explicit object attached through AgentBuilder or RunConfig.

OpenTelemetry

kneo_agent.observability.OpenTelemetryMiddleware is a built-in middleware that emits spans following the OpenTelemetry GenAI semantic conventions. Spans are named chat <agent> for runs, chat iter=<n> for Bridge model calls, and execute_tool <name> for tool executions, with attributes including gen_ai.system, gen_ai.operation.name, gen_ai.agent.name, gen_ai.tool.name, and gen_ai.usage.{input,output}_tokens when the runtime surfaces token counts via RunResult.metadata["usage"].

Install the optional extra:

pip install "kneo-agent[telemetry]"

Wire it up like any other middleware:

from kneo_agent import AgentBuilder
from kneo_agent.observability import OpenTelemetryMiddleware

agent = (
    AgentBuilder()
    .add_middleware(OpenTelemetryMiddleware())
    .use_bridge(runtime)
    .build()
)

Pass record_arguments=False to suppress tool-call argument capture when inputs may contain PII. Pass record_results=True to also attach tool results to spans.

Production middleware bundle (v1.2.0)

kneo_agent.middleware ships four production-grade middleware classes built on the framework above. Use them as drop-ins:

from kneo_agent.middleware import (
    RetryMiddleware,
    RateLimitMiddleware,
    TokenBudgetMiddleware,
    RedactionMiddleware,
    COMMON_PATTERNS,
)
Middleware What it does
RetryMiddleware Retries wrap_model_call and wrap_tool_call with exponential backoff + jitter. Configurable retry_on; never retries asyncio.CancelledError.
RateLimitMiddleware Token-bucket throttle. scope="run" (default) limits whole runs; scope="model_call" limits Bridge model calls.
TokenBudgetMiddleware Enforces per-run and / or cumulative token caps from RunResult.metadata["usage"]. Raises TokenBudgetExceeded.
RedactionMiddleware Regex scrubbing of secrets in inputs, tool args, tool results, the final result, and streamed chunks. Ships a COMMON_PATTERNS starter pack (Bearer / JWT / AWS / OpenAI / connection-string shapes).

Recommended ordering when combined with observability:

  1. RedactionMiddleware — registered first (outer) so input is scrubbed before any inner middleware sees it.
  2. OpenTelemetryMiddleware(record_arguments=True, record_results=False) — inner. record_results=False keeps un-redacted output out of spans.

The framework types (AgentMiddleware, BaseAgentMiddleware, the four context dataclasses) are re-exported from kneo_agent.middleware, so a single import line gives you both the framework and the bundle.

For credential plumbing through tool handlers and MCP transports without leaking values into prompts or trace spans, pair RedactionMiddleware with the SecretProvider Protocol in kneo_agent.utils (see api_stability.md and upgrading_to_1.2.md).

Examples