Upgrading to 1.2.0

A short, practical "should I upgrade?" walkthrough for users on 1.1.x. The full list of changes is in CHANGELOG.md; this page covers only the parts that affect existing code.

TL;DR

  • No breaking changes. Every public name from 1.1.x still works the same way. New features are additive. You can pip install --upgrade kneo-agent and run your existing code unchanged.
  • Worth upgrading for: local-LLM support without pre-building an AsyncOpenAI client; the new middleware bundle (retry / rate-limit / token-budget / redaction); a documented RunResult.metadata["usage"] schema; MCP TLS / mTLS support; and a SecretProvider Protocol with three reference implementations.
  • Read this page first if you: pin to a specific subpackage size, pass MCP servers behind a corporate CA, or care about the RunResult.metadata shape.

What's new

1. Local LLMs via base_url=

Before 1.2.0 you had to pre-build an AsyncOpenAI client to point at Ollama / vLLM / llama.cpp / LocalAI. Now base_url= and api_key= are first-class on every OpenAI entry point:

from kneo_agent import build_sync_agent

agent = build_sync_agent(
    "openai",
    base_url="http://localhost:11434/v1",
    model="llama3.1",
    system_prompt="You are a concise assistant.",
)

The same parameters land on BridgeAgentFactory.for_openai, NativeRuntimeFactory.for_openai, and OpenAIAgentsImpl. Passing both openai_client= and base_url= raises ValueError.

When base_url is set without api_key, the SDK uses a placeholder string so AsyncOpenAI doesn't error on a missing key (most local servers ignore it). Set api_key= explicitly when your local server checks it.
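The argument rules described above can be sketched as a small resolver. This is an illustration only: resolve_openai_settings and the placeholder value are hypothetical names chosen for this sketch, not the SDK's internals.

```python
from typing import Any, Dict, Optional

# Hypothetical placeholder; the string the SDK actually uses is not stated on this page.
PLACEHOLDER_API_KEY = "not-needed"


def resolve_openai_settings(
    openai_client: Optional[Any] = None,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Illustrative only: mirrors the documented rules, not the SDK's code."""
    if openai_client is not None and base_url is not None:
        # A pre-built client already carries its own endpoint, so the two conflict.
        raise ValueError("pass either openai_client= or base_url=, not both")
    if openai_client is not None:
        return {"client": openai_client}
    if base_url is not None and api_key is None:
        # Most local servers ignore the key, but the client needs a non-empty one.
        api_key = PLACEHOLDER_API_KEY
    return {"base_url": base_url, "api_key": api_key}
```

Under these assumptions, calling the resolver with only base_url yields the placeholder key, while an explicit api_key is passed through untouched.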

See examples/cookbook/local_ollama.py for a runnable demo.

2. Middleware bundle (kneo_agent.middleware)

Four new middlewares ship under a new subpackage:

from kneo_agent.middleware import (
    RetryMiddleware,
    RateLimitMiddleware,
    TokenBudgetMiddleware,
    RedactionMiddleware,
    COMMON_PATTERNS,
)

  • RetryMiddleware — retries transient model / tool failures with exponential backoff; never retries asyncio.CancelledError.
  • RateLimitMiddleware — token-bucket throttle on whole runs or individual model calls.
  • TokenBudgetMiddleware — per-run / cumulative caps from RunResult.metadata["usage"]. Raises TokenBudgetExceeded.
  • RedactionMiddleware — regex scrubbing of secrets in inputs, tool args, tool results, the final result, and streamed chunks. Ships a COMMON_PATTERNS starter pack.
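To make the retry behavior concrete, here is a minimal sketch of exponential backoff that lets cancellation propagate. The function name, its parameters, and the delay values are assumptions for this example; RetryMiddleware's real configuration may differ.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.1,
) -> T:
    """Retry `call` on failure, doubling the delay after each failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await call()
        except asyncio.CancelledError:
            # Cancellation is a control signal, not a transient fault: never retry it.
            # (On modern Python it is a BaseException, so the clause below would not
            # catch it anyway; the explicit branch documents the intent.)
            raise
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1
```

With max_attempts=3, a call that fails twice and then succeeds returns normally after two sleeps.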

The framework types (AgentMiddleware, BaseAgentMiddleware, the context dataclasses) are re-exported from kneo_agent.middleware, so you can import everything from one place.

Ordering rule (covered in detail in the package docstring):

  1. Static middleware (registered via AgentBuilder.add_middleware) wraps every run, in registration order.
  2. Per-run middleware (passed via RunConfig.middlewares) runs inside the static list, so static middleware always stays outermost.

For redaction + observability specifically: register RedactionMiddleware as the outer layer (first, via add_middleware) and OpenTelemetryMiddleware(record_results=False) as the inner layer. The redaction layer then scrubs inputs before the OTel layer records them, and record_results=False keeps unredacted output out of the spans.
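The effect of that ordering can be shown with a toy middleware chain built from plain functions. Everything here (the handler types, the compose helper, the regex) is a stand-in for illustration and does not use the SDK's middleware classes or COMMON_PATTERNS.

```python
import re
from typing import Callable, List

Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]

# Hypothetical pattern for illustration; not taken from COMMON_PATTERNS.
API_KEY = re.compile(r"sk-[A-Za-z0-9]{8,}")

recorded: List[str] = []  # stands in for span attributes


def redaction(next_handler: Handler) -> Handler:
    def handler(text: str) -> str:
        # Scrub the input before any inner layer sees it.
        return next_handler(API_KEY.sub("[REDACTED]", text))
    return handler


def recorder(next_handler: Handler) -> Handler:
    def handler(text: str) -> str:
        recorded.append(text)  # records whatever input reaches this layer
        return next_handler(text)
    return handler


def compose(middlewares: List[Middleware], core: Handler) -> Handler:
    # The first registered middleware ends up outermost, so wrap from the last inward.
    handler = core
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


run = compose([redaction, recorder], core=lambda text: f"echo: {text}")
result = run("my key is sk-abcdef123456")
```

Because redaction is registered first, the recorder only ever sees the scrubbed input. Swapping the two would leak the raw key into recorded.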

3. Documented RunResult.metadata["usage"] schema

The usage key on RunResult.metadata now has a formal shape:

{
    "input_tokens": int,
    "output_tokens": int,
    "total_tokens": int,           # equals input + output
}

prompt_tokens / completion_tokens are accepted as aliases for provider-native naming.

Population today: the OpenAI Agents native runtime (for_openai in both NativeRuntimeFactory and BridgeAgentFactory) populates metadata["usage"] automatically by summing raw_responses[*].usage. LangChain, Google ADK, and Bridge runtimes do not yet populate it; if you need usage tracking on those paths, set metadata["usage"] from a custom middleware. The RunResult docstring lists which runtimes populate the key and which do not.

OpenTelemetryMiddleware reads the documented keys and emits the GenAI semantic-convention attributes gen_ai.usage.input_tokens / gen_ai.usage.output_tokens. TokenBudgetMiddleware reads the same keys to enforce caps.
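For runtimes that do not populate the key yet, a custom middleware would need to produce the documented shape. The helpers below sketch that normalization, plus a simple cap check, in plain Python. The names normalize_usage, sum_usage, check_budget, and OverBudget are hypothetical and not part of the SDK.

```python
from typing import Dict, Iterable, Mapping


class OverBudget(Exception):
    """Hypothetical stand-in for a budget error; not the SDK's TokenBudgetExceeded."""


def normalize_usage(raw: Mapping[str, int]) -> Dict[str, int]:
    """Map provider-native names onto the documented keys."""
    input_tokens = int(raw.get("input_tokens", raw.get("prompt_tokens", 0)))
    output_tokens = int(raw.get("output_tokens", raw.get("completion_tokens", 0)))
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,  # always input + output
    }


def sum_usage(responses: Iterable[Mapping[str, int]]) -> Dict[str, int]:
    """Add up per-response usage into one documented-shape dict."""
    total = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for raw in responses:
        for key, value in normalize_usage(raw).items():
            total[key] += value
    return total


def check_budget(usage: Mapping[str, int], max_total_tokens: int) -> None:
    """Raise when the run's total exceeds the cap."""
    if usage["total_tokens"] > max_total_tokens:
        raise OverBudget(f"{usage['total_tokens']} > {max_total_tokens} tokens")
```

The result of sum_usage is what you would assign to metadata["usage"] so that downstream consumers read the same keys on every runtime.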

4. MCP TLS / mTLS / custom CA

MCPServerConfig.http() and .sse() now accept TLS knobs:

from kneo_agent import MCPServerConfig

config = MCPServerConfig.http(
    name="internal",
    url="https://mcp.internal.corp/v1",
    ca_bundle="/etc/ssl/corp/ca.pem",
    client_cert="/etc/ssl/corp/client.crt",
    client_key="/etc/ssl/corp/client.key",
    headers={"X-Internal-Auth": "..."},
)

MCPServerConfig.build_ssl_context() returns the corresponding ssl.SSLContext; both transports thread it into urlopen(..., context=...). No code change is needed if you were already on the public CA path.
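For orientation, this is roughly how the three TLS knobs map onto the standard library's ssl module. It is a sketch under that assumption, not the body of MCPServerConfig.build_ssl_context(); the function name is invented for this example.

```python
import ssl
from typing import Optional


def build_ssl_context_sketch(
    ca_bundle: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Illustrative only: a typical stdlib context for custom CA and mTLS."""
    # With cafile=None the system trust store is used (the public CA path).
    context = ssl.create_default_context(cafile=ca_bundle)
    if client_cert is not None:
        # Presenting a client certificate is what turns TLS into mTLS.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context
```

A context built this way verifies the server certificate and hostname by default, and can be passed to urllib.request.urlopen(url, context=context).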

5. SecretProvider Protocol

kneo_agent.utils now exports a SecretProvider Protocol plus three reference implementations:

from kneo_agent.utils import (
    EnvSecretProvider,       # env vars with optional prefix
    FileSecretProvider,      # K8s/Docker secret-mount file pattern
    MappingSecretProvider,   # in-memory dict for tests
    SecretProvider,
    SecretNotFound,
)

The SDK does not auto-resolve secrets — it documents one well-known shape so applications plumb credentials through their tool handlers and MCP transports without embedding raw strings in prompts or tool args. The Protocol is runtime_checkable, so isinstance(my_vault_provider, SecretProvider) works against your own implementation.

Pair with RedactionMiddleware for defense in depth.
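The runtime_checkable behavior can be tried out with a stand-in Protocol. The method name get_secret is an assumption (this page does not list the real Protocol's members), and SecretSource / EnvSource are hypothetical names used only in this sketch.

```python
import os
from typing import Mapping, Protocol, runtime_checkable


@runtime_checkable
class SecretSource(Protocol):
    # Hypothetical method name; the real SecretProvider members are not shown here.
    def get_secret(self, name: str) -> str: ...


class EnvSource:
    """Reads secrets from environment-style mappings with an optional prefix."""

    def __init__(self, prefix: str = "", environ: Mapping[str, str] = os.environ) -> None:
        self._prefix = prefix
        self._environ = environ

    def get_secret(self, name: str) -> str:
        try:
            return self._environ[self._prefix + name]
        except KeyError:
            raise LookupError(f"secret {name!r} not found") from None


# EnvSource never inherits from SecretSource; structural typing is enough.
provider = EnvSource(prefix="APP_", environ={"APP_DB_PASSWORD": "s3cret"})
is_provider = isinstance(provider, SecretSource)
```

Note that isinstance against a runtime_checkable Protocol only checks that the method exists, not its signature or return type.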

6. Workflow retry-on-failure

kneo_agent.workflows.RetryStep wraps any WorkflowComponent (an Agent, FunctionStep, or nested Workflow) with retry semantics. It mirrors the middleware version and preserves the inner component's name, so existing graph edges still match.

from kneo_agent.workflows import RetryStep, WorkflowBuilder

# prepare and call_provider are existing WorkflowComponents, defined elsewhere.
builder = WorkflowBuilder(RetryStep(prepare, max_attempts=3))
builder.add_executor(RetryStep(call_provider, max_attempts=5,
                               retry_on=(ConnectionError,)))

WorkflowBuilder.add_edge(..., condition=...) and WorkflowEdge.condition were already available in 1.1.x; see examples/cookbook/workflow_branching_and_retry.py for a full graph example. There is no separate WorkflowGraphBuilder class; WorkflowBuilder is the graph builder.
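Why name preservation matters for RetryStep can be seen in a toy wrapper. RetryingStep below is a hypothetical stand-in operating on plain callables, not the SDK class, and its attribute names are assumptions.

```python
from typing import Any, Callable, Tuple, Type


class RetryingStep:
    """Hypothetical stand-in for a retrying workflow wrapper."""

    def __init__(
        self,
        inner: Callable[[Any], Any],
        max_attempts: int = 3,
        retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    ) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.inner = inner
        self.max_attempts = max_attempts
        self.retry_on = retry_on
        # Reuse the inner name so edges keyed by name keep resolving.
        self.name = getattr(inner, "name", getattr(inner, "__name__", "step"))

    def __call__(self, value: Any) -> Any:
        attempt = 1
        while True:
            try:
                return self.inner(value)
            except self.retry_on:
                # Only the listed exception types are retried; others propagate.
                if attempt == self.max_attempts:
                    raise
                attempt += 1
```

A graph that routes to a step by name would still find the wrapped step, because the wrapper reports the same name as the component it wraps.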

7. Sync-API failure-mode callout

The build_sync_agent / SyncAgent failure mode (raises RuntimeError inside an already-running event loop — Jupyter, FastAPI handler, other async code) is now documented in three places: README, docs/README.md, and the API reference manual. No code change.
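The underlying constraint comes from asyncio itself: a new event loop cannot be started from a thread where one is already running. The helper below reproduces that failure mode with the standard library only. run_blocking is a hypothetical name and does not describe how SyncAgent is implemented.

```python
import asyncio
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def run_blocking(make_coro: Callable[[], Coroutine[Any, Any, T]]) -> T:
    """Run a coroutine to completion, refusing to nest inside a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so it is safe to start one.
        return asyncio.run(make_coro())
    # Reached only when a loop is already running (Jupyter, FastAPI handler, ...).
    raise RuntimeError("already inside a running event loop; use the async API here")


async def answer() -> str:
    return "ok"


outside = run_blocking(answer)  # plain script: works
```

Calling run_blocking from within a coroutine raises RuntimeError, which is the situation the new documentation callout describes.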

8. Python 3.13 classifier

Programming Language :: Python :: 3.13 is now in pyproject.toml classifiers. CI was already exercising 3.13 in 1.1.x.

What didn't change

  • The top-level kneo_agent.__all__ set (75 names) is unchanged. A new snapshot test (tests/unit/test_public_api_snapshot.py) pins this surface so future drift fails CI.
  • The AgentMiddleware Protocol and four hooks (wrap_run, wrap_stream, wrap_model_call, wrap_tool_call) are unchanged.
  • The provider-extra version floors are unchanged.

What's still pending

The following items from the v1.2.0 roadmap landed in design / docs but did not land in the SDK code:

  • Usage population for LangChain / Google ADK / Bridge runtimes. Documented in the RunResult docstring as not-yet-populated; apps that need this should populate via custom middleware.
  • Enterprise data-source tool recipes (SQL / object store / REST / search). The cookbook structure is in place; the recipes themselves are tracked for a follow-up release.

CHANGELOG and policy

Full diff: CHANGELOG.md. Public API stability guarantees and the deprecation policy: api_stability.md.