Zero-dependency Python SDK for production AI agent visibility — token costs, latency, and span-level traces without restructuring your code.
Instrument once. Costs, latency, and spans appear automatically.
Pure Python stdlib. No version conflicts. Drop into any project with one pip install and move on.
Built-in pricing for GPT-4o, Claude, Gemini, Llama, and more. Cost per span, per session, per model.
Nested spans with latency, token counts, and errors per call — across any multi-step agent workflow.
Traces are sent from a background daemon thread. Your agent never blocks on export, and the hot path adds near-zero overhead.
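The non-blocking export pattern can be sketched with the stdlib alone. This is a minimal illustration of the general daemon-thread pattern, not the SDK's actual internals; the queue, sentinel, and list stand-in for the HTTP sender are all assumptions.

```python
import queue
import threading

span_queue: "queue.Queue" = queue.Queue()
exported = []  # stand-in for the HTTP send to the collector

def _export_loop():
    # Daemon thread: drains spans and ships them without blocking callers.
    while True:
        span = span_queue.get()
        if span is None:  # sentinel used here to stop the loop cleanly
            break
        exported.append(span)  # a real exporter would POST a batch here

worker = threading.Thread(target=_export_loop, daemon=True)
worker.start()

# The hot path only enqueues -- an O(1) put, no network wait.
span_queue.put({"name": "call_llm", "latency_ms": 412})
span_queue.put(None)
worker.join()
```

Because the thread is a daemon, a crashing agent process is never kept alive waiting on telemetry.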
Query spend by model, project, or window. Aggregated stats and full trace history via REST.
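A stats query is an ordinary authenticated GET. The endpoint path and parameter names below are assumptions for illustration; consult your deployment's API reference for the real ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names -- not confirmed by the docs above.
base = "https://agenttrace.example.com/v1/stats"
params = {"model": "gpt-4o-mini", "project": "prod", "window": "7d"}
url = f"{base}?{urlencode(params)}"
# A real request would carry the API key, e.g.
# urllib.request.Request(url, headers={"Authorization": "Bearer at_..."})
```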
MIT-licensed FastAPI backend. Deploy to Render in five minutes. Your traces never leave your infra.
Decorator, context manager, or manual spans — pick what fits.
Decorator
Wrap any function. Timing and error capture are automatic.
import agenttrace

agenttrace.init(api_key="at_...", project="prod")

@agenttrace.trace("call_llm")
def call_llm(prompt):
    resp = client.chat.create(model="gpt-4o-mini", ...)
    agenttrace.record_tokens(
        "gpt-4o-mini",
        input_tokens=resp.usage.prompt_tokens,
        output_tokens=resp.usage.completion_tokens,
    )
    return resp.choices[0].message.content
Context Manager
Span any block — retrieval, tool calls, post-processing.
import agenttrace

with agenttrace.span("rag_retrieve"):
    docs = vector_db.query(embedding, top_k=5)

with agenttrace.span("summarize"):
    agenttrace.record_tokens(
        "claude-sonnet-4-6",
        input_tokens=1_240,
        output_tokens=310,
    )
    summary = claude.summarize(docs)  # cost + latency captured per span
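The third option, manual spans, is not reproduced above, and the SDK's manual API is not shown here either. As a rough pure-stdlib illustration of what any span ultimately records (name, wall-clock latency, token counts):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    # Field names are illustrative, not the SDK's schema.
    name: str
    start: float = field(default_factory=time.perf_counter)
    latency_ms: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0

    def finish(self):
        self.latency_ms = (time.perf_counter() - self.start) * 1000

span = Span("summarize")
# ... do the work, count tokens ...
span.input_tokens, span.output_tokens = 1_240, 310
span.finish()
```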
Pricing built in. No config needed — just pass the model name.
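Per-token cost reduces to simple arithmetic once a model's rates are known. The rates below are placeholders for illustration, not the SDK's bundled price table, and real prices change over time.

```python
# Illustrative per-million-token rates in USD -- placeholders, not real prices.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def span_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one span, given token counts and per-million rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = span_cost("gpt-4o-mini", input_tokens=12_000, output_tokens=3_000)
```

With these placeholder rates, 12,000 input and 3,000 output tokens come to $0.0036, which is the kind of per-span figure the dashboard aggregates.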
Start free. Scale when you need it. Self-host forever.
Always free to self-host · Deploy to Render in under 5 minutes
Open source. Self-hostable on Render. Free during beta.