Live Beta · Free to start

Trace every LLM call.
Know what it costs.

Zero-dependency Python SDK for production AI agent visibility — token costs, latency, and span-level traces without restructuring your code.

$ pip install agenttrace
agent_run · session_a4f2b1 · live

user: Summarize Q3 earnings and flag any risks.
claude-sonnet-4-6: Q3 revenue reached $2.4B, up 12% YoY. Key risks: COGS margin pressure (+180bps) and FX headwinds in EMEA.

trace · 5 spans · 284ms
  • agent_run · 284ms
  • rag_retrieve · 68ms
  • claude-sonnet · 1,240 tok · $0.0062
  • parse_response · 11ms
  • format_output · 9ms

$0.0062 trace cost · 284ms total latency · 1,240 tokens used
Features

Everything you need to understand your agents

Instrument once. Costs, latency, and spans appear automatically.

01

Zero Dependencies

Pure Python stdlib. No version conflicts. Drop into any project with one pip install and move on.

02

Real-Time Cost Tracking

Built-in pricing for GPT-4o, Claude, Gemini, Llama, and more. Cost per span, per session, per model.

03

Distributed Trace Spans

Nested spans with latency, token counts, and errors per call — across any multi-step agent workflow.

04

Non-Blocking by Design

Traces send from a background daemon thread, so your agent never waits on network I/O. Near-zero latency overhead on the hot path.
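As a rough illustration of this design (an assumption about the internals, not the SDK's actual implementation), a daemon worker thread can drain a queue so the calling thread only pays the cost of an enqueue:

```python
import queue
import threading

class BackgroundSender:
    """Sketch of the non-blocking design; hypothetical, not the SDK's code."""

    def __init__(self):
        self.outbox = queue.Queue()
        self.sent = []
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()  # daemon thread: never keeps the process alive

    def _drain(self):
        while True:
            span = self.outbox.get()
            # real code would POST the span here; the agent thread
            # has already moved on and never waits for this I/O
            self.sent.append(span)
            self.outbox.task_done()

    def enqueue(self, span):
        self.outbox.put_nowait(span)  # returns immediately

sender = BackgroundSender()
sender.enqueue({"name": "agent_run", "latency_ms": 284})
sender.outbox.join()  # demo only: block until the worker catches up
```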

05

REST API + Stats

Query spend by model, project, or window. Aggregated stats and full trace history via REST.
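A spend query might be assembled like this; the endpoint path, parameter names, and base URL are illustrative assumptions, not the documented API:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
def stats_url(base, project, model=None, window="7d"):
    params = {"project": project, "window": window}
    if model:
        params["model"] = model
    return f"{base}/api/stats?{urlencode(params)}"

url = stats_url("https://traces.example.com", "prod", model="gpt-4o-mini")
# fetch with your API key as a Bearer token, e.g. via urllib.request
```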

06

Self-Hostable

MIT-licensed FastAPI backend. Deploy to Render in five minutes. Your traces never leave your infra.

Integration

Three lines to instrument any agent

Decorator, context manager, or manual spans — pick what fits.

Decorator

Wrap any function. Timing and error capture are automatic.

agent.py
python
import agenttrace

agenttrace.init(api_key="at_...", project="prod")

@agenttrace.trace("call_llm")
def call_llm(prompt):
    resp = client.chat.completions.create(model="gpt-4o-mini", ...)
    agenttrace.record_tokens(
        "gpt-4o-mini",
        input_tokens=resp.usage.prompt_tokens,
        output_tokens=resp.usage.completion_tokens,
    )
    return resp.choices[0].message.content

Context Manager

Span any block — retrieval, tool calls, post-processing.

pipeline.py
python
import agenttrace

with agenttrace.span("rag_retrieve"):
    docs = vector_db.query(embedding, top_k=5)

with agenttrace.span("summarize"):
    agenttrace.record_tokens(
        "claude-sonnet-4-6",
        input_tokens=1_240,
        output_tokens=310,
    )
    summary = claude.summarize(docs)

# cost + latency captured per span
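Manual spans, the third mode, aren't shown above and their exact API isn't documented here. As a stdlib-only sketch of what a span records (class and method names hypothetical, prices illustrative):

```python
import time

class Span:
    """Hypothetical stand-in for a manual span; the real SDK's API may differ."""

    def __init__(self, name):
        self.name = name
        self.tokens = 0
        self.cost = 0.0
        self.latency_ms = None
        self._start = time.perf_counter()

    def record_tokens(self, input_tokens, output_tokens,
                      in_price_per_m, out_price_per_m):
        # prices are dollars per million tokens (illustrative values below)
        self.tokens += input_tokens + output_tokens
        self.cost += (input_tokens * in_price_per_m
                      + output_tokens * out_price_per_m) / 1e6

    def end(self):
        self.latency_ms = (time.perf_counter() - self._start) * 1000
        return self

span = Span("summarize")
span.record_tokens(1_240, 310, in_price_per_m=3.0, out_price_per_m=15.0)
span.end()
```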
Models

Cost tracking for every major model

Pricing built in. No config needed — just pass the model name.

gpt-4o gpt-4o-mini o1-preview claude-sonnet-4-6 claude-haiku-4-5 claude-3-opus gemini-1.5-pro gemini-1.5-flash llama-3.1-70b llama-3.1-405b mistral-large + more
Pricing

Simple, honest pricing

Start free. Scale when you need it. Self-host forever.

Free
$0
forever
  • 50K traces / month
  • 7-day retention
  • REST API + SDK
  • Community support
Get started free
Popular
Pro
$19
per month
  • 2M traces / month
  • 90-day retention
  • Cost breakdown by model
  • Email support
Get Pro
Team
$59
per month
  • Unlimited traces
  • 1-year retention
  • Multi-project isolation
  • Priority support
Get Team

Always free to self-host · Deploy to Render in under 5 minutes

Start tracing in minutes

Open source. Self-hostable on Render. Free during beta.