Zero-dependency Python SDK for production AI agent visibility — token costs, latency, and span-level traces without restructuring your code.
Instrument once. Costs, latency, and spans appear automatically.
Pure Python stdlib. No version conflicts. Drop into any project with one pip install and move on.
Built-in pricing for GPT-4o, Claude, Gemini, Llama, and more. Cost per span, per session, per model.
Nested spans with latency, token counts, and errors per call — across any multi-step agent workflow.
Traces are sent from a background daemon thread. Your agent never blocks on export, and the hot path adds near-zero overhead.
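The non-blocking export pattern can be sketched with the stdlib alone. This is a minimal illustration of the general daemon-thread pattern, not the SDK's actual internals; the queue, sentinel, and list stand-in for the HTTP sender are all assumptions.

```python
import queue
import threading

span_queue: "queue.Queue" = queue.Queue()
exported = []  # stand-in for the HTTP send to the collector

def _export_loop():
    # Daemon thread: drains spans and ships them without blocking callers.
    while True:
        span = span_queue.get()
        if span is None:  # sentinel used here to stop the loop cleanly
            break
        exported.append(span)  # a real exporter would POST a batch here

worker = threading.Thread(target=_export_loop, daemon=True)
worker.start()

# The hot path only enqueues -- an O(1) put, no network wait.
span_queue.put({"name": "call_llm", "latency_ms": 412})
span_queue.put(None)
worker.join()
```

Because the thread is a daemon, a crashing agent process is never kept alive waiting on telemetry.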
Query spend by model, project, or window. Aggregated stats and full trace history via REST.
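A stats query is an ordinary authenticated GET. The endpoint path and parameter names below are assumptions for illustration; consult your deployment's API reference for the real ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names -- not confirmed by the docs above.
base = "https://agenttrace.example.com/v1/stats"
params = {"model": "gpt-4o-mini", "project": "prod", "window": "7d"}
url = f"{base}?{urlencode(params)}"
# A real request would carry the API key, e.g.
# urllib.request.Request(url, headers={"Authorization": "Bearer at_..."})
```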
MIT-licensed FastAPI backend. Deploy to Render in five minutes. Your traces never leave your infra.
Decorator, context manager, or manual spans — pick what fits.
Decorator
Wrap any function. Timing and error capture are automatic.
import agenttrace

agenttrace.init(api_key="at_...", project="prod")

@agenttrace.trace("call_llm")
def call_llm(prompt):
    resp = client.chat.create(model="gpt-4o-mini", ...)
    agenttrace.record_tokens(
        "gpt-4o-mini",
        input_tokens=resp.usage.prompt_tokens,
        output_tokens=resp.usage.completion_tokens,
    )
    return resp.choices[0].message.content
Context Manager
Span any block — retrieval, tool calls, post-processing.
import agenttrace

with agenttrace.span("rag_retrieve"):
    docs = vector_db.query(embedding, top_k=5)

with agenttrace.span("summarize"):
    agenttrace.record_tokens(
        "claude-sonnet-4-6",
        input_tokens=1_240,
        output_tokens=310,
    )
    summary = claude.summarize(docs)  # cost + latency captured per span
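The third option, manual spans, is not reproduced above, and the SDK's manual API is not shown here either. As a rough pure-stdlib illustration of what any span ultimately records (name, wall-clock latency, token counts):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    # Field names are illustrative, not the SDK's schema.
    name: str
    start: float = field(default_factory=time.perf_counter)
    latency_ms: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0

    def finish(self):
        self.latency_ms = (time.perf_counter() - self.start) * 1000

span = Span("summarize")
# ... do the work, count tokens ...
span.input_tokens, span.output_tokens = 1_240, 310
span.finish()
```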
Pricing built in. No config needed — just pass the model name.
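Per-token cost reduces to simple arithmetic once a model's rates are known. The rates below are placeholders for illustration, not the SDK's bundled price table, and real prices change over time.

```python
# Illustrative per-million-token rates in USD -- placeholders, not real prices.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def span_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one span, given token counts and per-million rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = span_cost("gpt-4o-mini", input_tokens=12_000, output_tokens=3_000)
```

With these placeholder rates, 12,000 input and 3,000 output tokens come to $0.0036, which is the kind of per-span figure the dashboard aggregates.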
Start free. Scale when you need it. Self-host forever.
Always free to self-host · Deploy to Render in under 5 minutes
Open source. Self-hostable on Render. Free during beta.