# AGNT Traces

Every LLM call your agents make through AGNT Studio produces a trace — the exact prompt that was sent, the exact response that came back, tokens consumed, cost incurred, latency measured, and the full compilation context (variables, conditions, model config). AGNT Traces captures all of it and gives you a built-in edit-and-replay loop to fix problems on the spot.

For fleet-level operational metrics (message volumes, task completion rates, active users, assistant performance), see AGNT Analytics.

## Why AGNT Traces

Observability tools show you what happened. AGNT Traces lets you fix it.

Most LLM observability platforms (LangSmith, Langfuse, Helicone) give you a read-only trace viewer. You can see the prompt, the response, the tokens, the cost. Great. Now what? You copy the prompt into a playground, tweak it, re-run it manually, copy the changes back to your codebase, open a PR, get it reviewed, deploy it, and hope it works.

AGNT Studio hosts the prompts. So when you open a trace in the playground, edit a block, and re-run it — you're editing the actual prompt. Save your changes and they go directly to the draft. Publish and they're live. The distance from "this response was bad" to "it's fixed in production" is four API calls, not four days.

## Quick Start

### List recent traces

```bash
curl "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/traces" \
  -H "Authorization: Bearer $TOKEN"
```

### Open a trace in the playground, edit, and save

```bash
# 1. Create a playground session from a trace
curl -X POST "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/traces/$TRACE_ID/playground/sessions" \
  -H "Authorization: Bearer $TOKEN"

# 2. Edit a block in the session
curl -X PATCH "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/playground/sessions/$SESSION_ID/blocks/$BLOCK_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"content": "Updated instruction text with better wording."}'

# 3. Re-run to test the change (real LLM call)
curl -X POST "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/playground/sessions/$SESSION_ID/run" \
  -H "Authorization: Bearer $TOKEN"

# 4. Save changes back to the prompt draft
curl -X POST "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/playground/sessions/$SESSION_ID/save" \
  -H "Authorization: Bearer $TOKEN"
```

That's the closed loop. Trace to fix in four calls.

## Core Concepts

### Studio Traces

Studio Traces capture the full context of every LLM call made through AGNT Studio-managed prompts: the compiled prompt (after variable resolution and condition evaluation), the model response, token counts, cost, latency, and status.

Every trace records:

| Field | What it captures |
| --- | --- |
| `promptName` | Which prompt was compiled |
| `manifest` | The full compiled manifest (system message, tools, model config) |
| `etag` | Version fingerprint of the prompt at call time |
| `variables` | Variable values used for compilation |
| `messages` | The message array sent to the model |
| `output` | The model's response |
| `inputTokens` | Tokens in the prompt |
| `outputTokens` | Tokens in the response |
| `totalTokens` | Total token consumption |
| `cost` | Dollar cost of the call |
| `duration` | Latency in seconds |
| `model` | Provider, model name, and metadata |
| `status` | `success` or `error` |
| `tags` | Custom tags for filtering |
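For agents processing traces in code, the same schema can be sketched as a Python `TypedDict` for lightweight type-checking. This is an illustrative model, not an official SDK type; field names follow the table above:

```python
from typing import Literal, TypedDict

class ModelInfo(TypedDict):
    """The model block of a trace: provider, model name, metadata."""
    provider: str
    name: str
    metadata: dict

class Trace(TypedDict):
    """One recorded LLM call, mirroring the trace fields above."""
    promptName: str
    manifest: dict
    etag: str
    variables: dict
    messages: list[dict]
    output: str
    inputTokens: int
    outputTokens: int
    totalTokens: int
    cost: float          # dollars
    duration: float      # seconds
    model: ModelInfo
    status: Literal["success", "error"]
    tags: list[str]
```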

### The Playground (Trace-to-Edit Loop)

This is what separates AGNT from every other observability tool. The playground is not a separate sandbox — it's the prompt editor loaded with the trace's resolved state. Same blocks, same variables, same model config, but editable.

The workflow:

  1. Open a trace in the playground. Creates a session with the trace's state.
  2. Edit blocks. Change wording, reorder content, add or remove blocks.
  3. Update variables. Try different variable values.
  4. Switch models. Test the same prompt on a different model.
  5. Compile and run. Real LLM call with your changes.
  6. Diff. See exactly what changed between the original trace and your edits.
  7. Save. Push your edits back to the prompt's draft.
  8. Publish. Deploy the fix to production.

Every step is an API call. An agent can do this entire loop programmatically — find underperforming traces, open playground sessions, iterate on the prompt, and deploy fixes without human involvement.

Note: The playground API lives in AGNT Studio's namespace. During prompt authoring, the playground is a Studio feature for testing as you build. Post-run, the same playground becomes the trace investigation tool documented here. Same API, different entry point.

### Trace Diffing

Compare a trace against the current state of its prompt:

```bash
curl "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/traces/$TRACE_ID/diff" \
  -H "Authorization: Bearer $TOKEN"
```

This answers: "The prompt has changed since this trace was recorded — what's different?" Useful for understanding whether a regression was caused by a prompt change.
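Because every trace stores the prompt's `etag` at call time, a quick local check can flag drifted traces before reaching for the diff endpoint. A hypothetical helper, assuming the trace field names documented above:

```python
def find_stale_traces(traces: list[dict], current_etag: str) -> list[dict]:
    """Traces whose recorded etag no longer matches the prompt's current
    etag were generated by an outdated prompt version."""
    return [t for t in traces if t.get("etag") != current_etag]
```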

You can also diff within a playground session to see what you've changed before saving:

```bash
curl "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/playground/sessions/$SESSION_ID/diff" \
  -H "Authorization: Bearer $TOKEN"
```

## API Reference

### Studio Traces (studio.agnt.ai/api/v1)

| Method | Path | Description | Auth |
| --- | --- | --- | --- |
| GET | `/tenants/:tenantId/traces` | List traces | Management |
| POST | `/tenants/:tenantId/traces` | Ingest a trace | Management |
| GET | `/tenants/:tenantId/traces/:traceId` | Get trace detail | Management |
| GET | `/tenants/:tenantId/traces/:traceId/diff` | Diff trace vs current prompt | Management |

#### POST /tenants/:tenantId/traces

Ingest a trace from the studio-node SDK or directly.

```json
{
  "promptName": "customer-support",
  "manifest": {},
  "etag": "abc123",
  "variables": { "company_name": "Acme Corp" },
  "messages": [
    { "role": "user", "content": "I need help with my order" }
  ],
  "output": "I'd be happy to help you with your order...",
  "inputTokens": 150,
  "outputTokens": 89,
  "totalTokens": 239,
  "cost": 0.0012,
  "duration": 1.2,
  "model": {
    "provider": "anthropic",
    "name": "claude-sonnet-4-6",
    "metadata": {}
  },
  "status": "success",
  "metadata": {},
  "tags": ["production", "support"]
}
```
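Note that `totalTokens` should equal `inputTokens + outputTokens`. If you build ingest payloads yourself rather than through the SDK, a small builder (a hypothetical helper, not part of any AGNT library) can derive the total so the two never disagree:

```python
def make_trace_payload(prompt_name: str, messages: list[dict], output: str,
                       input_tokens: int, output_tokens: int,
                       cost: float, duration: float, model: dict,
                       status: str = "success", **extra) -> dict:
    """Assemble a trace ingest payload. totalTokens is derived from the
    two token counts; extra keys (manifest, etag, variables, tags, ...)
    pass through unchanged."""
    payload = {
        "promptName": prompt_name,
        "messages": messages,
        "output": output,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "totalTokens": input_tokens + output_tokens,  # always consistent
        "cost": cost,
        "duration": duration,
        "model": model,
        "status": status,
    }
    payload.update(extra)
    return payload
```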

### Studio Playground (studio.agnt.ai/api/v1)

| Method | Path | Description | Auth |
| --- | --- | --- | --- |
| POST | `/tenants/:t/traces/:traceId/playground/sessions` | Create session from trace | Management |
| GET | `/tenants/:t/playground/sessions/:sessionId` | Get session | Management |
| PATCH | `/tenants/:t/playground/sessions/:sessionId/blocks/:blockId` | Edit block | Management |
| PATCH | `/tenants/:t/playground/sessions/:sessionId/variables` | Update variables | Management |
| PATCH | `/tenants/:t/playground/sessions/:sessionId/models` | Update models | Management |
| POST | `/tenants/:t/playground/sessions/:sessionId/compile` | Compile session | Management |
| POST | `/tenants/:t/playground/sessions/:sessionId/run` | Run (real LLM call) | Management |
| GET | `/tenants/:t/playground/sessions/:sessionId/diff` | Diff changes | Management |
| POST | `/tenants/:t/playground/sessions/:sessionId/save` | Save back to draft | Management |
| DELETE | `/tenants/:t/playground/sessions/:sessionId` | Delete session | Management |

## For Coding Agents

Traces are your feedback loop. If you're a coding agent managing prompts through AGNT Studio, here's the workflow:

### The closed loop

1. `GET /traces` → find the underperforming call
2. `POST /playground/sessions` → open the trace in a playground session
3. `PATCH /blocks` → tweak the prompt
4. `POST /run` → re-run with the change
5. `POST /save` → save edits to the draft
6. `POST /publish` → deploy to production

This is the entire debug-iterate-deploy cycle, all via API. No human needed. No codebase to modify.

### Pattern: Trace-driven debugging

  1. List traces with GET /tenants/:t/traces to find specific failing calls.
  2. Diff the trace against the current prompt with GET /traces/:traceId/diff to see if a prompt change caused the regression.
  3. Open a playground session, iterate on the prompt, and deploy the fix.

For fleet-level regression detection (tracking completion rates over time, spotting degradation trends), use AGNT Analytics to identify the scope first, then drill into individual traces here.

### Pattern: Cost optimization

  1. Pull traces for expensive calls (sort by cost or totalTokens).
  2. Open playground sessions and test with cheaper models or shorter prompts.
  3. Compare token counts between the original trace and your playground run.
  4. Save and publish when you find a configuration that maintains quality at lower cost.
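A hypothetical helper for step 1, assuming the `cost` and `totalTokens` trace fields from the schema above:

```python
def most_expensive(traces: list[dict], n: int = 10) -> list[dict]:
    """Return the n costliest traces, ranked by dollar cost, with
    total token count as a tiebreaker."""
    return sorted(
        traces,
        key=lambda t: (t.get("cost", 0.0), t.get("totalTokens", 0)),
        reverse=True,
    )[:n]
```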

### What to track

- `status: "error"` traces — these are failed LLM calls. Investigate immediately.
- High-`cost` traces — look for prompts that are longer than they need to be.
- High-`duration` traces — could indicate model congestion or overly complex prompts.
- Trace-to-prompt drift — use the diff endpoint to detect when production traces were generated by an outdated prompt version.

## For Product Teams

- **Quality assurance.** Every LLM response your product generates is recorded with its full context. When a customer reports a bad response, you can pull the exact trace, see the exact prompt, and understand exactly what happened.
- **The playground closes the feedback loop.** Product managers can open a trace, see the problematic response, tweak the prompt in the playground, re-run it, and save the fix — without involving engineering. The distance from "this response was bad" to "it's fixed" is minutes, not sprint cycles.
- **Trace diffing answers "what changed?"** When response quality shifts, diff the trace against the current prompt version. Did someone update the prompt? Did variable values change? The diff tells you.
- **Operational metrics live in Analytics.** For questions like "how many messages did we handle this week?" or "which assistants are most active?", see AGNT Analytics. Traces answer "what happened in this specific call." Analytics answers "how is our fleet performing overall."