AGNT Traces
Every LLM call your agents make through AGNT Studio produces a trace — the exact prompt that was sent, the exact response that came back, tokens consumed, cost incurred, latency measured, and the full compilation context (variables, conditions, model config). AGNT Traces captures all of it and gives you a built-in edit-and-replay loop to fix problems on the spot.
For fleet-level operational metrics (message volumes, task completion rates, active users, assistant performance), see AGNT Analytics.
Why AGNT Traces
Observability tools show you what happened. AGNT Traces lets you fix it.
Most LLM observability platforms (LangSmith, Langfuse, Helicone) give you a read-only trace viewer. You can see the prompt, the response, the tokens, the cost. Great. Now what? You copy the prompt into a playground, tweak it, re-run it manually, copy the changes back to your codebase, open a PR, get it reviewed, deploy it, and hope it works.
AGNT Studio hosts the prompts. So when you open a trace in the playground, edit a block, and re-run it — you're editing the actual prompt. Save your changes and they go directly to the draft. Publish and they're live. The distance from "this response was bad" to "it's fixed in production" is a few API calls, not three days.
Quick Start
List recent traces
```shell
curl "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/traces" \
  -H "Authorization: Bearer $TOKEN"
```
Open a trace in the playground, edit, and save
```shell
# 1. Create a playground session from a trace
curl -X POST "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/traces/$TRACE_ID/playground/sessions" \
  -H "Authorization: Bearer $TOKEN"

# 2. Edit a block in the session
curl -X PATCH "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/playground/sessions/$SESSION_ID/blocks/$BLOCK_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"content": "Updated instruction text with better wording."}'

# 3. Re-run to test the change (real LLM call)
curl -X POST "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/playground/sessions/$SESSION_ID/run" \
  -H "Authorization: Bearer $TOKEN"

# 4. Save changes back to the prompt draft
curl -X POST "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/playground/sessions/$SESSION_ID/save" \
  -H "Authorization: Bearer $TOKEN"
```
That's the closed loop. Trace to fix in four calls.
Core Concepts
Studio Traces
Studio Traces capture the full context of every LLM call made through AGNT Studio-managed prompts: the compiled prompt (after variable resolution and condition evaluation), the model response, token counts, cost, latency, and status.
Every trace records:
| Field | What it captures |
|---|---|
| `promptName` | Which prompt was compiled |
| `manifest` | The full compiled manifest (system message, tools, model config) |
| `etag` | Version fingerprint of the prompt at call time |
| `variables` | Variable values used for compilation |
| `messages` | The message array sent to the model |
| `output` | The model's response |
| `inputTokens` | Tokens in the prompt |
| `outputTokens` | Tokens in the response |
| `totalTokens` | Total token consumption |
| `cost` | Dollar cost of the call |
| `duration` | Latency in seconds |
| `model` | Provider, model name, and metadata |
| `status` | `success` or `error` |
| `tags` | Custom tags for filtering |
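These fields are enough to compute per-call health numbers locally. A minimal sketch (the record values are taken from the ingest example later on this page; the helper itself is illustrative, not part of the API):

```python
def summarize_trace(trace: dict) -> dict:
    """Derive quick health numbers from a single trace record."""
    out_tokens = trace["outputTokens"] or 1  # guard against zero-output traces
    return {
        "prompt": trace["promptName"],
        "cost_per_1k_output": round(trace["cost"] / out_tokens * 1000, 4),
        "tokens_per_second": round(out_tokens / trace["duration"], 1),
        "failed": trace["status"] == "error",
    }

# Values from the ingest example on this page
trace = {
    "promptName": "customer-support",
    "inputTokens": 150, "outputTokens": 89, "totalTokens": 239,
    "cost": 0.0012, "duration": 1.2, "status": "success",
}
summary = summarize_trace(trace)
```

Aggregating these per prompt over a window of traces gives you a quick cost and latency profile without leaving your agent loop.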
The Playground (Trace-to-Edit Loop)
This is what separates AGNT from every other observability tool. The playground is not a separate sandbox — it's the prompt editor loaded with the trace's resolved state. Same blocks, same variables, same model config, but editable.
The workflow:
- Open a trace in the playground. Creates a session with the trace's state.
- Edit blocks. Change wording, reorder content, add or remove blocks.
- Update variables. Try different variable values.
- Switch models. Test the same prompt on a different model.
- Compile and run. Real LLM call with your changes.
- Diff. See exactly what changed between the original trace and your edits.
- Save. Push your edits back to the prompt's draft.
- Publish. Deploy the fix to production.
Every step is an API call. An agent can do this entire loop programmatically — find underperforming traces, open playground sessions, iterate on the prompt, and deploy fixes without human involvement.
Note: The playground API lives in AGNT Studio's namespace. During prompt authoring, the playground is a Studio feature for testing as you build. Post-run, the same playground becomes the trace investigation tool documented here. Same API, different entry point.
Trace Diffing
Compare a trace against the current state of its prompt:
```shell
curl "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/traces/$TRACE_ID/diff" \
  -H "Authorization: Bearer $TOKEN"
```
This answers: "The prompt has changed since this trace was recorded — what's different?" Useful for understanding whether a regression was caused by a prompt change.
You can also diff within a playground session to see what you've changed before saving:
```shell
curl "https://studio.agnt.ai/api/v1/tenants/$TENANT_ID/playground/sessions/$SESSION_ID/diff" \
  -H "Authorization: Bearer $TOKEN"
```
API Reference
Studio Traces (studio.agnt.ai/api/v1)
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | `/tenants/:tenantId/traces` | List traces | Management |
| POST | `/tenants/:tenantId/traces` | Ingest a trace | Management |
| GET | `/tenants/:tenantId/traces/:traceId` | Get trace detail | Management |
| GET | `/tenants/:tenantId/traces/:traceId/diff` | Diff trace vs current prompt | Management |
POST /tenants/:tenantId/traces
Ingest a trace from the studio-node SDK or directly.
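If you ingest directly rather than via the SDK, the body can be assembled from a raw LLM call result. A sketch, assuming your provider client hands you messages, output, and usage counts (the helper name and the `usage` dict shape are made up; the payload keys match the schema below):

```python
def build_trace_payload(prompt_name, etag, messages, output, usage,
                        model, variables=None, tags=None, status="success"):
    """Assemble a trace ingest body matching the documented schema."""
    return {
        "promptName": prompt_name,
        "manifest": {},
        "etag": etag,
        "variables": variables or {},
        "messages": messages,
        "output": output,
        "inputTokens": usage["input"],
        "outputTokens": usage["output"],
        "totalTokens": usage["input"] + usage["output"],
        "cost": usage.get("cost", 0.0),
        "duration": usage.get("duration", 0.0),
        "model": model,
        "status": status,
        "metadata": {},
        "tags": tags or [],
    }

payload = build_trace_payload(
    "customer-support", "abc123",
    [{"role": "user", "content": "I need help with my order"}],
    "I'd be happy to help you with your order...",
    {"input": 150, "output": 89, "cost": 0.0012, "duration": 1.2},
    {"provider": "anthropic", "name": "claude-sonnet-4-6", "metadata": {}},
    tags=["production", "support"],
)
```

POST the resulting dict as JSON to the endpoint above.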
```json
{
  "promptName": "customer-support",
  "manifest": {},
  "etag": "abc123",
  "variables": { "company_name": "Acme Corp" },
  "messages": [
    { "role": "user", "content": "I need help with my order" }
  ],
  "output": "I'd be happy to help you with your order...",
  "inputTokens": 150,
  "outputTokens": 89,
  "totalTokens": 239,
  "cost": 0.0012,
  "duration": 1.2,
  "model": {
    "provider": "anthropic",
    "name": "claude-sonnet-4-6",
    "metadata": {}
  },
  "status": "success",
  "metadata": {},
  "tags": ["production", "support"]
}
```
Studio Playground (studio.agnt.ai/api/v1)
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | `/tenants/:t/traces/:traceId/playground/sessions` | Create session from trace | Management |
| GET | `/tenants/:t/playground/sessions/:sessionId` | Get session | Management |
| PATCH | `/tenants/:t/playground/sessions/:sessionId/blocks/:blockId` | Edit block | Management |
| PATCH | `/tenants/:t/playground/sessions/:sessionId/variables` | Update variables | Management |
| PATCH | `/tenants/:t/playground/sessions/:sessionId/models` | Update models | Management |
| POST | `/tenants/:t/playground/sessions/:sessionId/compile` | Compile session | Management |
| POST | `/tenants/:t/playground/sessions/:sessionId/run` | Run (real LLM call) | Management |
| GET | `/tenants/:t/playground/sessions/:sessionId/diff` | Diff changes | Management |
| POST | `/tenants/:t/playground/sessions/:sessionId/save` | Save back to draft | Management |
| DELETE | `/tenants/:t/playground/sessions/:sessionId` | Delete session | Management |
For Coding Agents
Traces are your feedback loop. If you're a coding agent managing prompts through AGNT Studio, here's the workflow:
The closed loop
```text
GET   /traces                → find the underperforming call
POST  /playground/sessions   → open trace in playground
PATCH /blocks                → tweak the prompt
POST  /run                   → re-run with the change
POST  /save                  → save edits to draft
POST  /publish               → deploy to production
```
This is the entire debug-iterate-deploy cycle, all via API. No human needed. No codebase to modify.
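The loop can be sketched as a small helper with an injected HTTP transport, so the sequencing is testable offline. Endpoint paths come from the tables above; the session `id` field name is an assumption, and the publish route depends on your prompt setup, so that step is left to your deploy tooling:

```python
def fix_trace(call, tenant, trace_id, block_id, new_text):
    """Trace -> edit -> run -> save, via the playground endpoints.

    `call(method, path, body=None)` is any HTTP transport returning the
    decoded JSON response; injected so the loop can run against a stub.
    """
    base = f"/tenants/{tenant}"
    # Open the trace in a playground session
    session = call("POST", f"{base}/traces/{trace_id}/playground/sessions")
    sid = session["id"]  # field name is an assumption
    # Edit the offending block
    call("PATCH", f"{base}/playground/sessions/{sid}/blocks/{block_id}",
         {"content": new_text})
    # Re-run with the change (real LLM call in production)
    result = call("POST", f"{base}/playground/sessions/{sid}/run")
    # Persist the edit to the prompt draft
    call("POST", f"{base}/playground/sessions/{sid}/save")
    return result
```

An agent can wrap this in a quality check: re-run, score the new output, and only save (and later publish) if the score improves.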
Pattern: Trace-driven debugging
- List traces with `GET /tenants/:t/traces` to find specific failing calls.
- Diff the trace against the current prompt with `GET /traces/:traceId/diff` to see if a prompt change caused the regression.
- Open a playground session, iterate on the prompt, and deploy the fix.
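The first step, narrowing a fetched trace list down to the failing calls, might look like this (trace shape per the ingest schema on this page; the `id` field is an assumption):

```python
def failing_traces(traces, current_etag=None):
    """Error traces, flagging any recorded under an older prompt version."""
    hits = []
    for t in traces:
        if t["status"] != "error":
            continue
        hits.append({
            "traceId": t.get("id"),  # id field is an assumption
            "promptName": t["promptName"],
            # etag mismatch suggests a prompt change may be involved
            "stale_prompt": current_etag is not None and t["etag"] != current_etag,
        })
    return hits

traces = [
    {"id": "tr-1", "promptName": "support", "status": "success", "etag": "v2"},
    {"id": "tr-2", "promptName": "support", "status": "error", "etag": "v1"},
]
hits = failing_traces(traces, current_etag="v2")
```

A stale etag on a failing trace is the cue to call the diff endpoint before editing anything.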
For fleet-level regression detection (tracking completion rates over time, spotting degradation trends), use AGNT Analytics to identify the scope first, then drill into individual traces here.
Pattern: Cost optimization
- Pull traces for expensive calls (sort by `cost` or `totalTokens`).
- Open playground sessions and test with cheaper models or shorter prompts.
- Compare token counts between the original trace and your playground run.
- Save and publish when you find a configuration that maintains quality at lower cost.
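The ranking and comparison steps can be done locally once traces are fetched. A sketch, assuming trace records use the ingest schema's field names:

```python
def top_spenders(traces, n=3):
    """Most expensive traces first."""
    return sorted(traces, key=lambda t: t["cost"], reverse=True)[:n]

def token_savings(original, rerun):
    """Token delta between an original trace and a playground re-run."""
    return {
        "input_saved": original["inputTokens"] - rerun["inputTokens"],
        "output_saved": original["outputTokens"] - rerun["outputTokens"],
    }
```

Feed `top_spenders` output into playground sessions, then use `token_savings` to confirm a shorter prompt actually reduced consumption before saving.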
What to track
- `status: "error"` traces — these are failed LLM calls. Investigate immediately.
- High `cost` traces — look for prompts that are longer than they need to be.
- High `duration` traces — could indicate model congestion or overly complex prompts.
- Trace-to-prompt drift — use the diff endpoint to detect when production traces were generated by an outdated prompt version.
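All four signals can be checked in one triage pass. A sketch with illustrative thresholds; tune `max_cost` and `max_duration` to your own fleet:

```python
def triage(trace, current_etag, max_cost=0.05, max_duration=10.0):
    """Return the watch-list flags a trace trips, per the signals above."""
    flags = []
    if trace["status"] == "error":
        flags.append("error")
    if trace["cost"] > max_cost:
        flags.append("high-cost")
    if trace["duration"] > max_duration:
        flags.append("slow")
    if trace["etag"] != current_etag:
        flags.append("stale-prompt")
    return flags
```

An agent can run this over each page of `GET /traces` results and open playground sessions only for traces that trip a flag.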
For Product Teams
- Quality assurance. Every LLM response your product generates is recorded with its full context. When a customer reports a bad response, you can pull the exact trace, see the exact prompt, and understand exactly what happened.
- The playground closes the feedback loop. Product managers can open a trace, see the problematic response, tweak the prompt in the playground, re-run it, and save the fix — without involving engineering. The distance from "this response was bad" to "it's fixed" is minutes, not sprint cycles.
- Trace diffing answers "what changed?" When response quality shifts, diff the trace against the current prompt version. Did someone update the prompt? Did variable values change? The diff tells you.
- Operational metrics live in Analytics. For questions like "how many messages did we handle this week?" or "which assistants are most active?", see AGNT Analytics. Traces answer "what happened in this specific call." Analytics answers "how is our fleet performing overall."