
Token Optimization: Fitting API Traffic into Your AI Agent's Context Window
Raw HTTP traffic is verbose. A single request-response pair can consume thousands of tokens. APXY's output formats compress traffic by 60–90% while keeping the information your agent actually needs to diagnose issues.
When you paste raw HTTP traffic into a chat with an AI coding agent, you are spending context tokens on noise: verbose headers, binary payloads, internal proxy metadata, repeated boilerplate. A single captured request can easily consume 2,000–5,000 tokens before the agent sees anything useful.
At that rate, a 128K context window fills up fast. You get truncated responses, missed context, and an agent that cannot see the full picture of what is going wrong.
APXY includes three output formats specifically designed to solve this. They let you give your agent more useful information in fewer tokens — and they are available on every traffic output command.
The three formats
| Format | Token reduction | Best for |
|---|---|---|
| JSON (trimmed) | ~60% | Structured parsing, programmatic agents |
| Markdown | ~75% | Readable output in chat interfaces |
| TOON | ~90% | Large result sets, heavily constrained contexts |
JSON (trimmed)
The default JSON format applies a set of quiet optimizations before output:
- Removes internal proxy headers that agents do not need
- Masks sensitive values (`Authorization`, `Cookie`) to prevent leaking secrets into context
- Handles binary bodies gracefully by replacing them with a size annotation
- Trims large bodies to a configurable size limit
- Omits null and empty fields
This is the format to use when an agent will parse the output programmatically — for example, when a LangChain agent calls a tool that returns traffic records.
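The trimming rules above can be sketched in Python. This is an illustrative sketch only, not APXY's actual implementation: the header list, size limit, and record shape are assumptions.

```python
import json

# Hypothetical sketch of the trimming rules described above, applied
# to a captured record before it reaches the agent's context.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie"}
MAX_BODY_BYTES = 2048  # assumed configurable size limit

def trim_record(record: dict) -> dict:
    out = {}
    for key, value in record.items():
        if value in (None, "", [], {}):
            continue  # omit null and empty fields
        out[key] = value
    if "headers" in out:
        # Mask credential headers so secrets never reach the chat context
        out["headers"] = {
            k: ("***masked***" if k.lower() in SENSITIVE_HEADERS else v)
            for k, v in out["headers"].items()
        }
    body = out.get("body")
    if isinstance(body, bytes):
        # Replace binary payloads with a size annotation
        out["body"] = f"<binary body, {len(body)} bytes>"
    elif isinstance(body, str) and len(body) > MAX_BODY_BYTES:
        out["body"] = body[:MAX_BODY_BYTES] + "<trimmed>"
    return out

record = {
    "method": "POST",
    "url": "/api/auth/login",
    "headers": {"Authorization": "Bearer abc123", "Accept": "application/json"},
    "body": b"\x89PNG binary payload",
    "error": None,
}
print(json.dumps(trim_record(record), indent=2))
```

Masking, binary handling, and null-stripping compose independently, so each record shrinks without losing the fields an agent reasons over.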
```
apxy logs list --format json --limit 10
```

Markdown
Markdown format outputs traffic as structured tables and fenced code blocks. This is the most readable option for pasting into a chat interface like Cursor, Claude, or ChatGPT:
```
apxy logs list --format markdown --limit 10
```

Sample output:
| # | Method | URL | Status | Duration |
|---|--------|----------------------------|--------|----------|
| 1 | GET | /api/users | 200 | 45ms |
| 2 | POST | /api/auth/login | 401 | 120ms |
| 3 | GET | /api/products?category=top | 200 | 89ms |
The agent can scan the table, spot the 401, and ask for the detail on that request — without having consumed a token on the other two.
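For tooling that mimics this layout — say, a custom wrapper that feeds records into a chat — a minimal table renderer might look like the following sketch. The field names are assumptions for illustration, not APXY's schema.

```python
# Minimal sketch of rendering traffic records as a Markdown table,
# mirroring the sample layout above. Field names are illustrative.
def to_markdown(records: list[dict]) -> str:
    lines = [
        "| # | Method | URL | Status | Duration |",
        "|---|--------|-----|--------|----------|",
    ]
    for i, r in enumerate(records, start=1):
        lines.append(
            f"| {i} | {r['method']} | {r['url']} | {r['status']} | {r['duration_ms']}ms |"
        )
    return "\n".join(lines)

records = [
    {"method": "GET", "url": "/api/users", "status": 200, "duration_ms": 45},
    {"method": "POST", "url": "/api/auth/login", "status": 401, "duration_ms": 120},
]
print(to_markdown(records))
```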
TOON
TOON (Terse One-line Output Notation) is the most aggressive compression format. Each record becomes a single pipe-delimited line:
```
apxy logs list --format toon --limit 20
```

Sample output:

```
1|GET /api/users|200|45ms
2|POST /api/auth/login|401|120ms
3|GET /api/products?category=top|200|89ms
4|DELETE /api/sessions/abc123|204|12ms
5|POST /api/orders|422|230ms
```
At this density you can fit 50–100 traffic records into the space a single raw request would occupy. TOON is most useful when you want an agent to survey a broad traffic window — "which requests failed in the last hour?" — before drilling into a specific one.
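The pipe-delimited shape is also trivial to parse if you want to pre-filter before pasting. A sketch, assuming the four-field line layout shown in the sample above (the field names are illustrative):

```python
from typing import NamedTuple

class ToonRecord(NamedTuple):
    index: int
    request: str       # method and path, e.g. "GET /api/users"
    status: int
    duration_ms: int

def parse_toon(text: str) -> list[ToonRecord]:
    # Each TOON record is one pipe-delimited line: index|request|status|duration
    records = []
    for line in text.strip().splitlines():
        idx, request, status, duration = line.split("|")
        records.append(
            ToonRecord(int(idx), request, int(status), int(duration.rstrip("ms")))
        )
    return records

toon = """\
1|GET /api/users|200|45ms
2|POST /api/auth/login|401|120ms
5|POST /api/orders|422|230ms"""

# Surface only the failures worth drilling into
failures = [r for r in parse_toon(toon) if r.status >= 400]
print([r.index for r in failures])
```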
A practical workflow
The best results come from combining formats: use TOON for the overview, Markdown for the detail.
Step 1: Get the overview in TOON
```
apxy logs list --format toon --limit 50
```

Paste the output into your agent with a prompt like: "These are the last 50 API calls my app made. Which ones look problematic?"
The agent can scan all 50 records in a thousand or two tokens, identify the failures, and ask for more detail.
Step 2: Drill into a specific request in Markdown
```
apxy logs show --id <id> --format markdown
```

Paste the detail into the follow-up turn. The agent now has the full request and response in a readable format without the raw HTTP noise.
Step 3: Export for reproduction
Once the agent has identified the issue, export the failing request as cURL to reproduce it:
```
apxy logs export-curl --id <id>
```

Paste the cURL command into your terminal to confirm the fix works outside your application.
Filtering before you format
Feeding an agent all traffic is wasteful. Use filters to narrow down to the requests that matter before applying format optimization:
```
# Only failed requests
apxy logs list --status 4xx,5xx --format toon

# Only calls to a specific service
apxy logs search --query "api.stripe.com" --format markdown

# Last 5 minutes
apxy logs list --since 5m --format json
```

On masking sensitive values
By default, APXY masks `Authorization`, `Cookie`, and other credential headers before including them in output. This means you can safely paste traffic output into an AI chat interface without exposing API keys or session tokens.
If you are working in a secure, controlled environment and need the agent to see the actual values, you can disable masking with `--no-mask`. Only do this in contexts where the chat history is not stored externally.
Token count estimates by format
Based on a typical REST API response (~500 byte JSON body, 10 standard headers):
| Format | Approximate tokens per record |
|---|---|
| Raw HTTP | 400–600 |
| JSON (trimmed) | 160–240 |
| Markdown | 100–150 |
| TOON | 20–40 |
For a context window of 128K tokens and a typical 8K system prompt, TOON lets you fit roughly 3,000 traffic records in a single context. Markdown fits around 800. Raw HTTP fits around 200.
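These capacity figures follow from simple arithmetic, dividing the remaining token budget by the upper-bound per-record cost from the table above:

```python
# Back-of-the-envelope check of the capacity figures above.
CONTEXT = 128_000
SYSTEM_PROMPT = 8_000
budget = CONTEXT - SYSTEM_PROMPT  # tokens left for traffic records

# Upper-bound per-record costs from the table above (worst case per format)
per_record = {"raw_http": 600, "json_trimmed": 240, "markdown": 150, "toon": 40}

for fmt, cost in per_record.items():
    print(f"{fmt}: ~{budget // cost} records per context")
```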
The format you choose determines how much history your agent can reason over in one turn.
For more on how AI agents can use captured traffic, see Why Your AI Coding Agent Needs Network Visibility and How to Capture HTTPS Traffic from Cursor and AI Agents.