
Token Optimization: Fitting API Traffic into Your AI Agent's Context Window

Raw HTTP traffic is verbose. A single request-response pair can consume thousands of tokens. APXY's output formats compress traffic by 60–90% while keeping the information your agent actually needs to diagnose issues.

APXY Team · 7 min read

When you paste raw HTTP traffic into a chat with an AI coding agent, you are spending context tokens on noise: verbose headers, binary payloads, internal proxy metadata, repeated boilerplate. A single captured request can easily consume 2,000–5,000 tokens before the agent sees anything useful.

At that rate, a 128K context window fills up fast. You get truncated responses, missed context, and an agent that cannot see the full picture of what is going wrong.

APXY includes three output formats specifically designed to solve this. They let you give your agent more useful information in fewer tokens — and they are available on every traffic output command.

The three formats

| Format | Token reduction | Best for |
|---|---|---|
| JSON (trimmed) | ~60% | Structured parsing, programmatic agents |
| Markdown | ~75% | Readable output in chat interfaces |
| TOON | ~90% | Large result sets, heavily constrained contexts |

JSON (trimmed)

The default JSON format applies a set of quiet optimizations before output:

  • Removes internal proxy headers that agents do not need
  • Masks sensitive values (Authorization, Cookie) to prevent leaking secrets into context
  • Handles binary bodies gracefully by replacing them with a size annotation
  • Trims large bodies to a configurable size limit
  • Omits null and empty fields
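
The masking step can be approximated with standard tools. A minimal sketch with `sed` that mirrors the behavior described above (illustrative only, not APXY's actual implementation):

```shell
# Sample captured headers, standing in for real traffic
headers='Authorization: Bearer sk-live-abc123
Content-Type: application/json
Cookie: session=xyz789'

# Redact credential header values before they reach an AI chat
printf '%s\n' "$headers" \
  | sed -E 's/^(Authorization|Cookie): .*/\1: ***masked***/'
# Authorization: ***masked***
# Content-Type: application/json
# Cookie: ***masked***
```

APXY applies this kind of redaction automatically, so the command output is safe to paste as-is.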

This is the format to use when an agent will parse the output programmatically — for example, when a LangChain agent calls a tool that returns traffic records.

apxy logs list --format json --limit 10
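
Because the output is plain JSON, agent-side tooling can post-process it with standard utilities like `jq`. A hedged sketch using made-up record fields (`method`, `url`, `status`), since the actual schema may differ:

```shell
# Sample records, standing in for: apxy logs list --format json --limit 10
records='[{"method":"GET","url":"/api/users","status":200},
          {"method":"POST","url":"/api/auth/login","status":401}]'

# Keep only the failures before handing records to the agent
printf '%s' "$records" | jq -c '.[] | select(.status >= 400)'
# {"method":"POST","url":"/api/auth/login","status":401}
```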

Markdown

Markdown format outputs traffic as structured tables and fenced code blocks. This is the most readable option for pasting into a chat interface like Cursor, Claude, or ChatGPT:

apxy logs list --format markdown --limit 10

Sample output:

| # | Method | URL                        | Status | Duration |
|---|--------|----------------------------|--------|----------|
| 1 | GET    | /api/users                 | 200    | 45ms     |
| 2 | POST   | /api/auth/login            | 401    | 120ms    |
| 3 | GET    | /api/products?category=top | 200    | 89ms     |

The agent can scan the table, spot the 401, and ask for detail on that request, without spending tokens on the full detail of the other two.

TOON

TOON (Terse One-line Output Notation) is the most aggressive compression format. Each record becomes a single pipe-delimited line:

apxy logs list --format toon --limit 20

Sample output:

1|GET /api/users|200|45ms
2|POST /api/auth/login|401|120ms
3|GET /api/products?category=top|200|89ms
4|DELETE /api/sessions/abc123|204|12ms
5|POST /api/orders|422|230ms

At this density you can fit 50–100 traffic records into the space a single raw request would occupy. TOON is most useful when you want an agent to survey a broad traffic window — "which requests failed in the last hour?" — before drilling into a specific one.
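
Because each TOON record is one pipe-delimited line, standard line tools can pre-filter the survey before it reaches the agent at all. A sketch using the sample above, with the field layout inferred from that example (index | method and path | status | duration):

```shell
# Sample TOON output (layout taken from the example above)
toon='1|GET /api/users|200|45ms
2|POST /api/auth/login|401|120ms
3|GET /api/products?category=top|200|89ms
4|DELETE /api/sessions/abc123|204|12ms
5|POST /api/orders|422|230ms'

# Field 3 is the status code; keep only 4xx/5xx records
printf '%s\n' "$toon" | awk -F'|' '$3 >= 400'
# 2|POST /api/auth/login|401|120ms
# 5|POST /api/orders|422|230ms
```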

A practical workflow

The best results come from combining formats: use TOON for the overview, Markdown for the detail.

Step 1: Get the overview in TOON

apxy logs list --format toon --limit 50

Paste the output into your agent with a prompt like: "These are the last 50 API calls my app made. Which ones look problematic?"

The agent can scan 50 records in a few hundred tokens, identify the failures, and ask for more detail.

Step 2: Drill into a specific request in Markdown

apxy logs show --id <id> --format markdown

Paste the detail into the follow-up turn. The agent now has the full request and response in a readable format without the raw HTTP noise.

Step 3: Export for reproduction

Once the agent has identified the issue, export the failing request as cURL to reproduce it:

apxy logs export-curl --id <id>

Paste the cURL command into your terminal to confirm the fix works outside your application.

Filtering before you format

Feeding an agent all traffic is wasteful. Use filters to narrow down to the requests that matter before applying format optimization:

# Only failed requests
apxy logs list --status 4xx,5xx --format toon
 
# Only calls to a specific service
apxy logs search --query "api.stripe.com" --format markdown
 
# Last 5 minutes
apxy logs list --since 5m --format json
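
Filters also compose with line tools on the way to the agent. For instance, a failure summary by status code, sketched here against sample data standing in for the first command above (TOON field layout assumed from the earlier example):

```shell
# Sample failures, standing in for: apxy logs list --status 4xx,5xx --format toon
failures='2|POST /api/auth/login|401|120ms
5|POST /api/orders|422|230ms
7|POST /api/orders|422|180ms'

# Count failures per status code so the agent gets a summary, not raw lines
printf '%s\n' "$failures" \
  | awk -F'|' '{count[$3]++} END {for (s in count) print s, count[s]}' \
  | sort
# 401 1
# 422 2
```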

On masking sensitive values

By default, APXY masks Authorization, Cookie, and other credential headers before including them in output. This means you can safely paste traffic output into an AI chat interface without exposing API keys or session tokens.

If you are working in a secure, controlled environment and need the agent to see the actual values, you can disable masking with --no-mask. Only do this in contexts where the chat history is not stored externally.

Token count estimates by format

Based on a typical REST API response (~500 byte JSON body, 10 standard headers):

| Format | Approximate tokens per record |
|---|---|
| Raw HTTP | 400–600 |
| JSON trimmed | 160–240 |
| Markdown | 100–150 |
| TOON | 20–40 |

For a context window of 128K tokens and a typical 8K system prompt, TOON lets you fit roughly 3,000 traffic records in a single context. Markdown fits around 800. Raw HTTP fits around 200.
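
The arithmetic behind those counts is easy to check with shell integer math, using the worst-case per-record costs from the table:

```shell
context=128000   # context window
system=8000      # typical system prompt
budget=$((context - system))                 # 120000 tokens left for traffic

echo "TOON:     $((budget / 40)) records"    # 3000
echo "Markdown: $((budget / 150)) records"   # 800
echo "Raw HTTP: $((budget / 600)) records"   # 200
```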

The format you choose determines how much history your agent can reason over in one turn.

For more on how AI agents can use captured traffic, see Why Your AI Coding Agent Needs Network Visibility and How to Capture HTTPS Traffic from Cursor and AI Agents.

Tags: token-optimization, ai-agents, guide, context-window, developer-tools
