Token Optimization – Getting More from Less
Yesterday we showed you how to build custom agents with surgical tool selection. Today, we’re diving deeper: Token Optimization.
Selecting the right tools is only half the battle. The real game-changer is optimizing what those tools return. We’ve re-architected our data pipelines to deliver maximum accuracy while using 40-80% fewer tokens on outputs.
Here’s how we did it.
The Problem: Data Bloat
When you call a tool like scrape_as_markdown or search_engine, the API returns rich data. But here’s the catch: most of that data is formatted for humans, not LLMs.
Traditional APIs include unnecessary overhead:
- Redundant formatting (bold, italic, headings) that LLMs don’t need
- Ads and sponsored content mixed with organic results
- Image metadata and visual elements that waste tokens
- Inconsistent field naming and redundant metadata
For a typical web page scrape or search query, you’re often getting 3-5x more data than the LLM actually needs for reasoning.
The Solution: Two-Layer Token Optimization
We’ve implemented a layered optimization strategy that targets different types of data:
- Remark + Strip-Markdown for web page content (scrape_as_markdown)
- Parsed Light + Payload Cleaning for search engine results (search_engine)
Let’s break down each layer.
But Wait – Why Not TOON?
You might be wondering: what about TOON (Token-Oriented Object Notation)? We initially explored it as a third optimization layer for structured datasets like LinkedIn profiles and Amazon products.
TOON is a clever format that uses indentation and tabular layouts to reduce tokens. On paper, it delivers 30-60% savings for uniform arrays of identical objects. But when we tested it on real-world API responses from Bright Data, we discovered something important:
The delimiter isn’t the bottleneck – the data itself is.
The Delimiter Illusion
Looking at a typical LinkedIn profile response, most tokens come from:
- Long text fields (about, recommendations, activity[].title)
- Long URLs (avatar, banner_image, activity[].link, credential_url)
The delimiter (\n, |, \t) is a tiny fraction of the total token count.
Newline (\n) is already:
- A single, very common token in all major LLM tokenizers
- Naturally aligned with how models chunk text (line-oriented)
- Absent from URLs and most text, avoiding escaping issues
Exotic separators like |, ^, or \x1F might reduce quoting in a few spots, but they often introduce rare multi-token sequences that cancel out any gains.
Short answer: If you only tweak the delimiter, \n is already about as good as it gets for this kind of data.
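If you want to check this yourself, here’s a quick sketch (assuming the js-tiktoken npm package and an invented profile record) that counts tokens for the same fields joined with different delimiters:

import {getEncoding} from 'js-tiktoken';

// Count tokens for the same record joined with different delimiters.
// The field values are invented; js-tiktoken is assumed as the tokenizer here.
const enc = getEncoding('cl100k_base');
const fields = [
    'Jane Doe',
    'Engineering leader with 15 years of experience scaling data platforms across three continents...',
    'https://media.example.com/profiles/avatars/very/long/cdn/path/jane-doe-profile-photo.jpg',
];
for (const delim of ['\n', '|', '\t'])
    console.log(JSON.stringify(delim), enc.encode(fields.join(delim)).length);
// The three counts differ by a token or two at most; the long text and URL dominate,
// which is exactly the point: the delimiter isn't the lever worth pulling.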
Where TOON Falls Short
TOON shines for uniform arrays of identical objects—think 1,000 employee records with the same schema. But real-world web data from tools like web_data_linkedin_person_profile or web_data_amazon_product is:
- Heterogeneous — Nested objects with different schemas (experience, education, activity arrays)
- Non-uniform — Mixed array types (some entries have img, others don’t)
- Single-object responses — Most API calls return 1 profile or 1 product, not 1,000
For deeply nested or non-uniform structures, minified JSON often uses fewer tokens than TOON. The TOON spec itself admits this—TOON can actually use more tokens than compact JSON for single objects with deep nesting.
The Real Lever: Change What You Send, Not How You Format It
Here’s the insight that matters: any format-level optimization (JSON vs. TOON vs. YAML) is dwarfed by simply changing what data you send (dropping fields, truncating long text, omitting empty values).
We don’t do aggressive filtering ourselves; our tools return the full data from Bright Data’s APIs. But we do strip null values, which appear frequently in web scraping responses and waste tokens without adding information.
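For illustration, null-stripping can be as simple as a small recursive walk. This is a hypothetical sketch, not the server’s actual implementation:

// Recursively drop null values from an API response (hypothetical sketch).
function strip_nulls(value){
    if (Array.isArray(value))
        return value.map(strip_nulls).filter(v=>v!==undefined);
    if (value && typeof value=='object')
    {
        const out = {};
        for (const [key, child] of Object.entries(value))
        {
            const cleaned = strip_nulls(child);
            if (cleaned!==undefined)
                out[key] = cleaned;
        }
        return out;
    }
    return value===null ? undefined : value;
}

// strip_nulls({name: 'Jane', avatar: null, experience: [{title: 'CTO', end_date: null}]})
// → {name: 'Jane', experience: [{title: 'CTO'}]}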
The point is: delimiter tweaks save ~5-10% at best. Content filtering saves 20-80%. TOON optimizes the wrong variable for real-world web data.
Tooling Immaturity
TOON is also brand new—the first commit to the spec was November 2nd, 2025. It’s literally a month old. JSON has validators, editors, and libraries in every language. TOON requires custom parsing and lacks ecosystem support.
One engineer put it well: “First time I saw TOON, it looked like someone’s half-finished scratchpad. Show it to your backend engineer, and there’s a chance they’ll frown like you brought them a new problem.”
Our Decision
After benchmarking TOON on real Bright Data payloads (LinkedIn profiles, Amazon products, Google SERPs), we concluded:
- For search results: Bright Data’s Parsed Light format (see Layer 2 below) delivers 80% token reduction by filtering at the API level—no custom encoding needed.
- For web scraping: Strip-markdown reduces tokens by 40% while keeping responses human-readable—no new format required.
- For structured datasets: The real wins come from dropping fields and truncating text, not from replacing JSON with TOON.
TOON is a brilliant idea for the right use case (massive uniform datasets). But for heterogeneous web API responses, standard optimizations beat exotic formats every time.
Layer 1: Remark + Strip-Markdown for Web Scraping
The Challenge: Markdown Bloat
Our scrape_as_markdown tool converts any web page into clean, LLM-friendly markdown. But raw markdown converters often include:
- Redundant formatting (bold, italic, headings) that LLMs don’t need for reasoning
- Image alt-text and metadata
- Empty lines and spacing inconsistencies
For a typical blog post or product page, raw markdown can be 3-5x longer than the core content.
The Solution: Strip-Markdown
We use remark + strip-markdown to intelligently reduce markdown to plain text while preserving structure:
We’re grateful to the remark project for their excellent markdown processing library. Consider supporting their work!
import {remark} from 'remark';
import strip from 'strip-markdown';

// Inside scrape_as_markdown tool
const minified_data = await remark()
    .use(strip)
    .process(response.data);
return minified_data.value;
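If you’d like to try the same pipeline outside the tool, here’s a self-contained sketch on an invented markdown snippet (output shown approximately):

import {remark} from 'remark';
import strip from 'strip-markdown';

// Standalone demo of the stripping pipeline on an invented snippet
const sample = '# Reviews\n\n- **John D.** - *"Great product!"* [Read more](https://example.com/r/1)';
const file = await remark().use(strip).process(sample);
console.log(String(file));
// Prints roughly:
// Reviews
//
// John D. - "Great product!" Read more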
What Gets Stripped?
The strip-markdown plugin removes:
- Bold/Italic — **Important** becomes Important
- Image syntax — ![alt text](image.jpg) becomes alt text (if needed) or empty
- Headings — ### Section Title becomes Section Title (preserves text, drops markup)
- Code blocks — Reduces backticks and formatting while keeping content
The result? Plain text that retains the semantic meaning but drops all the formatting overhead.
Example: Before and After
Raw Markdown (from Web Unlocker):
# Product Reviews
## Customer Feedback
- **John D.** - ⭐⭐⭐⭐⭐
*"Great product! Highly recommend."*
[Read more](https://example.com/review/123)
- **Sarah M.** - ⭐⭐⭐⭐
*"Good value for money."*
[Read more](https://example.com/review/456)
![Product Image](https://example.com/product.jpg)
[Buy Now](https://example.com/buy)
After remark().use(strip).process():
Product Reviews
Customer Feedback
John D. - ⭐⭐⭐⭐⭐
"Great product! Highly recommend."
Read more
Sarah M. - ⭐⭐⭐⭐
"Good value for money."
Read more
Product Image
Buy Now
Token reduction: ~40% for a full page.
The LLM still gets all the review text, ratings, and call-to-action, but without the link URLs, image paths, or markdown formatting syntax.
When to Use Stripped Markdown
This optimization is perfect for:
- Summarization tasks — “Summarize this blog post”
- Sentiment analysis — “What do customers think about this product?”
- Entity extraction — “Extract company names and contact info from this page”
If your agent needs to click links or navigate the page, use our Scraping Browser tools instead (scraping_browser_navigate, scraping_browser_snapshot).
Layer 2: Parsed Light – Engineered for AI Agents
The Problem: Traditional SERP APIs Weren’t Built for LLMs
Traditional search engine result page (SERP) APIs were designed for humans browsing web interfaces. They return everything:
- Ads and sponsored content your agent doesn’t need
- Knowledge panels and featured snippets that bloat responses
- Redundant metadata fields across multiple naming conventions
- Visual elements (thumbnails, favicons) that waste tokens
- Related searches, autocomplete suggestions, and “people also ask” sections
The result? A single search for 10 results can return 2,000-3,000 tokens of JSON, when your LLM agent only needs link + title + description.
For AI agents running multi-step research workflows, this is a dealbreaker. Every extra token compounds across the context window, limiting how many queries you can run before hitting limits.
The Solution: Bright Data’s Parsed Light Format
We’ve introduced Parsed Light API response format—purpose-built for AI agents that need speed and efficiency.
Here’s what makes it different:
- Top 10 organic results only — No ads, no knowledge panels, no sidebar clutter
- Consistent field structure — Every result has link, title, description, and an optional global_rank
- Clean by design — Pre-optimized at the API level, so you don’t need complex post-processing
- Faster response times — Smaller payloads = faster network transfer and parsing
Instead of wrestling with inconsistent field names and bloated responses, Parsed Light delivers exactly what AI agents need: actionable search results in minimal tokens.
Parsed Light in Action
When you call our search_engine tool with Google as the engine, we automatically request Bright Data’s parsed_light format:
// Inside search_engine tool (for Google)
const response = await axios({
    url: 'https://api.brightdata.com/request',
    method: 'POST',
    data: {
        url: search_url('google', query, cursor),
        zone: ctx.unlocker_zone,
        format: 'raw',
        data_format: 'parsed_light', // ← The magic parameter
    },
    headers: api_headers(ctx.api_token, ctx.client_name, 'search_engine'),
    responseType: 'text',
});
What You Get: Clean, Predictable JSON
Here’s an actual Parsed Light response for a search query:
{
    "organic": [
        {
            "link": "https://example.com/pizza",
            "title": "Best Pizza in NYC - Joe's Pizza",
            "description": "Family-owned pizzeria serving authentic New York slices since 1975...",
            "global_rank": 1
        },
        {
            "link": "https://example.com/pizza-guide",
            "title": "Top 10 Pizza Places in NYC",
            "description": "Discover the highest-rated pizza restaurants across all five boroughs...",
            "global_rank": 2,
            "extensions": [
                {
                    "type": "site_link",
                    "link": "https://example.com/pizza-guide/brooklyn",
                    "text": "Brooklyn"
                }
            ]
        }
        // ... 8 more results
    ]
}
Notice what’s not there:
- No ads or sponsored listings
- No knowledge graph panels
- No “people also ask” sections
- No redundant metadata fields
- No unicode control characters or formatting noise
Just 10 clean, ranked search results ready for your LLM to process.
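Because every entry has the same shape, the glue code an agent needs is tiny. Here’s a hypothetical helper (not part of the server) that turns the organic array into a compact, numbered context block for the LLM:

// Hypothetical helper: flatten Parsed Light organic results into a compact prompt block
function format_results_for_llm(payload){
    return (payload.organic||[])
        .map((r, i)=>`${i+1}. ${r.title}\n   ${r.link}\n   ${r.description}`)
        .join('\n');
}

// format_results_for_llm({organic: [{link: 'https://example.com/pizza',
//   title: "Best Pizza in NYC - Joe's Pizza", description: 'Family-owned pizzeria...'}]})
// → "1. Best Pizza in NYC - Joe's Pizza\n   https://example.com/pizza\n   Family-owned pizzeria..."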
Additional Cleanup: The Final Polish
Even with Parsed Light doing the heavy lifting, we apply a lightweight post-processing step to ensure perfect consistency:
function clean_google_search_payload(raw_data){
    const data = raw_data && typeof raw_data=='object' ? raw_data : {};
    const organic = Array.isArray(data.organic) ? data.organic : [];
    const pagination = data.pagination && typeof data.pagination=='object'
        ? data.pagination : {};
    // Normalize to just link, title, description
    const organic_clean = organic
        .map(entry=>{
            if (!entry || typeof entry!='object')
                return null;
            const link = typeof entry.link=='string' ? entry.link.trim() : '';
            const title = typeof entry.title=='string'
                ? entry.title.trim() : '';
            const description = typeof entry.description=='string'
                ? entry.description.trim() : '';
            if (!link || !title)
                return null; // Skip invalid entries
            return {link, title, description};
        })
        .filter(Boolean);
    const parsed_page = Number(pagination.current_page);
    const current_page = Number.isFinite(parsed_page) && parsed_page>0
        ? parsed_page : 1;
    return {organic: organic_clean, current_page};
}
This final cleanup:
- Validates data types — Ensures link, title, and description are strings
- Trims whitespace — Removes any leading/trailing spaces
- Filters invalid entries — Skips results missing required fields
- Normalizes pagination — Converts current_page to a consistent number format
- Strips optional fields — Removes global_rank and extensions to keep responses ultra-minimal
The result is bulletproof JSON that your LLM can parse with zero edge cases.
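As a quick usage sketch, here’s how the function behaves on an invented, slightly messy payload (one valid result, one entry missing its title, and pagination delivered as a string):

// Invented sample payload for illustration only
const sample = {
    organic: [
        {link: ' https://example.com/pizza ', title: 'Best Pizza in NYC', description: 'Family-owned pizzeria...'},
        {link: 'https://example.com/broken', description: 'no title, so this entry is dropped'},
    ],
    pagination: {current_page: '2'},
};
console.log(clean_google_search_payload(sample));
// → {organic: [{link: 'https://example.com/pizza', title: 'Best Pizza in NYC',
//      description: 'Family-owned pizzeria...'}], current_page: 2}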
Example: Traditional vs. Parsed Light
Traditional SERP API (before Parsed Light):
{
    "ads": [...],
    "organic": [
        {
            "link": "https://example.com/product",
            "url": "https://example.com/product",
            "cache": {"url": "https://webcache.google.com/..."},
            "title": "Amazing\u2003Product\u2003\u2003Review",
            "heading": "Amazing Product Review",
            "name": "Product Review",
            "description": "This is a great product...",
            "snippet": "This is a great product...",
            "snippet_long": "This is a great product with many features...",
            "subtitle": "Product features",
            "rating": 4.5,
            "price": "$49.99",
            "image": "https://cdn.example.com/image.jpg",
            "favicon": "https://example.com/favicon.ico"
        }
        // ... 30+ more results including ads, knowledge panels, etc.
    ],
    "knowledge_graph": {...},
    "people_also_ask": [...],
    "related_searches": [...],
    "pagination": {...}
}
~2,500 tokens for a typical response.
Parsed Light (optimized for AI agents):
{
    "organic": [
        {
            "link": "https://example.com/product",
            "title": "Amazing Product Review",
            "description": "This is a great product...",
            "global_rank": 1
        }
        // ... 9 more results (top 10 only)
    ]
}
~600 tokens for the same query.
After clean_google_search_payload():
{
    "organic": [
        {
            "link": "https://example.com/product",
            "title": "Amazing Product Review",
            "description": "This is a great product..."
        }
    ],
    "current_page": 1
}
~500 tokens — an 80% reduction from traditional SERP APIs.
Why Parsed Light Outperforms Traditional Parsers
Most SERP APIs parse the entire page and leave you to clean up the mess. Parsed Light is different:
- Pre-filtered at the source — Only extracts organic results, no ads or sidebars
- Standardized schema — Consistent field names across all queries (no snippet vs. description vs. snippet_long)
- LLM-first design — Built for token efficiency from day one, not as an afterthought
- Sub-1-second response times — Parsed Light is served via Bright Data’s premium routing infrastructure, designed specifically for mission-critical AI applications
This isn’t just about saving tokens—it’s about rethinking how SERP data should work for AI agents.
Built for Real-Time AI Agents
Bright Data’s Parsed Light isn’t just optimized—it’s engineered for speed. With sub-1-second response times, it’s ideal for:
- Real-time data enrichment — Agents performing live lookups during user conversations
- Multi-step research workflows — Chain multiple queries without latency bottlenecks
- Fact verification — Instant validation of claims and statements
- User-facing applications — Search-powered features that feel instant
Traditional SERP APIs can take 3-5 seconds per query. At scale, that latency compounds. Parsed Light delivers results in under 1 second, keeping your agents responsive and your users engaged.
Combined Impact: Real-World Workflow
Let’s trace token usage through a realistic agent workflow:
Task: “Find articles about AI regulations, then summarize the key points from each source.”
Step 1: Search for Articles
Agent calls: search_engine({query: "AI regulations 2024"})
Without optimization (traditional SERP API): ~2,500 tokens (10 results + ads + knowledge panels)
With Parsed Light + cleanup: ~500 tokens
Savings: 80% (2,000 tokens saved)
Step 2: Scrape Article Pages
Agent calls: scrape_as_markdown({url: "https://example.com/article"}) × 5 articles
Without optimization: ~15,000 tokens (5 pages × 3,000 tokens/page)
With remark().use(strip): ~9,000 tokens
Savings: 40% (6,000 tokens saved)
Step 3: Additional Research
Agent calls: search_engine({query: "EU AI Act details"}) for follow-up research
Without optimization: ~2,500 tokens
With Parsed Light + cleanup: ~500 tokens
Savings: 80% (2,000 tokens saved)
Total Workflow Savings
Without optimization: 20,000 tokens
With optimization: 10,000 tokens
Overall reduction: 50% (10,000 tokens saved)
At $3 per million input tokens (Claude Sonnet pricing), that’s $0.030 saved per workflow. Run this 1,000 times a day, and you’re saving $30/day or $10,950/year.
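Spelled out as a quick sanity check on those numbers:

// Back-of-the-envelope savings for the workflow above
const tokens_without = 20000, tokens_with = 10000;
const price_per_input_token = 3/1e6;                      // $3 per million input tokens
const saved_per_run = (tokens_without-tokens_with)*price_per_input_token;
console.log(saved_per_run);                               // ≈ 0.03 → $0.03 per workflow
console.log(saved_per_run*1000);                          // ≈ 30 → $30/day at 1,000 runs
console.log(saved_per_run*1000*365);                      // ≈ 10950 → $10,950/year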
But the real value isn’t just cost savings—it’s throughput. With these optimizations, your agents can run more complex workflows in the same context window, completing tasks faster and handling more sophisticated queries.
Why This Matters for Agentic Workflows
Token optimization isn’t just about cost. It’s about enabling more complex workflows within context windows.
With a 200K token context window:
- Without optimization: You can process ~10 multi-step workflows before hitting the limit
- With optimization: You can process ~20 workflows in the same window
That’s 100% more throughput from the same infrastructure.
And when you combine this with Day 1’s Tool Groups (60-95% reduction in system prompt tokens) and Day 2’s Custom Tools (surgical tool selection), you’re looking at massive total token reduction across the entire agent lifecycle (system prompt + tool calls + tool responses).
Technical Details: Package Dependencies
Both optimization layers are implemented using battle-tested open-source libraries:
- remark — Markdown processor (used by MDX, Gatsby, Next.js)
- strip-markdown — Remark plugin for stripping formatting
These are the same tools used by production sites processing millions of requests per day.
See the Difference
Want to measure the impact? Compare token counts:
- Call the search_engine tool and count tokens in the response
- Compare against a traditional SERP API response for the same query
- Use your LLM provider’s tokenizer (e.g., tiktoken for OpenAI models); see the sketch below
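Here’s a minimal sketch of that comparison, again assuming the js-tiktoken package and two response strings you’ve captured yourself:

import {getEncoding} from 'js-tiktoken';

// Placeholders: paste the raw responses you captured for the same query
const traditional_json = '...';   // traditional SERP API response
const parsed_light_json = '...';  // Parsed Light response after cleanup

const enc = getEncoding('cl100k_base');
const tokens = s=>enc.encode(s).length;
console.log('traditional:', tokens(traditional_json));
console.log('parsed light:', tokens(parsed_light_json));
console.log('reduction:', Math.round((1-tokens(parsed_light_json)/tokens(traditional_json))*100)+'%');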
You’ll see 80% reduction on Google searches, 40% on scraped pages, and 50% on structured datasets.
This isn’t just optimization—it’s a complete rethinking of how web data should be delivered to AI agents.
Performance Stats Summary
| Optimization | Tool(s) Affected | Token Reduction | Use Case |
|---|---|---|---|
| Strip-Markdown | scrape_as_markdown | ~40% | Web page summaries, content extraction |
| Parsed Light | search_engine (Google only) | ~80% | Search result parsing, lead generation, research workflows |
What’s Next?
Tomorrow (Day 4), we’re releasing enterprise integrations that bring our MCP server to the platforms your teams already use.
Stay tuned.