Unified Search Agent

Features

The core logic defined in src/agent/graph.py orchestrates a sophisticated search workflow that:

  • Intent Classification: Uses Gemini 2.0 Flash to classify queries into four categories (a classifier sketch follows this list):

    • general_search: News, facts, definitions, explanations
    • product_search: Shopping, prices, reviews, recommendations
    • web_scraping: Data extraction from specific websites
    • comparison: Comparing multiple items or services
  • Multi-Modal Search:

    • Google Search: Via Bright Data’s MCP search engine for general queries
    • Web Scraping: Using Bright Data’s Web Unlocker for targeted data extraction
    • Smart Routing: Automatically chooses the best search strategy based on intent
  • Result Processing:

    • Sanitizes and deduplicates results
    • Scores results on relevance and quality
    • Returns configurable top N results with confidence scores
    • Provides query summaries
  • Error Handling: Graceful fallbacks and comprehensive error management
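
As an illustration of the intent classification bullet above, the classifier can be sketched with Gemini structured output and a Pydantic schema. The real prompt and node wiring live in src/agent/nodes.py, so treat the exact field and class names below as assumptions:

# Illustrative sketch of the intent classifier; the actual node in
# src/agent/nodes.py may use a different prompt and schema.
from typing import Literal

from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel, Field


class IntentClassification(BaseModel):
    """Structured output returned by the classifier."""

    intent: Literal["general_search", "product_search", "web_scraping", "comparison"]
    confidence: float = Field(ge=0.0, le=1.0, description="Classifier confidence")


llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
classifier = llm.with_structured_output(IntentClassification)

result = classifier.invoke("Classify the search intent of this query: best laptops under $1000")
print(result.intent, result.confidence)  # e.g. product_search 0.9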

Architecture

The agent follows a graph-based workflow:

START → Intent Classifier → [Google Search | Web Unlocker] → Final Processing → END

Routing Logic (see the sketch after this list):

  • URLs in query → Web Unlocker directly
  • general_search → Google Search only
  • product_search → Google Search, then Web Scraping
  • web_scraping → Web Unlocker only
  • comparison → both search methods in parallel
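
In LangGraph terms, this routing can be expressed as a conditional-edge function. The sketch below is a simplified stand-in: the node names are illustrative, and the authoritative logic lives in src/agent/graph.py.

import re


def route_after_classification(state: dict) -> list[str]:
    """Pick the next node(s) from the classified intent and the query text."""
    query, intent = state["query"], state["intent"]

    # URLs in the query always go straight to the Web Unlocker.
    if re.search(r"https?://\S+", query):
        return ["web_unlocker"]
    if intent in ("general_search", "product_search"):
        # product_search continues to web scraping in a downstream step
        return ["google_search"]
    if intent == "web_scraping":
        return ["web_unlocker"]
    # comparison: fan out to both search nodes in parallel
    return ["google_search", "web_unlocker"]


# graph.add_conditional_edges("intent_classifier", route_after_classification)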

Tech Stack

  • LangGraph
  • Gemini 2.0 Flash
  • Bright Data MCP
  • Pydantic
  • LangGraph Studio

Getting Started


  1. Install dependencies along with the LangGraph CLI:
cd unified-search-agent
pip install -e . "langgraph-cli[inmem]"
  2. Set up environment variables. Create a .env file with your API keys:
cp .env.example .env

Add your API keys to the .env file:

# Required
GOOGLE_API_KEY=your_gemini_api_key_here
BRIGHT_DATA_API_TOKEN=your_bright_data_token_here

# Optional zones (defaults provided)
WEB_UNLOCKER_ZONE=unblocker
BROWSER_ZONE=scraping_browser

# Optional - for LangSmith tracing
LANGSMITH_API_KEY=lsv2...
  3. Start the LangGraph Server:
langgraph dev
  4. Open LangGraph Studio at the URL provided (typically https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024)

For more information on getting started with LangGraph Server, see the LangGraph Server documentation.

Usage Examples

Basic Search

{
 "query": "Who is Or Lenchner",
 "max_results": 3
}

Product Search

{
 "query": "best laptops under $1000",
 "max_results": 5
}

Web Scraping

{
 "query": "extract contact info from https://example.com",
 "max_results": 10
}

Comparison Query

{
 "query": "iPhone 15 vs Samsung Galaxy S24 comparison",
 "max_results": 5
}
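
Beyond pasting these inputs into LangGraph Studio, you can submit them programmatically through the LangGraph SDK once langgraph dev is running. The snippet below assumes the graph is registered under the name "agent" in langgraph.json; adjust the name to match your setup.

from langgraph_sdk import get_sync_client

# Points at the local dev server started by `langgraph dev`.
client = get_sync_client(url="http://127.0.0.1:2024")

result = client.runs.wait(
    None,      # no thread: run statelessly
    "agent",   # graph name as registered in langgraph.json (assumption)
    input={"query": "best laptops under $1000", "max_results": 5},
)

print(result["query_summary"])
for item in result["final_results"]:
    print(item["final_score"], item["title"], item["url"])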

Configuration

The agent supports several configurable parameters (see the sketch after this list):

  • max_results: Number of final results to return (default: 5)
  • Query-specific routing: URLs in queries automatically trigger web scraping
  • Search strategies: Automatically determined by intent classification
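
A rough sketch of what such a configuration model can look like is shown below; the actual Configuration class in src/agent/graph.py is authoritative, so treat the exact shape as an assumption.

# Sketch of a runtime configuration model exposed to LangGraph Studio.
from pydantic import BaseModel, Field


class Configuration(BaseModel):
    """Parameters adjustable at run time."""

    max_results: int = Field(default=5, description="Number of final results to return")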

How to Customize

  1. Modify Intent Classification: Update the categories and examples in intent_classifier_node() in src/agent/nodes.py

  2. Adjust Search Strategies: Modify the routing logic in src/agent/graph.py to change how different intents are handled

  3. Customize Result Scoring: Update the scoring criteria in final_processing_node() to change how results are ranked (see the sketch after this list)

  4. Add New Search Sources: Extend the graph with additional search nodes for other data sources

  5. Configure Parameters: Modify the Configuration class in graph.py to expose additional runtime parameters
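
For step 3, a hypothetical scoring helper could look like the one below. The 60/40 relevance/quality weighting is an assumption (it happens to reproduce the sample scores in the Result Format section), and the actual criteria in final_processing_node() may differ:

def score_result(result: dict, relevance_weight: float = 0.6) -> float:
    """Blend relevance and quality into the final_score used for ranking."""
    relevance = result.get("relevance_score", 0.0)
    quality = result.get("quality_score", 0.0)
    return round(relevance_weight * relevance + (1 - relevance_weight) * quality, 2)


results = [
    {"title": "A", "relevance_score": 0.95, "quality_score": 0.88},
    {"title": "B", "relevance_score": 0.70, "quality_score": 0.90},
]
ranked = sorted(results, key=score_result, reverse=True)
print([(r["title"], score_result(r)) for r in ranked])  # [('A', 0.92), ('B', 0.78)]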

Development

While iterating on your graph in LangGraph Studio, you can:

  • Edit past state and rerun from previous states to debug specific nodes
  • Hot reload – local changes are automatically applied
  • Create new threads using the + button to clear previous history
  • Visual debugging – see the exact flow and state at each step

The graph structure allows for easy debugging of:

  • Intent classification accuracy
  • Search result quality
  • Routing decisions
  • Final result scoring

Result Format

The agent returns structured results with comprehensive scoring:

{
  "final_results": [
    {
      "title": "Result Title",
      "url": "https://example.com",
      "snippet": "Relevant description...",
      "source": "google_search",
      "relevance_score": 0.95,
      "quality_score": 0.88,
      "final_score": 0.92,
      "metadata": {
        "search_engine": "google",
        "via": "bright_data_mcp",
        "query": "original query"
      }
    }
  ],
  "query_summary": "Found information about...",
  "total_processed": 8,
  "intent": "general_search",
  "intent_confidence": 0.95
}
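
If you consume this output in Python, the payload can be mirrored with Pydantic models such as the sketch below. Field names follow the JSON above; the project's own models in the source tree remain authoritative.

from pydantic import BaseModel, Field


class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    source: str
    relevance_score: float
    quality_score: float
    final_score: float
    metadata: dict = Field(default_factory=dict)


class AgentOutput(BaseModel):
    final_results: list[SearchResult]
    query_summary: str
    total_processed: int
    intent: str
    intent_confidence: float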

Advanced Features

  • Parallel Processing: Comparison queries execute both search methods simultaneously
  • Intelligent Fallbacks: Graceful error handling with default responses
  • Duplicate Detection: Automatic deduplication of results across sources (see the sketch after this list)
  • URL Validation: Filters out invalid or empty URLs
  • Content Sanitization: Cleans and validates all text content
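
Deduplication and URL validation can be pictured with helpers like the ones below; this is an illustrative sketch, and the shipped logic in final_processing_node() may differ in detail.

from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs with a non-empty host."""
    parsed = urlparse(url or "")
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def deduplicate(results: list[dict]) -> list[dict]:
    """Drop invalid URLs and keep the first result seen for each normalized URL."""
    seen: set[str] = set()
    unique = []
    for result in results:
        url = (result.get("url") or "").strip().rstrip("/").lower()
        if not is_valid_url(url) or url in seen:
            continue
        seen.add(url)
        unique.append(result)
    return unique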

For more advanced features and examples, refer to the LangGraph documentation.

LangGraph Studio integrates with LangSmith for in-depth tracing and team collaboration, allowing you to analyze and optimize your search agent’s performance.

Dependencies

  • langgraph>=0.2.6: Core orchestration framework
  • langchain-google-genai: Gemini integration for LLM operations
  • pydantic>=2.0.0: Data validation and parsing
  • mcp-use: MCP client for Bright Data integration
  • langchain-core: Core LangChain utilities
  • python-dotenv>=1.0.1: Environment variable management

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test with LangGraph Studio
  5. Submit a pull request

License

This project is licensed under the MIT License – see the LICENSE file for details.