Unified Search Agent

🚀 Features

The core logic, defined in src/agent/graph.py, orchestrates a search workflow with the following capabilities:

  • 🧠 Intent Classification: Uses Gemini 2.0 Flash to classify queries into four categories:

    • general_search: News, facts, definitions, explanations
    • product_search: Shopping, prices, reviews, recommendations
    • web_scraping: Data extraction from specific websites
    • comparison: Comparing multiple items or services
  • 🔍 Multi-Modal Search:

    • Google Search: Via Bright Data’s MCP search engine for general queries
    • Web Scraping: Using Bright Data’s Web Unlocker for targeted data extraction
    • Smart Routing: Automatically chooses the best search strategy based on intent
  • 📊 Result Processing:

    • Sanitizes and deduplicates results
    • Scores results on relevance and quality
    • Returns configurable top N results with confidence scores
    • Provides query summaries
  • 🛡️ Error Handling: Graceful fallbacks to default responses when errors occur
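
As an illustration of the intent classification step and its fallback behavior, here is a minimal sketch of how a model reply could be validated. It assumes the model returns JSON with `intent` and `confidence` keys; the names `IntentResult` and `parse_intent`, and the choice of `general_search` as the fallback, are hypothetical and not taken from the repository. Plain dataclasses are used to keep the sketch dependency-free, although the project itself lists Pydantic for validation.

```python
import json
from dataclasses import dataclass

# The four intent labels listed above.
INTENTS = ("general_search", "product_search", "web_scraping", "comparison")


@dataclass(frozen=True)
class IntentResult:
    intent: str
    confidence: float


def parse_intent(raw: str) -> IntentResult:
    """Parse a reply shaped like {"intent": ..., "confidence": ...}.

    Falls back to general_search with zero confidence when the reply is
    malformed (the fallback choice is an assumption of this sketch).
    """
    fallback = IntentResult("general_search", 0.0)
    try:
        data = json.loads(raw)
        intent = data["intent"]
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        return fallback
    if intent not in INTENTS:
        return fallback
    # Clamp so downstream code can rely on a 0..1 range.
    return IntentResult(intent, min(max(confidence, 0.0), 1.0))
```

Returning a default instead of raising keeps the rest of the graph running when the model produces an unexpected reply.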

🏗️ Architecture

The agent follows a graph-based workflow:

START → Intent Classifier → [Google Search | Web Unlocker] → Final Processing → END

<div align="center">
<img src="https://github.com/user-attachments/assets/1fba5659-1ba9-4970-bcda-949465c96872" alt="2025-06-26_15h18_46">
</div>

Routing Logic:

  • URLs in query → Direct to Web Unlocker
  • general_search → Google Search only
  • product_search → Google Search then Web Scraping
  • web_scraping → Web Unlocker only
  • comparison → Both search methods in parallel
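
The routing rules above can be sketched as a plain function. This is an illustration only, not the code in src/agent/graph.py: the node names, the `route` function, the URL regex, and the default for unknown intents are all assumptions. Each inner list is a stage, and nodes within one stage are meant to run in parallel.

```python
import re

# A simple pattern for spotting URLs in a query (an assumption of this sketch).
URL_PATTERN = re.compile(r"https?://\S+")

# Placeholder node names.
GOOGLE = "google_search"
UNLOCKER = "web_unlocker"


def route(query: str, intent: str) -> list[list[str]]:
    """Return execution stages; nodes inside one stage run in parallel."""
    # A URL in the query overrides the classified intent.
    if URL_PATTERN.search(query):
        return [[UNLOCKER]]
    plans = {
        "general_search": [[GOOGLE]],
        "product_search": [[GOOGLE], [UNLOCKER]],  # search first, then scrape
        "web_scraping": [[UNLOCKER]],
        "comparison": [[GOOGLE, UNLOCKER]],  # both in the same stage
    }
    # Unknown intents default to a plain search (assumed behavior).
    return plans.get(intent, [[GOOGLE]])


print(route("best laptops under $1000", "product_search"))
```

Keeping the URL check ahead of the intent lookup mirrors the first rule in the list: a query that names a site goes straight to the Web Unlocker regardless of how it was classified.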

Tech Stack

  • LangGraph
  • Gemini 2.0 Flash
  • Bright Data MCP
  • Pydantic
  • LangGraph Studio

Getting Started

<!--
Setup instruction auto-generated by langgraph template lock. DO NOT EDIT MANUALLY.
-->
<!--
End setup instructions
-->

  1. Install dependencies along with the LangGraph CLI:

     cd unified-search-agent
     pip install -e . "langgraph-cli[inmem]"

  2. Set up environment variables. Copy the example file to create your .env:

     cp .env.example .env

     Then add your API keys to the .env file:

     # Required
     GOOGLE_API_KEY=your_gemini_api_key_here
     BRIGHT_DATA_API_TOKEN=your_bright_data_token_here

     # Optional zones (defaults provided)
     WEB_UNLOCKER_ZONE=unblocker
     BROWSER_ZONE=scraping_browser

     # Optional - for LangSmith tracing
     LANGSMITH_API_KEY=lsv2...

  3. Start the LangGraph Server:

     langgraph dev

  4. Open LangGraph Studio at the URL provided (typically https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024)

For more information on getting started with LangGraph Server, see the LangGraph documentation.
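
Before starting the server, it can help to confirm that the required variables from the .env file are actually set. The following is a small standalone check, not part of the repository; the function name `load_settings` is hypothetical, and the zone defaults are copied from the sample .env above.

```python
import os
from typing import Mapping

REQUIRED = ("GOOGLE_API_KEY", "BRIGHT_DATA_API_TOKEN")
# Defaults copied from the sample .env in this README.
ZONE_DEFAULTS = {
    "WEB_UNLOCKER_ZONE": "unblocker",
    "BROWSER_ZONE": "scraping_browser",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect required keys and optional zones, failing fast on gaps."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {key: env[key] for key in REQUIRED}
    for key, default in ZONE_DEFAULTS.items():
        # An unset or empty zone falls back to the documented default.
        settings[key] = env.get(key) or default
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.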

📝 Usage Examples

Basic Search

{
  "query": "Who is Or Lenchner",
  "max_results": 3
}

Product Search

{
  "query": "best laptops under $1000",
  "max_results": 5
}

Web Scraping

{
  "query": "extract contact info from https://example.com",
  "max_results": 10
}

Comparison Query

{
  "query": "iPhone 15 vs Samsung Galaxy S24 comparison",
  "max_results": 5
}

🎛️ Configuration

The agent exposes one runtime parameter; routing and search strategy are determined automatically:

  • max_results: Number of final results to return (default: 5)
  • Query-specific routing: URLs in queries automatically trigger web scraping
  • Search strategies: Automatically determined by intent classification
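
To make the `max_results` parameter concrete, here is a rough sketch of what a configuration object could look like. The repository's actual Configuration class is not shown in this README, so the shape below, including the `from_config` helper and the validation rule, is an assumption. It relies only on the fact that LangGraph passes runtime options under the `configurable` key of the run config.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Configuration:
    """Runtime options for the agent (hypothetical shape)."""

    max_results: int = 5  # default documented above

    def __post_init__(self) -> None:
        if self.max_results < 1:
            raise ValueError("max_results must be at least 1")

    @classmethod
    def from_config(
        cls, config: Optional[Mapping[str, Any]] = None
    ) -> "Configuration":
        """Build from a run config, ignoring keys this class does not know."""
        configurable = (config or {}).get("configurable", {})
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in configurable.items() if k in known})


print(Configuration.from_config({"configurable": {"max_results": 3}}))
```

Ignoring unknown keys lets the same run config carry options for other components without breaking this class.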

How to Customize

  1. Modify Intent Classification: Update the categories and examples in intent_classifier_node() in src/agent/nodes.py

  2. Adjust Search Strategies: Modify the routing logic in src/agent/graph.py to change how different intents are handled

  3. Customize Result Scoring: Update the scoring criteria in final_processing_node() to change how results are ranked

  4. Add New Search Sources: Extend the graph with additional search nodes for other data sources

  5. Configure Parameters: Modify the Configuration class in src/agent/graph.py to expose additional runtime parameters

🛠️ Development

While iterating on your graph in LangGraph Studio, you can:

  • Edit past state and rerun from previous states to debug specific nodes
  • Hot reload – local changes are automatically applied
  • Create new threads using the + button to clear previous history
  • Visual debugging – see the exact flow and state at each step

The graph structure allows for easy debugging of:

  • Intent classification accuracy
  • Search result quality
  • Routing decisions
  • Final result scoring

📊 Result Format

The agent returns structured results with comprehensive scoring:

{
  "final_results": [
    {
      "title": "Result Title",
      "url": "https://example.com",
      "snippet": "Relevant description...",
      "source": "google_search",
      "relevance_score": 0.95,
      "quality_score": 0.88,
      "final_score": 0.92,
      "metadata": {
        "search_engine": "google",
        "via": "bright_data_mcp",
        "query": "original query"
      }
    }
  ],
  "query_summary": "Found information about...",
  "total_processed": 8,
  "intent": "general_search",
  "intent_confidence": 0.95
}
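
A caller might consume this structure as follows. The sketch uses only the field names shown in the format above; the helper `top_results`, the `min_score` threshold, and the sample values are illustrative and not part of the agent.

```python
from typing import Any


def top_results(
    response: dict[str, Any], min_score: float = 0.0
) -> list[tuple[str, str, float]]:
    """Return (title, url, final_score) tuples, highest score first."""
    kept = [
        r for r in response.get("final_results", [])
        if r.get("final_score", 0.0) >= min_score
    ]
    kept.sort(key=lambda r: r.get("final_score", 0.0), reverse=True)
    return [
        (r.get("title", ""), r.get("url", ""), r.get("final_score", 0.0))
        for r in kept
    ]


# Illustrative response with made-up values.
sample = {
    "final_results": [
        {"title": "Weak match", "url": "https://example.com/b", "final_score": 0.41},
        {"title": "Result Title", "url": "https://example.com", "final_score": 0.92},
    ],
    "intent": "general_search",
    "intent_confidence": 0.95,
}

for title, url, score in top_results(sample, min_score=0.5):
    print(f"{score:.2f}  {title}  {url}")
```

Using `.get` with defaults keeps the helper tolerant of results that lack a score or title.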

🔧 Advanced Features

  • Parallel Processing: Comparison queries execute both search methods simultaneously
  • Intelligent Fallbacks: Graceful error handling with default responses
  • Duplicate Detection: Automatic deduplication of results across sources
  • URL Validation: Filters out invalid or empty URLs
  • Content Sanitization: Cleans and validates all text content
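
The deduplication, URL validation, and sanitization steps listed above could look roughly like this. It is a sketch under assumptions, not the logic in final_processing_node(): the helper names, the URL normalization rule (lowercased host, trailing slash removed), and the whitespace cleanup are all illustrative choices.

```python
from typing import Any
from urllib.parse import urlsplit


def normalize_url(url: Any) -> str:
    """Return a comparison key for a URL, or "" if it is not usable."""
    if not isinstance(url, str):
        return ""
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # e.g. malformed IPv6 host
        return ""
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return ""
    key = f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    return f"{key}?{parts.query}" if parts.query else key


def clean_text(text: Any) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(str(text).split())


def sanitize_and_dedupe(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop invalid or repeated URLs and tidy text fields, keeping order."""
    seen: set[str] = set()
    cleaned = []
    for result in results:
        key = normalize_url(result.get("url"))
        if not key or key in seen:
            continue
        seen.add(key)
        cleaned.append({
            **result,
            "url": result["url"].strip(),
            "title": clean_text(result.get("title", "")),
            "snippet": clean_text(result.get("snippet", "")),
        })
    return cleaned
```

Because the first occurrence of each URL wins, running this before scoring preserves whichever source returned the result first.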

For more advanced features and examples, refer to the LangGraph documentation.

LangGraph Studio integrates with LangSmith for in-depth tracing and team collaboration, allowing you to analyze and optimize your search agent’s performance.

📋 Dependencies

  • langgraph>=0.2.6: Core orchestration framework
  • langchain-google-genai: Gemini integration for LLM operations
  • pydantic>=2.0.0: Data validation and parsing
  • mcp-use: MCP client for Bright Data integration
  • langchain-core: Core LangChain utilities
  • python-dotenv>=1.0.1: Environment variable management

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test with LangGraph Studio
  5. Submit a pull request

📄 License

This project is licensed under the MIT License – see the LICENSE file for details.

<!--
Configuration auto-generated by langgraph template lock. DO NOT EDIT MANUALLY.
{
  "config_schemas": {
    "agent": {
      "type": "object",
      "properties": {
        "max_results": {
          "type": "integer",
          "description": "Maximum number of final results to return",
          "default": 5
        }
      }
    }
  }
}
-->