Unified Search Agent

Features

The core logic defined in src/agent/graph.py orchestrates a sophisticated search workflow that:

  • Intent Classification: Uses Gemini 2.0 Flash to classify queries into four categories (a classifier sketch follows this list):

    • general_search: News, facts, definitions, explanations
    • product_search: Shopping, prices, reviews, recommendations
    • web_scraping: Data extraction from specific websites
    • comparison: Comparing multiple items or services
  • Multi-Modal Search:

    • Google Search: Via Bright Data’s MCP search engine for general queries
    • Web Scraping: Using Bright Data’s Web Unlocker for targeted data extraction
    • Smart Routing: Automatically chooses the best search strategy based on intent
  • Result Processing:

    • Sanitizes and deduplicates results
    • Scores results on relevance and quality
    • Returns configurable top N results with confidence scores
    • Provides query summaries
  • Error Handling: Graceful fallbacks and comprehensive error management
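
As an illustration of the intent classification bullet above, the classifier can be sketched with Gemini structured output and a Pydantic schema. The real prompt and node wiring live in src/agent/nodes.py, so treat the exact field and class names below as assumptions:

# Illustrative sketch of the intent classifier; the actual node in
# src/agent/nodes.py may use a different prompt and schema.
from typing import Literal

from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel, Field


class IntentClassification(BaseModel):
    """Structured output returned by the classifier."""

    intent: Literal["general_search", "product_search", "web_scraping", "comparison"]
    confidence: float = Field(ge=0.0, le=1.0, description="Classifier confidence")


llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
classifier = llm.with_structured_output(IntentClassification)

result = classifier.invoke("Classify the search intent of this query: best laptops under $1000")
print(result.intent, result.confidence)  # e.g. product_search 0.9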

Architecture

The agent follows a graph-based workflow:

START → Intent Classifier → [Google Search | Web Unlocker] → Final Processing → END

Routing Logic (see the sketch after this list):

  • URLs in query → Web Unlocker directly
  • general_search → Google Search only
  • product_search → Google Search, then Web Scraping
  • web_scraping → Web Unlocker only
  • comparison → both search methods in parallel
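
In LangGraph terms, this routing can be expressed as a conditional-edge function. The sketch below is a simplified stand-in: the node names are illustrative, and the authoritative logic lives in src/agent/graph.py.

import re


def route_after_classification(state: dict) -> list[str]:
    """Pick the next node(s) from the classified intent and the query text."""
    query, intent = state["query"], state["intent"]

    # URLs in the query always go straight to the Web Unlocker.
    if re.search(r"https?://\S+", query):
        return ["web_unlocker"]
    if intent in ("general_search", "product_search"):
        # product_search continues to web scraping in a downstream step
        return ["google_search"]
    if intent == "web_scraping":
        return ["web_unlocker"]
    # comparison: fan out to both search nodes in parallel
    return ["google_search", "web_unlocker"]


# graph.add_conditional_edges("intent_classifier", route_after_classification)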

Tech Stack

  • LangGraph
  • Gemini 2.0 Flash
  • Bright Data MCP
  • Pydantic
  • LangGraph Studio

Getting Started


  1. Install dependencies along with the LangGraph CLI:
cd unified-search-agent
pip install -e . "langgraph-cli[inmem]"
  2. Set up environment variables. Create a .env file with your API keys:
cp .env.example .env

Add your API keys to the .env file:

# Required
GOOGLE_API_KEY=your_gemini_api_key_here
BRIGHT_DATA_API_TOKEN=your_bright_data_token_here

# Optional zones (defaults provided)
WEB_UNLOCKER_ZONE=unblocker
BROWSER_ZONE=scraping_browser

# Optional - for LangSmith tracing
LANGSMITH_API_KEY=lsv2...
  3. Start the LangGraph Server:
langgraph dev
  4. Open LangGraph Studio at the URL provided (typically https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024)

For more information on getting started with LangGraph Server, see the LangGraph Server documentation.

Usage Examples

Basic Search

{
 "query": "Who is Or Lenchner",
 "max_results": 3
}

Product Search

{
 "query": "best laptops under $1000",
 "max_results": 5
}

Web Scraping

{
 "query": "extract contact info from https://example.com",
 "max_results": 10
}

Comparison Query

{
 "query": "iPhone 15 vs Samsung Galaxy S24 comparison",
 "max_results": 5
}
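
Beyond pasting these inputs into LangGraph Studio, you can submit them programmatically through the LangGraph SDK once langgraph dev is running. The snippet below assumes the graph is registered under the name "agent" in langgraph.json; adjust the name to match your setup.

from langgraph_sdk import get_sync_client

# Points at the local dev server started by `langgraph dev`.
client = get_sync_client(url="http://127.0.0.1:2024")

result = client.runs.wait(
    None,      # no thread: run statelessly
    "agent",   # graph name as registered in langgraph.json (assumption)
    input={"query": "best laptops under $1000", "max_results": 5},
)

print(result["query_summary"])
for item in result["final_results"]:
    print(item["final_score"], item["title"], item["url"])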

Configuration

The agent supports several configurable parameters (see the sketch after this list):

  • max_results: Number of final results to return (default: 5)
  • Query-specific routing: URLs in queries automatically trigger web scraping
  • Search strategies: Automatically determined by intent classification
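
A rough sketch of what such a configuration model can look like is shown below; the actual Configuration class in src/agent/graph.py is authoritative, so treat the exact shape as an assumption.

# Sketch of a runtime configuration model exposed to LangGraph Studio.
from pydantic import BaseModel, Field


class Configuration(BaseModel):
    """Parameters adjustable at run time."""

    max_results: int = Field(default=5, description="Number of final results to return")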

How to Customize

  1. Modify Intent Classification: Update the categories and examples in intent_classifier_node() in src/agent/nodes.py

  2. Adjust Search Strategies: Modify the routing logic in src/agent/graph.py to change how different intents are handled

  3. Customize Result Scoring: Update the scoring criteria in final_processing_node() to change how results are ranked (see the sketch after this list)

  4. Add New Search Sources: Extend the graph with additional search nodes for other data sources

  5. Configure Parameters: Modify the Configuration class in graph.py to expose additional runtime parameters
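
For step 3, a hypothetical scoring helper could look like the one below. The 60/40 relevance/quality weighting is an assumption (it happens to reproduce the sample scores in the Result Format section), and the actual criteria in final_processing_node() may differ:

def score_result(result: dict, relevance_weight: float = 0.6) -> float:
    """Blend relevance and quality into the final_score used for ranking."""
    relevance = result.get("relevance_score", 0.0)
    quality = result.get("quality_score", 0.0)
    return round(relevance_weight * relevance + (1 - relevance_weight) * quality, 2)


results = [
    {"title": "A", "relevance_score": 0.95, "quality_score": 0.88},
    {"title": "B", "relevance_score": 0.70, "quality_score": 0.90},
]
ranked = sorted(results, key=score_result, reverse=True)
print([(r["title"], score_result(r)) for r in ranked])  # [('A', 0.92), ('B', 0.78)]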

Development

While iterating on your graph in LangGraph Studio, you can:

  • Edit past state and rerun from previous states to debug specific nodes
  • Hot reload – local changes are automatically applied
  • Create new threads using the + button to clear previous history
  • Visual debugging – see the exact flow and state at each step

The graph structure allows for easy debugging of:

  • Intent classification accuracy
  • Search result quality
  • Routing decisions
  • Final result scoring

Result Format

The agent returns structured results with comprehensive scoring:

{
  "final_results": [
    {
      "title": "Result Title",
      "url": "https://example.com",
      "snippet": "Relevant description...",
      "source": "google_search",
      "relevance_score": 0.95,
      "quality_score": 0.88,
      "final_score": 0.92,
      "metadata": {
        "search_engine": "google",
        "via": "bright_data_mcp",
        "query": "original query"
      }
    }
  ],
  "query_summary": "Found information about...",
  "total_processed": 8,
  "intent": "general_search",
  "intent_confidence": 0.95
}
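
If you consume this output in Python, the payload can be mirrored with Pydantic models such as the sketch below. Field names follow the JSON above; the project's own models in the source tree remain authoritative.

from pydantic import BaseModel, Field


class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    source: str
    relevance_score: float
    quality_score: float
    final_score: float
    metadata: dict = Field(default_factory=dict)


class AgentOutput(BaseModel):
    final_results: list[SearchResult]
    query_summary: str
    total_processed: int
    intent: str
    intent_confidence: float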

Advanced Features

  • Parallel Processing: Comparison queries execute both search methods simultaneously
  • Intelligent Fallbacks: Graceful error handling with default responses
  • Duplicate Detection: Automatic deduplication of results across sources (see the sketch after this list)
  • URL Validation: Filters out invalid or empty URLs
  • Content Sanitization: Cleans and validates all text content
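
Deduplication and URL validation can be pictured with helpers like the ones below; this is an illustrative sketch, and the shipped logic in final_processing_node() may differ in detail.

from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs with a non-empty host."""
    parsed = urlparse(url or "")
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def deduplicate(results: list[dict]) -> list[dict]:
    """Drop invalid URLs and keep the first result seen for each normalized URL."""
    seen: set[str] = set()
    unique = []
    for result in results:
        url = (result.get("url") or "").strip().rstrip("/").lower()
        if not is_valid_url(url) or url in seen:
            continue
        seen.add(url)
        unique.append(result)
    return unique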

For more advanced features and examples, refer to the LangGraph documentation.

LangGraph Studio integrates with LangSmith for in-depth tracing and team collaboration, allowing you to analyze and optimize your search agent’s performance.

Dependencies

  • langgraph>=0.2.6: Core orchestration framework
  • langchain-google-genai: Gemini integration for LLM operations
  • pydantic>=2.0.0: Data validation and parsing
  • mcp-use: MCP client for Bright Data integration
  • langchain-core: Core LangChain utilities
  • python-dotenv>=1.0.1: Environment variable management

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test with LangGraph Studio
  5. Submit a pull request

License

This project is licensed under the MIT License – see the LICENSE file for details.