Scraping Event of the Year

ScrapeCon 2024

The future of data collection, today

Missed ScrapeCon? Don’t worry, we’ve got you covered!

ScrapeCon Recap: Watch Now

ScrapeCon 2024 - The State of Public Web Data

Web data is used everywhere. It’s fueling AI innovations and shaping modern businesses across almost every industry. But web data’s public nature is constantly challenged. With Big Tech increasingly cornering this asset and regulators taking opposing approaches, are we on the brink of public data becoming a private treasure?

Or Lenchner, Bright Data CEO, kicks off the conference by diving into the state of web data collection in 2024 and beyond, shedding light on current challenges and opportunities for growing scraping operations. In this session, we will cover:
– How is Big Tech dominance shaping web data accessibility and utilization?
– In a landscape of conflicting regulatory approaches, how do these dilemmas affect the trajectory of public data?
– How can scraping operations adapt and prosper amid evolving challenges?

ScrapeCon 2024 - Cloud-Native Scraping Made Simple

Explore the future of cloud-based web scraping in this exclusive product demo, unveiling the latest tooling on the Bright Data platform.

Discover how to build and maintain scrapers that are seamlessly integrated with auto-scaling infrastructure and unblocking technology. Eliminate the hassle of managing complex scraping and scaling tasks, and focus on crafting effective business solutions. A must-attend for professionals seeking efficient and streamlined scraping operations. In this session, you’ll discover:
– How a hybrid model combines the advantages of on-prem and cloud-based scraping
– How scraping APIs enhance scalability while balancing reliability and cost-effectiveness
– How to ensure your scrapers are built in a future-proof way that minimizes maintenance

ScrapeCon 2024 - Decoding Scraping Strategies: Build, Buy, or API?

Determine the best approach for your scraping operations, whether it’s building a scraper from scratch, purchasing a ready-made dataset, or utilizing scraping APIs.

Explore the optimal tools for your tech stack, assess when certain technologies might be excessive, and understand the landscape of current scraping methodologies. This session provides a clear decision framework for every scraping scenario, ensuring you make informed choices to optimize your ScrapeOps. In this session, you’ll discover:
– What ScrapeOps is, and how it can make your web data collection more efficient, stable, and risk-free
– How to select and integrate optimal tools into your tech stack, enhancing the efficiency of your scraping projects
– Why simplifying your scraping operation can be a game-changer for your business

ScrapeCon 2024 - The Future of Data for AI: Balancing Legal and Operational Challenges

Delve into the legal and operational challenges that developers face when dealing with web data collection for AI.

Learn practical frameworks that empower dev teams to make informed decisions, striking the right balance between legal compliance and operational efficiency. Whether you’re a seasoned developer or new to web scraping, gain valuable insights to steer your AI projects with confidence. In this session, you’ll discover:
– How can web data collection address and mitigate potential biases in the data?
– What legal aspects should you consider when training AI models on web-collected data?
– How can teams ensure compliance with privacy regulations in diverse data collection?
– What tools or frameworks have proven effective in maintaining operational efficiency?

ScrapeCon 2024 - From AI-Powered Insights to Training LLMs

Embark on a practical journey from dataset creation to unleashing AI-powered insights.

Join us as we guide you through handpicking a dataset tailored to your AI objectives, ensuring accuracy with rules and custom validations, and showcasing a real case study of dataset utilization. Whether you’re a beginner or experienced, this step-by-step guide will enhance your mastery of datasets for AI. In this hands-on session, we will cover:
– Dataset Selection: Choose datasets aligned with your AI objectives.
– Ensuring Accuracy: Apply rules, data types, and custom validations for dataset integrity.
– Real-World Application: A case study on practical dataset utilization.
– Integration with Snowflake: Integrate datasets with Snowflake efficiently.
– Deriving Insights: Extract AI-powered insights for specific use cases.
– LLM Training: Feed structured data into LLMs for optimal training.

ScrapeCon 2024 - A Blueprint for Building a Reliable Dataset

Crafting a dependable dataset is more than just collecting data; it’s about ensuring its quality, structure, and adaptability.

Discover advanced methodologies and strategies to meticulously curate datasets, incorporating AI-driven schema creation for optimal organization and efficiency. In this session, we will cover:
– AI-Driven Schema Creation: Define data structure, settings, and parameters.
– Sample Review: A systematic approach to reviewing data samples.
– Dataset Refresh & Export: Techniques for updating datasets and various export methods.
– Data Validation: Set rules to guarantee data accuracy and consistency.
– Adapting to Changes: Strategies for adjusting to website structural shifts.
– Reparse Techniques: Methods to reanalyze and adjust data for enhanced flexibility.
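
To make the idea of rule-based data validation concrete, here is a minimal Python sketch. It is not taken from the session or from any Bright Data product: the `Rule` structure, the field names (`title`, `price`, `currency`), and the rules themselves are hypothetical and only illustrate combining a type check with a custom check per field.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Rule:
    """One validation rule for a single field (hypothetical structure)."""
    field: str
    expected_type: type
    check: Callable[[Any], bool]
    message: str


# Illustrative rules for an assumed product dataset; adjust to your own schema.
RULES = [
    Rule("title", str, lambda v: len(v.strip()) > 0, "title must not be empty"),
    Rule("price", float, lambda v: v >= 0, "price must be non-negative"),
    Rule("currency", str, lambda v: v in {"USD", "EUR", "GBP"}, "unknown currency"),
]


def validate(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passed every rule."""
    problems = []
    for rule in RULES:
        if rule.field not in record:
            problems.append(f"{rule.field}: missing")
            continue
        value = record[rule.field]
        # Type check first, so the custom check never sees an unexpected type.
        if not isinstance(value, rule.expected_type):
            problems.append(f"{rule.field}: expected {rule.expected_type.__name__}")
            continue
        if not rule.check(value):
            problems.append(f"{rule.field}: {rule.message}")
    return problems
```

Calling `validate(record)` on each scraped record and keeping only those that return an empty list is one simple way to apply such rules before a dataset is refreshed or exported.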

ScrapeCon 2024 - The Executive Playbook

Secure a front-row seat for an in-depth, straightforward discussion among senior tech executives.

They’ll share their operational challenges and solutions related to large-scale data collection. Discover how leading organizations address regulatory changes, ethical dilemmas, and the impact of AI on their processes. Guided by our Chief Customer Officer, this session equips technical executives and R&D leaders with actionable insights and proven strategies to enhance their public web data collection operations. Diving into the key panel questions:
– Why is web data mission-critical for your organization, and how do you utilize it to gain operational and competitive advantages?
– How does your web data collection operation function, and how has it evolved over time? What are your views on in-house versus outsourcing solutions?
– What is your decision-making framework concerning web data collection resources (considering total budget, infrastructure costs, personnel, tools, data QA, etc.)?
– What are the primary challenges you currently face with data collection?
– How do you integrate or juxtapose public data with other data sources?
– Have you faced any particular challenges or obstacles during your web data collection journey? If so, how did you tackle them?
– Are there any best practices or strategies you’ve found effective for ensuring the highest quality and relevance of the web data you collect?

ScrapeCon 2024 - From Clicks to Captures: Mastering Browser Interactions for Scrapers

Dive into the latest innovations around browser automation for large-scale scraping projects.

This session is a must for devs running scraping projects that require browser interactions. In this hands-on session, you’ll learn:
– Infrastructure Overview: Understand the components for multi-step scraping, including server setups, browser configurations, and proxy management.
– Live API Demos: Improve your Puppeteer, Playwright, and Selenium scrapers; learn to handle multiple browsers.
– Practical Application: Create a Puppeteer script for e-commerce, use Node.js, and parse HTML with Cheerio.
– Debugging & Cost Management: Use Chrome DevTools for debugging and learn strategies to manage operational costs.
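
The session itself works with Node.js, Puppeteer, and Cheerio. As a rough analogue of the HTML-parsing step only, the sketch below uses Python's standard-library `html.parser` to pull product titles out of already-fetched markup. The `product-title` class name and the sample markup are assumptions made for illustration, not something shown in the session.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ProductTitleParser(HTMLParser):
    """Collect the text of every element whose class list includes 'product-title'."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self._depth = 0  # greater than 0 while inside a matching element
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            self._depth += 1  # nested element inside a title
        elif "product-title" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            self.titles.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_titles(html: str) -> List[str]:
    """Parse an HTML string and return the product titles in document order."""
    parser = ProductTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles
```

In a browser-automation workflow, the HTML string would typically come from the rendered page content after the required interactions have run; here it is simply passed in as a string.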

ScrapeCon 2024 - Beyond IP Bans & CAPTCHAs

Delve into the latest challenges posed by advanced anti-bot technologies, and the latest techniques to overcome them.

Witness real-time scraper building and troubleshooting, featuring demonstrations on optimizing network performance and overcoming challenges with static IPs. Evaluate the strengths and weaknesses of diverse proxy networks, and uncover powerful tools designed to tackle the toughest website blocks. Tailored for engineers, this session blends strategic insights with hands-on coding and live demonstrations. Getting down to the fundamentals:
– Types of Blocks: Understand the different block types and how they operate.
– Simple and Common Blocks: Delve into IP bans and rate limits, and learn how to quickly circumvent them.
– Advanced Blocks: Explore CAPTCHAs, anti-bot software, Cloudflare, and other challenges, along with their solutions.
– Choosing the Right Proxy Product: Assess the pros and cons of various proxy networks.
– Live Coding: Building and fixing scrapers.
– Demo of Single Crawl vs. 1K Batch: Observe how different networks perform in varied scenarios. Using Node.js, we’ll send a single request using data center and residential proxies, demonstrating the success rates of both networks. We’ll also highlight the challenges faced when using static IPs, and how even rotating IPs can encounter issues when sending 1K requests.
– Tools for Exotic and Tough Website Blocks: Discover tools that tackle challenging website blocks.
– SERP Scraping.
– Live Demo: Witness the transition from multiple errors to a 100% success rate.
– Cloudflare Test Demo.
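
The single-request versus batch comparison can be sketched in a few lines. The session uses Node.js; the version below is a Python standard-library approximation, not the code shown on stage. The function names `fetch_via_proxy` and `success_rate` are hypothetical, and any proxy URLs you pass in are placeholders you would supply yourself.

```python
import itertools
import urllib.request
from typing import Callable, Sequence


def fetch_via_proxy(url: str, proxy_url: str, timeout: float = 10.0) -> bool:
    """Return True if a request routed through proxy_url completes with a 2xx status."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def success_rate(
    url: str,
    proxies: Sequence[str],
    n_requests: int,
    fetch: Callable[[str, str], bool] = fetch_via_proxy,
) -> float:
    """Send n_requests to url, rotating round-robin over proxies, and return the success ratio."""
    if not proxies or n_requests <= 0:
        raise ValueError("need at least one proxy and one request")
    rotation = itertools.cycle(proxies)
    successes = sum(1 for _ in range(n_requests) if fetch(url, next(rotation)))
    return successes / n_requests
```

Running `success_rate` once with `n_requests=1` and again with `n_requests=1000`, for each proxy pool you want to compare, gives the kind of side-by-side figures the demo describes. A single static IP is just the special case of a one-element `proxies` list.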

ScrapeCon 2024 - From Initial Request to Final Analysis

Join a dynamic live panel featuring the industry’s leading developers and data professionals as they unpack the entire spectrum of web data projects, blending expert insights, practical strategies, and a sprinkle of dev humor.

Key discussion points:
– Web Data Collection Essentials: Dive into the best languages, frameworks, and tools for efficient web scraping.
– Website Unblocking Mastery: Learn resilient scraping techniques, understand challenges, and discover proven workarounds.
– Data Analysis Deep Dive: Tips on database optimization, data preparation, and compelling data storytelling.
– AI-Driven Techniques Unveiled: Integrate AI into scraping and elevate data analyses with cutting-edge AI tools.

ScrapeCon 2024 - Closing Remarks

Web data is the engine driving AI innovations and shaping modern businesses. But with Big Tech increasingly cornering this asset, and different regulators taking opposite approaches, are we on the brink of public data becoming a private treasure? Our CEO kicks off the conference by diving into the state of web data collection in 2023/2024, shedding light on current challenges and opportunities.

In this session, you’ll discover:
– Will I be able to scrape data in 2024 the same way (or at all)?
– How should you approach data collection in 2024 as the relevant regulations evolve?
– What groundbreaking technologies and products can we expect in 2024 that will redefine scraping operations?

Joining Or in his session are Anthony Goldbloom, Co-founder and former CEO of Kaggle.com, the world’s largest AI & ML community, and Jo Levy, Partner at The Norton Law Firm and former Vice President & General Counsel for Asia Pacific & Japan at Intel Corporation. Together, they’ll delve into the future of LLMs and navigate the intricate legal landscape surrounding data scraping in the age of foundational AI models like ChatGPT.

Speakers

Meet the Minds Behind the Mic.

Or Lenchner

CEO, Bright Data

Jo Levy

Partner, The Norton Law Firm

Ganesh Kumar

Director of Products and Design, Rakuten

Aviv Besinsky

Director of Proxy Products, Bright Data

Mariya Sha

Founder & Software Developer, Python Simplified

Omri Orgad

CCO, Bright Data

Upendra Dev Singh

Senior Vice President of Technology, Ixigo

Anthony Goldbloom

Co-founder and former CEO, Kaggle

Lior Levhar

Datasets Experts TL, Bright Data

Tiff Janzen

Founder & Developer Advocate, TiffInTech

Lewis Menelaws

VP of Technology, Coding With Lewis

Itamar Abramovich

Director of Data Products, Bright Data

Ghita

Founder & CEO, Tech Bible

Itzhak Yosef Friedman

Director of R&D, Bright Data

Alex Freberg

Founder & YouTuber, Alex The Analyst

Ilya Kolker

Post Sale Specialist, Bright Data

Tim Ru

Director of Proxy Products, Bright Data

Michael Beygelman

Founder, Claro Analytics

Nir Borenshtein

COO, Bright Data

Ken Jee

Ken's Nearest Neighbours

Thank You for Being a Part of Our Event!

Enjoy this video capturing the highlights of our event.

ScrapeCon may be over, but the conversation lives on.