How to Scrape a Website with Selenium

SwiftProxy
By Martin Koenig
2025-04-22 15:58:04


Web scraping can open up a treasure trove of data, but the process can be tricky, especially when you're dealing with websites that load dynamic content or require user interactions. If you've ever tried scraping a website that uses a lot of JavaScript, you know how frustrating it can be to extract the information you need. Here's where Selenium comes in.
Selenium is an open-source framework that allows you to control a web browser programmatically. Unlike traditional scraping tools, it can handle JavaScript-heavy websites and dynamic content with ease. In this guide, we'll walk you through the process of setting up Selenium with Python and using it to scrape a website from start to finish.

What is Selenium and Why Should You Care

Selenium is a versatile tool that automates web browsers. It's primarily known for testing web applications, but it's also a powerhouse when it comes to web scraping. Why? Because Selenium can interact with web pages the same way a human would. It can click buttons, submit forms, and even navigate dynamic elements—making it an essential tool for scraping websites with complex structures.
Use case examples:
E-commerce sites: Scrape product listings or reviews.
Social media: Collect posts and comments.
Financial sites: Extract live data from charts.
In short, if you need to scrape content from a website that changes frequently or relies on JavaScript to display data, Selenium is your go-to tool.

What You Need

Before you can scrape a website with Selenium, you'll need a few things in place:
Python – You should be comfortable with the basics of Python. If you're new to it, take some time to familiarize yourself with loops, functions, and basic data structures.
Selenium – This is the tool we'll be using to automate the browser.
Install it using the following command:

pip install selenium

A Web Browser – For this guide, we'll be using Google Chrome, but you can use any browser. Just make sure you install the appropriate driver.
Web Driver – Selenium needs a browser-specific driver to control your browser. For Chrome, that's ChromeDriver.
Additional Packages – Rather than downloading ChromeDriver by hand, install webdriver-manager, which fetches and manages the correct driver version for you.
Install it with:

pip install webdriver-manager
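
To confirm everything is installed correctly, here's a quick sanity check. The snippet below should briefly open a Chrome window and close it again (webdriver-manager downloads a matching ChromeDriver on the first run):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Should open a Chrome window and immediately close it
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
browser.quit()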

Inspecting the Web Page

Before scraping, you'll need to inspect the website to figure out where the data is located. This is a critical step.

Step 1: Launch Developer Tools

In Chrome, right-click any element and select "Inspect", or press Ctrl+Shift+I (Windows/Linux) / Cmd+Option+I (Mac).

Step 2: Identify the Right Elements

Look for the tags, classes, or IDs that are associated with the data you want to scrape. For example, if you're scraping quotes, you might find that each quote is in a <span> tag with the class text.

Step 3: Copy the CSS Selector or XPath

Once you've identified the element, you can right-click on it in the developer tools and choose "Copy selector" or "Copy XPath". These are the paths Selenium will use to find the element.
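
For example, on quotes.toscrape.com the quote text can be reached either way. Here's a quick preview (it assumes browser is the Selenium WebDriver instance we'll create in the next section, and the selectors assume that site's markup):

from selenium.webdriver.common.by import By

# Two ways to locate the same element; adjust the selectors for your target site
first_text_css = browser.find_element(By.CSS_SELECTOR, "div.quote span.text")
first_text_xpath = browser.find_element(By.XPATH, "//div[@class='quote']/span[@class='text']")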

Creating Your First Selenium Script

Now that you're set up, it's time to scrape your first website. Here's how you can get started:
Import Selenium – You'll need the Selenium WebDriver, the By locator helper, and the webdriver-manager imports for driver setup.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

Create the WebDriver – This is your browser instance. Passing in webdriver-manager's service means the right ChromeDriver is fetched automatically.

browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

Navigate to the Website – Use the get() method to load the page you want to scrape.

browser.get("https://quotes.toscrape.com/")

Locate Elements – Let's grab every quote container. Here we use a CSS selector, but XPath works just as well.

quotes = browser.find_elements(By.CSS_SELECTOR, ".quote")

Extract Data – Extract the text from each quote element.

for quote in quotes:
    text = quote.find_element(By.CSS_SELECTOR, ".text").text
    author = quote.find_element(By.CSS_SELECTOR, ".author").text
    print(f"Quote: {text}\nAuthor: {author}\n")

When you're done scraping, always close the browser to free its resources:

browser.quit()
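
A crash partway through a scrape can leave the browser window running. One way to guarantee cleanup, sketched here, is to wrap the scraping logic in a try/finally block:

# Guarantee the browser closes even if the scrape raises an error
# (uses the imports from the steps above)
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
try:
    browser.get("https://quotes.toscrape.com/")
    # ... locate elements and extract data here ...
finally:
    browser.quit()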

Scraping Data from Multiple Pages

Many websites split their content into multiple pages. If you want to scrape all the data, you'll need to handle pagination.
Here's how you can navigate through multiple pages with Selenium:
Find the "Next" Button – Use Selenium to locate the "Next" link and click it. On quotes.toscrape.com the link's visible text is actually "Next →", so matching on partial link text is more reliable than an exact match.

next_button = browser.find_element(By.PARTIAL_LINK_TEXT, "Next")
next_button.click()

Loop Through Pages – Use a while loop to repeat the scraping process across multiple pages. We also collect the results into lists as we go, so they're ready to save in the next section.

from selenium.common.exceptions import NoSuchElementException

all_quotes = []
all_authors = []

while True:
    quotes = browser.find_elements(By.CSS_SELECTOR, ".quote")
    for quote in quotes:
        text = quote.find_element(By.CSS_SELECTOR, ".text").text
        author = quote.find_element(By.CSS_SELECTOR, ".author").text
        all_quotes.append(text)
        all_authors.append(author)
        print(f"Quote: {text}\nAuthor: {author}\n")

    try:
        next_button = browser.find_element(By.PARTIAL_LINK_TEXT, "Next")
        next_button.click()
    except NoSuchElementException:
        break

Catching NoSuchElementException here is crucial: the last page has no "Next" link, and that exception is the signal to exit the loop cleanly.
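
quotes.toscrape.com serves plain HTML, so the loop above works as-is. On JavaScript-heavy sites, though, the next page's content may not be in the DOM immediately after the click. Here's a minimal sketch using Selenium's explicit waits (the 10-second timeout is an arbitrary choice):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Block for up to 10 seconds until at least one quote is present on the new page
wait = WebDriverWait(browser, 10)
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".quote")))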

Storing Your Data

Once you've scraped your data, you'll want to store it somewhere useful, such as a CSV file or a database. Here's an example that writes the all_quotes and all_authors lists from the pagination loop to a CSV file:

import csv

# Save the collected data to CSV
with open('quotes.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Quote', 'Author'])  # Header row
    for quote, author in zip(all_quotes, all_authors):
        writer.writerow([quote, author])

For larger datasets, consider using a database like SQLite.
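
As a rough sketch with Python's built-in sqlite3 module (the quotes.db filename and single-table schema here are just illustrative):

import sqlite3

# Persist the same lists into a local SQLite database
conn = sqlite3.connect('quotes.db')
conn.execute('CREATE TABLE IF NOT EXISTS quotes (quote TEXT, author TEXT)')
conn.executemany('INSERT INTO quotes VALUES (?, ?)', zip(all_quotes, all_authors))
conn.commit()
conn.close()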

Wrapping Up

You've just scraped your first website using Selenium, and this is just the start. You can now take on more complex sites, handle dynamic content, and interact with pages in ways most tools can't. As you progress, explore handling cookies and login flows, and try combining Selenium with tools like BeautifulSoup or Scrapy; for instance, you can hand Selenium's rendered HTML to BeautifulSoup for parsing, as sketched below. Always scrape responsibly and respect a site's terms of service.
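
Here's a minimal sketch of that combination (assumes pip install beautifulsoup4; the selectors match the quotes site used throughout this guide):

from bs4 import BeautifulSoup

# Let Selenium render the page, then parse the HTML with BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'html.parser')
for quote in soup.select('.quote'):
    print(quote.select_one('.text').get_text())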

About the author

Martin Koenig
Head of Commerce
Martin Koenig is an accomplished commercial strategist with over a decade of experience in the technology, telecommunications, and consulting industries. As Head of Commerce, he combines cross-sector expertise with a data-driven mindset to unlock growth opportunities and deliver measurable business impact.
The content provided on the Swiftproxy Blog is intended solely for informational purposes and is presented without warranty of any kind. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information contained herein, nor does it assume any responsibility for content on third-party websites referenced in the blog. Prior to engaging in any web scraping or automated data collection activities, readers are strongly advised to consult with qualified legal counsel and to review the applicable terms of service of the target website. In certain cases, explicit authorization or a scraping permit may be required.