How to use Selenium for web scraping and use proxies for privacy

SwiftProxy
By - Linh Tran
2025-01-15 20:30:11

Selenium is a powerful tool that lets you scrape data by simulating user actions in the browser, such as clicking, filling out forms, and submitting. When running a web crawler, using a proxy hides your real IP address and reduces the risk of the target website blocking you for visiting too frequently. This article explains in detail how to scrape the web with Selenium and combine it with a proxy to protect your privacy.

Environment Setup

First, you need to make sure that Python and Selenium libraries are installed on your computer. You can install the Selenium library using the following command:

pip install selenium

Next, download the driver that matches your browser (ChromeDriver for Chrome, GeckoDriver for Firefox, etc.) and add it to the system's PATH environment variable.
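Note that Selenium 4.6 and later bundle Selenium Manager, which downloads a matching driver automatically, so the manual PATH step may be unnecessary on recent versions. A small helper (an illustrative sketch, not part of the Selenium API) can tell you whether an installed version qualifies:

```python
def has_selenium_manager(version: str) -> bool:
    """Selenium 4.6+ bundles Selenium Manager, which fetches a matching driver automatically."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 6)

# Check the version reported by `pip show selenium` or selenium.__version__
print(has_selenium_manager("4.15.2"))   # True: Selenium Manager handles the driver
print(has_selenium_manager("3.141.0"))  # False: install a driver manually
```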

Web scraping using Selenium

Here is a simple example of scraping web data using Selenium:


from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Configure the ChromeDriver path
chrome_driver_path = 'path/to/chromedriver'  # Replace with your ChromeDriver path

# Initialize the ChromeDriver service
service = Service(chrome_driver_path)

# Browser startup options
options = Options()
options.add_argument("--start-maximized")  # Maximize the window on startup

# Initialize the WebDriver
driver = webdriver.Chrome(service=service, options=options)

# Open the target web page
driver.get('https://example.com')

# Locate elements and extract data (Selenium 4 syntax; the old
# find_elements_by_xpath method has been removed)
elements = driver.find_elements(By.XPATH, "//div[@class='example']")  # Adjust the XPath to match the target page
for element in elements:
    data = element.text
    print(data)  # Or save the data to a file, database, etc.

# Close the browser
driver.quit()
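Instead of printing, the loop above can persist its results. A minimal sketch that writes the collected `element.text` values to a CSV file (the filename and column header are illustrative):

```python
import csv

def save_texts(texts, path="scraped.csv"):
    """Write a list of extracted strings (e.g. element.text values) to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])  # single illustrative column
        for text in texts:
            writer.writerow([text])

save_texts(["first item", "second item"])
```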

Combining Selenium with a proxy to protect privacy

When crawling the web, a proxy hides your real IP address and reduces the risk of the target website blocking you for visiting too frequently. Here are the steps to use a proxy in Selenium:

1. Get the proxy server address and port

You can obtain a proxy server address and port by purchasing a proxy service or using a free one. (For security and speed reasons, free proxies are not recommended.)
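Paid providers typically give you several endpoints, and a common pattern is to rotate through a small pool so requests are spread across IPs. A minimal sketch, using placeholder addresses from the 203.0.113.0/24 documentation range:

```python
import random

# Placeholder endpoints; replace with addresses from your proxy provider
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

def pick_proxy(pool):
    """Choose a proxy at random so successive sessions use different IPs."""
    return random.choice(pool)

proxy_server = pick_proxy(PROXY_POOL)
print(proxy_server)
```

The chosen value is what gets passed to Chrome's `--proxy-server` argument in the next step.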

2. Configure ChromeDriver to use a proxy

In Selenium, you set the proxy by passing a startup argument to Chrome. Here is sample code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Configure the ChromeDriver path
chrome_driver_path = 'path/to/chromedriver'  # Replace with your ChromeDriver path

# Initialize the ChromeDriver service
service = Service(chrome_driver_path)

# Browser startup options
options = Options()
options.add_argument("--start-maximized")  # Maximize the window on startup

# Set the proxy on the same options object that is passed to the driver
proxy_server = "http://proxy_server_address:port"  # Replace with your proxy server address and port
options.add_argument(f"--proxy-server={proxy_server}")

# Initialize the WebDriver
driver = webdriver.Chrome(service=service, options=options)

# Open the target web page
driver.get('https://example.com')

# Locate elements and extract data
elements = driver.find_elements(By.XPATH, "//div[@class='example']")  # Adjust the XPath to match the target page
for element in elements:
    data = element.text
    print(data)  # Or save the data to a file, database, etc.

# Close the browser
driver.quit()

Notes

  • Proxy availability: Make sure the proxy server you use is available and stable; an unreliable proxy will degrade your scraping results.
  • Anti-crawler mechanisms: Many websites employ anti-crawler defenses such as CAPTCHAs and IP blocking. Take care to avoid triggering these mechanisms when scraping with Selenium.
  • Compliance: Comply with the target website's terms of use and all applicable laws and regulations; do not use scraping for illegal purposes.
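One common courtesy measure against the anti-crawler mechanisms above is a randomized pause between page loads. A minimal sketch (the delay bounds are illustrative):

```python
import random
import time

def polite_sleep(min_s=1.0, max_s=3.0):
    """Sleep for a random interval between requests to mimic human pacing."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# Call between page loads, e.g.:
# driver.get(url)
# polite_sleep()
```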

About the Author

SwiftProxy
Linh Tran
Senior Technical Analyst at Swiftproxy
Linh Tran is a Hong Kong-based technical writer with a background in computer science and more than eight years of experience in the digital infrastructure field. At Swiftproxy, she focuses on making complex proxy technologies easy to understand, providing businesses with clear, actionable insights to help them navigate the fast-evolving data landscape in Asia and beyond.
The content on the Swiftproxy blog is provided for informational purposes only and comes with no warranty of any kind. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information it contains, and accepts no responsibility for the content of third-party websites referenced in the blog. Before undertaking any web scraping or automated data collection, readers are strongly advised to consult qualified legal counsel and to read the target website's terms of service carefully. In some cases, explicit authorization or a scraping license may be required.