Mastering Web Scraping with Proxies in Python

SwiftProxy
By - Emily Chan
2025-02-26 14:55:27


When scraping data, proxies serve as a crucial tool. They help bypass rate limits, mask identities, and prevent blocks. For those using Python, the Requests module simplifies working with proxies.
This guide outlines how to use proxies with Requests in Python, covering everything from basic setups to advanced proxy rotation. Whether new to scraping or looking to improve, this post provides valuable tips and highlights common mistakes to avoid.

Getting Started with Python's Requests

First things first—make sure you've installed the Requests library. If you haven't, you can quickly get started with:

python3 -m pip install requests  

Now, let’s perform a simple HTTP request using a proxy. Here’s the basic syntax:

import requests

http_proxy = "http://130.61.171.71:3128"
proxies = {
    "http": http_proxy,
    "https": http_proxy,
}

resp = requests.get("https://ifconfig.me/ip", proxies=proxies)
print(resp, resp.text)

What's happening here?
We're simply defining a proxy, passing it to the proxies dictionary, and making a request. In this case, the output should be an IP address from the proxy:

$ python3 main.py  
<Response [200]> 130.61.171.71

Why Do We Use HTTP for Both HTTP and HTTPS?

At first glance this structure can be confusing. Why does http:// appear in the value even under the https key? The keys of the proxies dictionary are the schemes of the target URLs (http or https), while each value is the URL of the proxy that should handle that traffic. The http:// in the value describes how your client connects to the proxy itself, and most proxies accept plain-HTTP connections even when tunneling HTTPS traffic. Mapping each protocol to its own proxy may seem redundant when both point at the same server, but the structure stays clear and flexible as your setup grows.
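Because each key names the target URL's scheme, you can also route plain-HTTP and HTTPS traffic through different proxies. A minimal sketch, with hypothetical endpoints:

```python
# Hypothetical proxy endpoints -- substitute your provider's hosts.
http_proxy = "http://proxy-a.example.com:3128"   # handles http:// targets
https_proxy = "http://proxy-b.example.com:3128"  # handles https:// targets

# Each key is the scheme of the *target* URL; the value is the proxy
# that traffic should go through. They don't have to be the same server.
proxies = {
    "http": http_proxy,
    "https": https_proxy,
}

# requests.get("https://ifconfig.me/ip", proxies=proxies) would use proxy-b.
```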

HTTP, HTTPS, and SOCKS5 Proxy Types

There are several types of proxy connections:

HTTP Proxy – Fast, but the connection to the proxy is unencrypted. Use it for handling bulk requests to non-sensitive targets.

HTTPS Proxy – Adds encryption, but a bit slower due to SSL/TLS overhead.

SOCKS5 Proxy – The most flexible. Use it if you need to connect to services beyond HTTP/HTTPS (like FTP or even the Tor network).

For SOCKS5 proxies, you'll need the requests[socks] extra. Install it with:

python3 -m pip install "requests[socks]"

(The quotes stop shells like zsh from treating the square brackets as a glob pattern.)

Then, you can use SOCKS5 like this:

import requests

username = "yourusername"
password = "yourpassword"
socks5_proxy = f"socks5://{username}:{password}@proxyserver.com:1080"

proxies = {
    "http": socks5_proxy,
    "https": socks5_proxy,
}

resp = requests.get("https://ifconfig.me", proxies=proxies)
print(resp, resp.text)

Authenticating Proxies

Most paid proxy services require authentication. Here's a simple way to authenticate using basic credentials:

username = "yourusername"
password = "yourpassword"
proxies = {
    "http": f"http://{username}:{password}@proxyserver.com:1080",
    "https": f"http://{username}:{password}@proxyserver.com:443",
}

Note that the scheme in each proxy URL stays http:// unless your provider explicitly serves the proxy itself over TLS; an https:// value tells Requests to encrypt the connection to the proxy, which fails against a plain-HTTP proxy endpoint.

Use Proxies with Environment Variables

Sometimes, it's more convenient to use environment variables to set your proxies. Here's how:

$ export HTTP_PROXY='http://yourusername:[email protected]:1080'  
$ export HTTPS_PROXY='http://yourusername:[email protected]:443'  
$ export NO_PROXY='localhost,127.0.0.1'  
$ python3  
>>> import requests  
>>> resp = requests.get("https://ifconfig.me/ip")  
>>> print(resp.text)  
186.188.228.86  

These environment variables allow you to set default proxies without hardcoding them in your script.
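The same defaults can be set from inside Python by writing to os.environ before making requests; Requests reads these variables as long as trust_env is enabled (the default). Hostnames and credentials below are placeholders:

```python
import os

# Equivalent setup from inside the process. requests honors these
# variables because Session.trust_env defaults to True.
os.environ["HTTP_PROXY"] = "http://yourusername:[email protected]:1080"
os.environ["HTTPS_PROXY"] = "http://yourusername:[email protected]:443"
os.environ["NO_PROXY"] = "localhost,127.0.0.1"  # hosts reached directly

# Any later requests.get(...) call in this process now routes through
# the proxy without an explicit proxies= argument.
```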

Make Requests with Sessions

Want to set default configurations like headers, timeouts, or proxies? Use a Session object:

import requests

proxies = {
    "http": "http://username:[email protected]:1080",
    "https": "http://username:[email protected]:443",
}

session = requests.Session()
session.proxies.update(proxies)

resp = session.get("https://ifconfig.me")
print(resp.text)

Sessions are helpful when you're scraping a website that requires cookies or headers. Plus, they avoid the need to reconfigure proxies with every request.
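As a sketch, a session can carry default headers alongside the proxies; the hostname, credentials, and header value here are illustrative:

```python
import requests

# Defaults set once on the session apply to every request made through
# it, and cookies persist between calls automatically.
session = requests.Session()
session.proxies.update({
    "http": "http://username:[email protected]:1080",
    "https": "http://username:[email protected]:1080",
})
session.headers.update({"User-Agent": "my-scraper/1.0"})

# session.get("https://example.com/login") would now send the header
# above, route through the proxy, and reuse any cookies it receives.
```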

Bypass Anti-Bot Protections with Proxy Rotation

Let's say you're scraping a heavily protected site. Rotating proxies is the standard solution. Here's how you can rotate through a list of proxies manually:

import random  
import requests  

proxies_list = [  
    "http://proxy1.com:8080",  
    "http://proxy2.com:8080",  
    "http://proxy3.com:8080",  
]  

for _ in range(10):  
    proxy = random.choice(proxies_list)  
    proxies = {"http": proxy, "https": proxy}  
    resp = requests.get("https://ifconfig.me/ip", proxies=proxies)  
    print(resp.text)  
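In practice, some proxies in a pool will be dead. A hedged sketch (hostnames are placeholders) that drops proxies raising connection errors and retries with another:

```python
import random
import requests

# Placeholder pool -- substitute real proxy endpoints.
proxies_list = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

def fetch_with_rotation(url, pool, attempts=5):
    """Try up to `attempts` random proxies, discarding ones that fail."""
    pool = list(pool)  # work on a copy so the caller's list is untouched
    for _ in range(attempts):
        if not pool:
            raise RuntimeError("no working proxies left")
        proxy = random.choice(pool)
        try:
            return requests.get(
                url, proxies={"http": proxy, "https": proxy}, timeout=5
            )
        except (requests.exceptions.ProxyError,
                requests.exceptions.ConnectTimeout):
            pool.remove(proxy)  # drop the dead proxy and try another
    raise RuntimeError("all attempts failed")
```

Swapping random.choice for itertools.cycle would give round-robin rotation instead of random selection.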

But if you want something more seamless, a professional provider like Swiftproxy can rotate the IP automatically on every request. Here's a quick example:

import requests  

proxies = {  
    "http": "http://username:[email protected]:1080",  
    "https": "http://username:[email protected]:1080",  
}  

for _ in range(10):  
    resp = requests.get("https://ifconfig.me/ip", proxies=proxies)  
    print(resp.text)  

Each request will come from a different IP.

Comparing Sticky Proxies and Rotating Proxies

Sticky proxies are useful when you need to keep the same IP across multiple requests, for example when scraping a page behind a login, where an IP change mid-session could invalidate your cookies.

Rotating proxies give you a new IP for every request, which is ideal for bypassing rate limits and anti-bot protections.

Here's an example using sticky proxies:

import requests  
from uuid import uuid4  

def sticky_proxies_demo():  
    sessions = [uuid4().hex[:6] for _ in range(2)]  

    for i in range(10):  
        session = sessions[i % len(sessions)]  
        http_proxy = f"http://username,session_{session}:[email protected]:1080"  
        proxies = {"http": http_proxy, "https": http_proxy}  
        resp = requests.get("https://ifconfig.me/ip", proxies=proxies)  
        print(f"Session {session}: {resp.text}")  

sticky_proxies_demo()  

Handling Common Proxy Errors

Here are some tips for handling network-related errors, such as requests.exceptions.ProxyError or timeout exceptions:

Retries: Implement a retry mechanism. Requests supports automatic retries by mounting an HTTPAdapter with a retry policy onto a Session object.
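That retry mechanism can be sketched by mounting urllib3's Retry onto a Session through an HTTPAdapter; the numbers here are illustrative, not recommendations:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Up to 3 retries with exponential backoff, retrying on common
# transient status codes as well as connection failures.
retry = Retry(
    total=3,
    backoff_factor=0.5,  # waits 0.5s, 1s, 2s between attempts
    status_forcelist=[429, 500, 502, 503, 504],
)
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# session.get(url, proxies=proxies, timeout=10) now retries
# transparently before raising.
```

Pairing this with a timeout= on each call makes a hung proxy fail fast instead of blocking forever.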

SSLError: If you're running into SSL certificate issues, you can disable verification with verify=False. This turns off certificate checking entirely, so treat it as a last resort:

import requests
import urllib3

# Suppress the InsecureRequestWarning triggered by verify=False
urllib3.disable_warnings()

# Placeholder proxy configuration -- reuse your own here
proxies = {
    "http": "http://username:[email protected]:1080",
    "https": "http://username:[email protected]:1080",
}

resp = requests.get("https://ifconfig.me/ip", proxies=proxies, verify=False)
print(resp.text)

Wrapping Up

By following these guidelines, you can save time, avoid costly mistakes, and build more efficient and reliable scraping scripts. Proxies may be tricky at first, but with these actionable insights, you will become proficient quickly.

About the Author

SwiftProxy
Emily Chan
Lead Writer at Swiftproxy
Emily Chan is the lead writer at Swiftproxy, with more than a decade of experience in technology, digital infrastructure, and strategic communications. Based in Hong Kong, she combines regional insight with clear, practical writing to help businesses navigate evolving proxy solutions and data-driven growth.
The content on the Swiftproxy blog is provided for informational purposes only and comes with no warranty of any kind. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information it contains, and accepts no responsibility for the content of third-party websites referenced in the blog. Before undertaking any web scraping or automated data collection, readers are strongly advised to consult qualified legal counsel and review the target website's terms of service. In some cases, explicit authorization or a scraping license may be required.