Mastering Web Scraping with Proxies in Python

SwiftProxy
By Emily Chan
2025-02-26 14:55:27

When scraping data, proxies are a crucial tool: they help you bypass rate limits, mask your identity, and avoid blocks. For Python users, the Requests library makes working with proxies straightforward.
This guide covers how to use proxies with Requests in Python, from basic setup to advanced proxy rotation. Whether you're new to scraping or refining an existing setup, you'll find practical tips and common mistakes to avoid below.

Getting Started with Python's Requests

First things first—make sure you've installed the Requests library. If you haven't, you can quickly get started with:

python3 -m pip install requests  

Now, let’s perform a simple HTTP request using a proxy. Here’s the basic syntax:

import requests

# A public proxy used for demonstration; substitute your own.
http_proxy = "http://130.61.171.71:3128"

# Map each target URL scheme to the proxy that should handle it.
proxies = {
    "http": http_proxy,
    "https": http_proxy,
}

resp = requests.get("https://ifconfig.me/ip", proxies=proxies)
print(resp, resp.text)

What's happening here?
We define a proxy URL, map it in the proxies dictionary, and pass that dictionary to requests.get(). The output should be the proxy's IP address rather than your own:

$ python3 main.py  
<Response [200]> 130.61.171.71
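
One practical note: a proxy adds an extra network hop, so a slow or dead proxy can leave a request hanging. It's worth passing a timeout with every proxied request; the 10-second value below is just a reasonable starting point:

resp = requests.get("https://ifconfig.me/ip", proxies=proxies, timeout=10)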

Why Do We Use HTTP for Both HTTP and HTTPS?

This structure can be confusing at first: why does an http:// proxy URL appear under both keys? The keys in the proxies dictionary ("http" and "https") refer to the scheme of the URL you're requesting, while the value is the proxy's own address. An HTTP proxy can still carry HTTPS traffic by tunneling it with the CONNECT method, which is why a single http:// proxy URL can serve both keys. The mapping may look redundant, but it also lets you route each protocol through a different proxy when you need to.
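
For example, splitting traffic across two proxies is just a matter of using a different value per key. A minimal sketch, with hypothetical proxy addresses:

import requests

# Hypothetical endpoints; replace with real proxies.
proxies = {
    "http": "http://proxy-a.example.com:3128",   # handles plain HTTP requests
    "https": "http://proxy-b.example.com:3128",  # handles HTTPS requests (tunneled via CONNECT)
}

resp = requests.get("https://ifconfig.me/ip", proxies=proxies)
print(resp.text)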

HTTP, HTTPS, and SOCKS5 Proxy Types

There are several types of proxy connections:

HTTP Proxy – Fast but unencrypted. A good fit for high-volume requests where confidentiality isn't a concern.

HTTPS Proxy – Adds encryption between you and the proxy, at the cost of some SSL/TLS overhead.

SOCKS5 Proxy – The most flexible. Use it when you need to proxy traffic beyond HTTP/HTTPS (for example FTP, or routing through the Tor network).
For SOCKS5 proxies, Requests needs the socks extra. Install it with (quoted so shells like zsh don't interpret the brackets):

python3 -m pip install "requests[socks]"

Then, you can use SOCKS5 like this:

import requests

username = "yourusername"
password = "yourpassword"
socks5_proxy = f"socks5://{username}:{password}@proxyserver.com:1080"

proxies = {
    "http": socks5_proxy,
    "https": socks5_proxy,
}

resp = requests.get("https://ifconfig.me", proxies=proxies)
print(resp, resp.text)
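
One detail worth knowing: with the socks5:// scheme, DNS lookups happen on your machine. If you want the proxy to resolve hostnames instead (useful when local DNS is filtered, or to leak less metadata), use the socks5h:// scheme, which Requests also understands:

socks5_proxy = f"socks5h://{username}:{password}@proxyserver.com:1080"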

Authenticating Proxies

Most paid proxy services require authentication. Here's a simple way to authenticate using basic credentials:

username = "yourusername"
password = "yourpassword"
proxies = {
    "http": f"http://{username}:{password}@proxyserver.com:1080",
    "https": f"https://{username}:{password}@proxyserver.com:443",
}
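
One common pitfall: if the username or password contains characters such as @, :, or /, the proxy URL will parse incorrectly. URL-encode the credentials first. A small sketch using the standard library (the password here is made up):

from urllib.parse import quote

username = quote("yourusername", safe="")
password = quote("p@ss:word!", safe="")  # special characters get percent-encoded

proxies = {
    "http": f"http://{username}:{password}@proxyserver.com:1080",
    "https": f"http://{username}:{password}@proxyserver.com:1080",
}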

Use Proxies with Environment Variables

Sometimes, it's more convenient to use environment variables to set your proxies. Here's how:

$ export HTTP_PROXY='http://yourusername:[email protected]:1080'  
$ export HTTPS_PROXY='https://yourusername:[email protected]:443'  
$ export NO_PROXY='localhost,127.0.0.1'  
$ python3  
>>> import requests  
>>> resp = requests.get("https://ifconfig.me/ip")  
>>> print(resp.text)  
186.188.228.86  

These environment variables allow you to set default proxies without hardcoding them in your script.
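
The reverse is occasionally useful too: if proxy environment variables are set but you want a particular script to ignore them, disable environment trust on a Session:

import requests

session = requests.Session()
session.trust_env = False  # ignore HTTP_PROXY/HTTPS_PROXY/NO_PROXY
resp = session.get("https://ifconfig.me/ip")
print(resp.text)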

Make Requests with Sessions

Want to set defaults like headers, cookies, or proxies once and reuse them across requests? Use a Session object:

import requests

proxies = {
    "http": "http://username:[email protected]:1080",
    "https": "https://username:[email protected]:443",
}

session = requests.Session()
session.proxies.update(proxies)

resp = session.get("https://ifconfig.me")
print(resp.text)

Sessions are helpful when you're scraping a website that requires cookies or consistent headers, and they reuse underlying TCP connections, which speeds up repeated requests to the same host. Plus, they avoid the need to reconfigure proxies with every request.
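
Defaults set on the session apply to every request it makes. For example, to send the same User-Agent everywhere (the header value is just an illustration):

session.headers.update({"User-Agent": "my-scraper/1.0"})
resp = session.get("https://ifconfig.me")  # uses the session's proxies and headers

Note that a default timeout is not a session-level setting in Requests; pass timeout=... on each request.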

Bypass Anti-Bot Protections with Proxy Rotation

Let's say you're scraping a heavily-protected site. Rotating proxies is the solution. Here's how you can rotate through a list of proxies manually:

import random
import requests

proxies_list = [
    "http://proxy1.com:8080",
    "http://proxy2.com:8080",
    "http://proxy3.com:8080",
]

for _ in range(10):
    proxy = random.choice(proxies_list)
    proxies = {"http": proxy, "https": proxy}
    try:
        resp = requests.get("https://ifconfig.me/ip", proxies=proxies, timeout=10)
        print(resp.text)
    except requests.exceptions.RequestException as exc:
        # Dead or slow proxies are a fact of life when rotating; skip and move on.
        print(f"Proxy {proxy} failed: {exc}")

But if you want something more seamless, a professional provider like swiftproxy allows automatic IP rotation per request. Here's a quick example:

import requests  

proxies = {  
    "http": "http://username:[email protected]:1080",  
    "https": "http://username:[email protected]:1080",  
}  

for _ in range(10):  
    resp = requests.get("https://ifconfig.me/ip", proxies=proxies)  
    print(resp.text)  

Each request will come from a different IP.
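
A quick way to confirm the rotation is working is to collect the addresses you see, reusing the proxies dictionary above:

seen_ips = set()
for _ in range(10):
    resp = requests.get("https://ifconfig.me/ip", proxies=proxies, timeout=10)
    seen_ips.add(resp.text.strip())

print(f"{len(seen_ips)} distinct IPs across 10 requests")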

Comparing Sticky Proxies and Rotating Proxies

Sticky proxies are useful when you need to keep the same IP across multiple requests. For example, if you're scraping a login-required page and want to avoid IP mismatches.

Rotating proxies give you a new IP for every request, which is ideal for bypassing rate limits and anti-bot protections.

Here's an example using sticky proxies:

import requests
from uuid import uuid4

def sticky_proxies_demo():
    # Two short random session IDs; requests tagged with the same ID keep the same IP.
    sessions = [uuid4().hex[:6] for _ in range(2)]

    for i in range(10):
        session = sessions[i % len(sessions)]
        # The "username,session_<id>" format is provider-specific; check your provider's docs.
        http_proxy = f"http://username,session_{session}:[email protected]:1080"
        proxies = {"http": http_proxy, "https": http_proxy}
        resp = requests.get("https://ifconfig.me/ip", proxies=proxies)
        print(f"Session {session}: {resp.text}")

sticky_proxies_demo()

Handling Common Proxy Errors

Here are some tips for handling network-related errors, like ProxyError or Timeout:

Retries: Requests has no retry parameter of its own, but you can mount urllib3's retry logic on a Session to retry failed requests automatically, as shown below.
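
A minimal sketch of that setup using urllib3's Retry class; the proxy address is a placeholder:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times, with exponential backoff, on common transient status codes.
retry = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

proxies = {"http": "http://proxy1.com:8080", "https": "http://proxy1.com:8080"}  # placeholder
resp = session.get("https://ifconfig.me/ip", proxies=proxies)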

SSLError: If you're running into SSL issues, you can disable verification with verify=False. Just be aware that you'll see security warnings:

import requests
import urllib3

# Suppress the InsecureRequestWarning that verify=False triggers.
urllib3.disable_warnings()

proxies = {"https": "http://proxy1.com:8080"}  # placeholder; use your proxy here

resp = requests.get("https://ifconfig.me/ip", proxies=proxies, verify=False)
print(resp.text)
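
Disabling verification should be a last resort. If your proxy provider supplies its own CA certificate (common with proxies that intercept TLS), point Requests at it instead; the file path below is hypothetical:

resp = requests.get("https://ifconfig.me/ip", proxies=proxies, verify="/path/to/provider-ca.pem")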

Wrapping Up

By following these guidelines, you can save time, avoid costly mistakes, and build more reliable scraping scripts. Proxies can be tricky at first, but with these building blocks (basic setup, authentication, sessions, and rotation) you'll get up to speed quickly.

About the author

Emily Chan
Lead Writer at Swiftproxy
Emily Chan is the lead writer at Swiftproxy, bringing over a decade of experience in technology, digital infrastructure, and strategic communications. Based in Hong Kong, she combines regional insight with a clear, practical voice to help businesses navigate the evolving world of proxy solutions and data-driven growth.
The content provided on the Swiftproxy Blog is intended solely for informational purposes and is presented without warranty of any kind. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information contained herein, nor does it assume any responsibility for content on third-party websites referenced in the blog. Prior to engaging in any web scraping or automated data collection activities, readers are strongly advised to consult with qualified legal counsel and to review the applicable terms of service of the target website. In certain cases, explicit authorization or a scraping permit may be required.