How to Scrape Paginated Websites Efficiently

Picture this: you’ve set up a scraper to collect critical data, only to realize you’ve captured half the dataset. One missing page, one overlooked button, and suddenly your analysis is flawed. Pagination can feel like a maze, but once you understand its patterns, it becomes a powerful ally rather than a stumbling block. Let’s dive in.

Understanding the Pagination Puzzle

Websites paginate content for speed, performance, or simply to make massive datasets manageable. For scrapers, this adds complexity. Instead of grabbing all data at once, your scraper must navigate multiple pages—or dynamic feeds—to ensure completeness. Miss a step, and your dataset is incomplete.

Types of Pagination

1. Page-based Pagination

URLs change predictably: https://example.com/products?page=5. Simple, structured, and easy to loop through. The catch? Page counts can change. Your scraper must be flexible enough to handle extra pages—or missing ones—without duplicating data.

2. Offset-based Pagination

Here, content is sliced by offset and limit: ?offset=50&limit=25. Perfect for APIs and database-driven sites. But beware: large offsets can slow requests or trigger anti-bot defenses.

3. Cursor-based (Token-based) Pagination

Modern APIs often return a cursor instead of a page number: ?cursor=eyJpZCI6IjEyMyJ9. The scraper must update the cursor after each request. Tokens can expire fast, so timing and handling are crucial.

4. Infinite Scroll / "Load More"

Content loads dynamically as users scroll or click "Load More." Scraping requires either headless browsers or inspecting the AJAX requests behind the scenes. This is where patience and precision meet coding skill.
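
In practice, the scroll or "Load More" button usually fires a request to a JSON endpoint that you can spot in your browser's network tab and query directly, skipping the browser entirely. A minimal sketch, assuming a hypothetical /api/products endpoint that takes a page parameter and returns an items array:

import requests

# Hypothetical JSON endpoint discovered via the browser's network tab
api_url = "https://example.com/api/products"
page = 1

while True:
    response = requests.get(api_url, params={"page": page}, timeout=10)
    items = response.json().get("items", [])
    if not items:
        break  # nothing left to load
    for item in items:
        print(item["name"])
    page += 1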

Difficulties in Scraping Paginated Content

Pagination isn't just a hurdle—it's a test of strategy. Here's what trips scrapers up most:

  • Massive datasets: Hundreds of pages, thousands of entries. Your scraper must handle scale gracefully.
  • JavaScript-rendered content: Infinite scroll relies on JS/AJAX, invisible to simple HTML parsers.
  • Rate limits and blocks: Aggressive requests can trigger CAPTCHAs or IP bans. Proxies are essential to rotate IPs and mimic human behavior (a retry-with-backoff sketch follows this list).
  • Duplicate/missing data: Dynamic pagination can shift. Deduplication and error handling are non-negotiable.
  • Changing structures: Sites update layouts constantly. Yesterday's working scraper might fail tomorrow.
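
One defensive pattern worth wiring in early is retrying with exponential backoff when the server pushes back. Here is a minimal sketch, assuming the site signals throttling with HTTP 429; the status codes and delays are illustrative:

import time
import requests

def get_with_backoff(url, max_retries=5):
    """Retry a GET with exponential backoff on throttling or server errors."""
    delay = 1
    for _ in range(max_retries):
        response = requests.get(url, timeout=10)
        if response.status_code not in (429, 500, 502, 503):
            return response
        time.sleep(delay)
        delay *= 2  # double the wait after each failed attempt
    raise RuntimeError(f"Gave up on {url} after {max_retries} retries")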

Approaches to Scraping Paginated Data

There's no universal approach. The method depends entirely on the pagination type. Here's how to tackle each:

1. Static HTML (Page-based)

import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/products?page={}"

# Walk the first five pages and pull out each product title
for page in range(1, 6):
    url = base_url.format(page)
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    for item in soup.select(".product-title"):
        print(item.get_text())

Loop through pages, parse HTML, and extract content. Simple. Effective. But only works when pagination is numeric and consistent.
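
If the page count isn't known up front, one option is to keep requesting until a page comes back empty. A sketch of that stop condition, reusing the same hypothetical .product-title selector:

import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/products?page={}"
page = 1

while True:
    response = requests.get(base_url.format(page), timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    titles = soup.select(".product-title")
    if not titles:
        break  # an empty page means we ran past the last one
    for item in titles:
        print(item.get_text())
    page += 1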

2. Offset-based

import requests

base_url = "https://example.com/products?offset={}&limit=25"

# Fetch 100 records in four slices of 25
for offset in range(0, 100, 25):
    url = base_url.format(offset)
    response = requests.get(url, timeout=10)
    data = response.json()

    for product in data["products"]:
        print(product["name"])

Offset increments slice the dataset neatly. Ideal for APIs and structured content.
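
When the total count is unknown, a common stop condition is to break as soon as a response returns fewer items than the limit. A sketch under that assumption:

import requests

base_url = "https://example.com/products?offset={}&limit=25"
offset, limit = 0, 25

while True:
    response = requests.get(base_url.format(offset), timeout=10)
    products = response.json()["products"]
    for product in products:
        print(product["name"])
    if len(products) < limit:
        break  # a short page means the dataset is exhausted
    offset += limit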

3. Cursor-based

import requests

url = "https://example.com/products"
params = {"limit": 25}
has_more = True

while has_more:
    response = requests.get(url, params=params, timeout=10).json()
    for product in response["data"]:
        print(product["name"])

    cursor = response.get("next_cursor")
    if cursor:
        params["cursor"] = cursor  # feed the token into the next request
    else:
        has_more = False  # a missing or null cursor means the end

Update the cursor after each request until the data runs out. Timing and token management are key.
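
Because cursors can expire mid-crawl, decide up front how to recover. A minimal sketch, assuming (hypothetically) that the API answers an expired cursor with HTTP 410 and that restarting from the first page is acceptable; pair this with deduplication so re-fetched pages don't pollute your dataset:

import requests

url = "https://example.com/products"
params = {"limit": 25}

while True:
    response = requests.get(url, params=params, timeout=10)
    if response.status_code == 410:  # hypothetical "cursor expired" signal
        params.pop("cursor", None)   # drop the stale token and start over
        continue
    payload = response.json()
    for product in payload["data"]:
        print(product["name"])
    cursor = payload.get("next_cursor")
    if not cursor:
        break
    params["cursor"] = cursor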

4. Infinite Scroll / "Load More"

import requests

# Placeholder endpoint for a scraping API that renders JavaScript
# and follows the pagination on the target page automatically
url = "https://example.com/v1/scraper"
params = {
    "url": "https://example.com/products?page=1",
    "render_js": True,
    "pagination": "auto"
}

response = requests.get(url, params=params, auth=("API_KEY", ""))
print(response.json())

Here a rendering service executes the JavaScript and follows the pagination for you. If no such service is available, you can drive a headless browser yourself, as sketched below. Patience pays off.
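
A headless-browser sketch with Playwright that clicks "Load More" until the button disappears; the button.load-more and .product-title selectors are assumptions about the page's markup:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/products")

    # Keep clicking "Load More" until the button is gone
    while True:
        button = page.query_selector("button.load-more")
        if not button:
            break
        button.click()
        page.wait_for_timeout(1000)  # crude wait for the AJAX content to render

    for item in page.query_selector_all(".product-title"):
        print(item.inner_text())
    browser.close()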

Tips for Paginated Scraping

Scraping paginated content isn't just about getting the data—it's about doing it smartly:

  • Follow website limits: Avoid aggressive requests that trigger defenses.
  • Change IPs and user agents regularly: Distribute requests to reduce detection risk (see the sketch after this list).
  • Deduplicate results: Ensure clean, reliable datasets.
  • Track structural changes: Catch site updates early to prevent failures.
  • Cache and reuse data: Save bandwidth and reduce server load.
  • Prioritize ethics: Follow robots.txt and site policies. Legal and sustainable scraping wins long-term.
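
A minimal sketch combining two of these tips, rotating user agents and proxies per request while deduplicating by a stable ID. The proxy addresses, user-agent strings, and API shape are placeholders:

import random
import requests

# Placeholder pools; substitute real proxy endpoints and realistic user agents
proxies_pool = ["http://proxy1:8000", "http://proxy2:8000"]
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

seen_ids = set()  # remembers IDs across pages to drop duplicates

for page in range(1, 6):
    proxy = random.choice(proxies_pool)
    response = requests.get(
        f"https://example.com/api/products?page={page}",
        headers={"User-Agent": random.choice(user_agents)},
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )
    for product in response.json()["products"]:
        if product["id"] in seen_ids:
            continue  # already captured on an earlier page
        seen_ids.add(product["id"])
        print(product["name"])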

Final Thoughts

With the right approach, pagination stops being a hurdle and becomes a tool. Use proxies, deduplication, and careful monitoring to capture complete datasets and turn your scraper into a reliable, powerful tool.

About the author

Martin Koenig
Head of Commerce
Martin Koenig is an accomplished commercial strategist with over a decade of experience in the technology, telecommunications, and consulting industries. As Head of Commerce, he combines cross-sector expertise with a data-driven mindset to unlock growth opportunities and deliver measurable business impact.