Picture this: you’ve set up a scraper to collect critical data, only to realize you’ve captured half the dataset. One missing page, one overlooked button, and suddenly your analysis is flawed. Pagination can feel like a maze, but once you understand its patterns, it becomes a powerful ally rather than a stumbling block. Let’s dive in.

Websites paginate content to keep pages fast, to reduce server load, or simply to make massive datasets manageable. For scrapers, this adds complexity. Instead of grabbing all the data at once, your scraper must navigate multiple pages or dynamic feeds to ensure completeness. Miss a step, and your dataset is incomplete.
With page-number pagination, URLs change predictably: https://example.com/products?page=5. Simple, structured, and easy to loop through. The catch? Page counts can change. Your scraper must be flexible enough to handle extra pages, or missing ones, without duplicating data.
With offset-based pagination, content is sliced by an offset and a limit: ?offset=50&limit=25. Perfect for APIs and database-driven sites. But beware: large offsets can slow requests or trigger anti-bot defenses.
Cursor-based pagination is common in modern APIs: instead of a page number, each response includes a cursor token, e.g. ?cursor=eyJpZCI6IjEyMyJ9. The scraper must update the cursor after each request. Tokens can expire fast, so timing and handling are crucial.
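Those tokens are opaque by design, but peeking inside one shows why they work: the sample cursor above is just base64-encoded JSON recording where the server left off. A quick check, purely illustrative, since real cursors may be encrypted or signed:
import base64
import json
# The example cursor from above decodes to a tiny JSON payload
cursor = "eyJpZCI6IjEyMyJ9"
print(json.loads(base64.b64decode(cursor)))  # {'id': '123'}
In practice, treat cursors as opaque: pass back exactly what the API returned, and never construct your own.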
With infinite scroll, content loads dynamically as users scroll or click "Load More." Scraping it requires either a headless browser or inspecting the AJAX requests behind the scenes. This is where patience and precision meet coding skill.
Pagination isn't just a hurdle; it's a test of strategy. Shifting page counts, expiring cursors, and dynamically loaded content are what trip scrapers up most.
There's no universal approach. The method depends entirely on the pagination type. Here's how to tackle each, starting with simple numbered pages:
import requests
from bs4 import BeautifulSoup
base_url = "https://example.com/products?page={}"
# Walk a known range of pages and pull out each product title
for page in range(1, 6):
    url = base_url.format(page)
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    for item in soup.select(".product-title"):
        print(item.get_text())
Loop through pages, parse HTML, and extract content. Simple. Effective. But only works when pagination is numeric and consistent.
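If the page count is unknown, or can change between runs (the catch mentioned earlier), a safer pattern is to keep requesting pages until one comes back empty. A minimal sketch against the same hypothetical example.com markup:
import requests
from bs4 import BeautifulSoup
base_url = "https://example.com/products?page={}"
page = 1
while True:
    response = requests.get(base_url.format(page))
    soup = BeautifulSoup(response.text, "html.parser")
    titles = soup.select(".product-title")
    if not titles:
        break  # an empty page means we've walked past the last one
    for item in titles:
        print(item.get_text())
    page += 1
Offset-based endpoints call for a similar loop, just expressed through different parameters: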
import requests
base_url = "https://example.com/products?offset={}&limit=25"
# Step through the dataset 25 items at a time
for offset in range(0, 100, 25):
    url = base_url.format(offset)
    response = requests.get(url)
    data = response.json()
    for product in data["products"]:
        print(product["name"])
Offset increments slice the dataset neatly. Ideal for APIs and structured content.
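Two caveats apply here: the total count usually isn't known up front, and, as noted above, firing large-offset requests in rapid succession can slow things down or trip anti-bot defenses. A sketch that stops when a page comes back short and pauses between calls, assuming the same JSON shape as the example above:
import time
import requests
url = "https://example.com/products"
limit = 25
offset = 0
while True:
    data = requests.get(url, params={"offset": offset, "limit": limit}).json()
    products = data["products"]
    for product in products:
        print(product["name"])
    if len(products) < limit:
        break  # a short page means the dataset is exhausted
    offset += limit
    time.sleep(1)  # brief pause to stay polite and avoid anti-bot triggers
Cursor-based APIs drop the offset arithmetic entirely and hand you the next position themselves: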
import requests
url = "https://example.com/products"
params = {"limit": 25}
has_more = True
while has_more:
    response = requests.get(url, params=params).json()
    for product in response["data"]:
        print(product["name"])
    # Follow the cursor the API hands back; stop when there isn't one
    if response.get("next_cursor"):
        params["cursor"] = response["next_cursor"]
    else:
        has_more = False
Update the cursor after each request until the data runs out. Timing and token management are key.
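Because cursors can expire, it's also worth guarding each request with a status check and a short retry instead of assuming every call succeeds. A minimal sketch; the endpoint and parameters are the same assumptions as above:
import time
import requests
def fetch_page(url, params, retries=3):
    # Retry transient failures (timeouts, rate limits) with exponential backoff.
    # If the API rejects an expired cursor outright, the crawl usually has to restart.
    for attempt in range(retries):
        resp = requests.get(url, params=params)
        if resp.status_code == 200:
            return resp.json()
        time.sleep(2 ** attempt)
    raise RuntimeError(f"Giving up after {retries} failed attempts ({resp.status_code})")
Infinite scroll is the trickiest case, because there may be no visible pagination at all. One option is to hand the whole problem to a scraping service that renders JavaScript for you: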
import requests
url = "https://example.com/v1/scraper"
params = {
    "url": "https://example.com/products?page=1",  # the paginated page to scrape
    "render_js": True,      # ask the service to execute JavaScript before returning content
    "pagination": "auto"    # ask the service to follow pagination automatically
}
# Replace API_KEY with your actual key
response = requests.get(url, params=params, auth=("API_KEY", ""))
print(response.json())
Trigger AJAX calls dynamically, extract rendered content, repeat. Patience pays off.
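If you'd rather not rely on a rendering service, the other route mentioned earlier is to open the browser's network tab, find the JSON endpoint the "Load More" button calls, and page through it directly. A sketch assuming a hypothetical /api/products endpoint and an items field; the real names will differ per site:
import requests
# Hypothetical JSON endpoint discovered via the browser's network tab
api_url = "https://example.com/api/products"
page = 1
while True:
    data = requests.get(api_url, params={"page": page}).json()
    items = data.get("items", [])
    if not items:
        break  # the feed is exhausted
    for item in items:
        print(item["name"])
    page += 1
This is usually faster and lighter than rendering the page, but it breaks whenever the site changes its internal API.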
Scraping paginated content isn't just about getting the data; it's about doing it smartly. Deduplication is one of the cheapest safeguards: when page boundaries shift mid-crawl, the same record can appear twice, so track an identifier for everything you've already collected.
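A small sketch, assuming each scraped record carries a unique id field (a hypothetical name; use whatever identifier the site actually provides):
seen_ids = set()
results = []
def add_unique(record):
    # Skip anything already collected, e.g. when page boundaries shift mid-crawl
    if record["id"] in seen_ids:
        return
    seen_ids.add(record["id"])
    results.append(record)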
With the right approach, pagination stops being a hurdle and becomes an advantage. Use proxies, deduplication, and careful monitoring to capture complete datasets and turn your scraper into a reliable, powerful tool.