How to Crawl Sitemaps with Python

SwiftProxy
By Martin Koenig
2025-07-04 15:03:50

Finding every URL on a website by clicking through page after page? That's yesterday's approach. When you want to grab a full list fast, sitemaps are your shortcut. These neat files map out exactly which pages a site wants indexed. Instead of slow, clunky crawling, sitemaps give you a direct route to all the URLs you need.

However, parsing sitemaps manually isn't always smooth sailing. Many sites use index sitemaps — big files pointing to smaller sitemaps, nested deep. That's extra work, and some contain thousands of URLs. Without the right tools, it quickly becomes a slog.

Enter ultimate-sitemap-parser (usp) — a Python library built to take that headache away. It fetches sitemaps, handles complex nested structures, and pulls out every URL with just a simple call. No fuss. No heavy lifting.

Today, we'll walk you through using usp to crawl the ASOS sitemap. By the end, you'll know exactly how to extract every URL quickly and efficiently.

What You Need Before You Start

1. Python installed

Not installed yet? Grab the latest version from python.org. Check your install by running this command in your terminal:

python3 --version

2. ultimate-sitemap-parser library

Install it with pip:

pip install ultimate-sitemap-parser
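
To confirm the install worked, try importing the library from your terminal. Note that the package installs a module named usp, not ultimate_sitemap_parser:

python3 -c "from usp.tree import sitemap_tree_for_homepage; print('usp is ready')"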

Grabbing URLs from the ASOS Homepage Sitemap

Let's jump in. Here's how to grab all URLs from the ASOS homepage sitemap in a snap:

from usp.tree import sitemap_tree_for_homepage

url = "https://www.asos.com/"

# Discover, download, and parse every sitemap reachable from the homepage
tree = sitemap_tree_for_homepage(url)

# Walk every page found across all of the site's sitemaps
for page in tree.all_pages():
    print(page.url)

That's it. The library does the heavy lifting, fetching the sitemap, parsing XML, and listing every URL.
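
Each item that all_pages() yields is a page object, not just a string. In recent usp releases these objects also carry metadata from the sitemap XML, such as last_modified and priority. The attribute names below follow those releases, so check your installed version if they differ:

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage("https://www.asos.com/")

for page in tree.all_pages():
    # Attribute names assume a recent usp release; fields missing from the
    # sitemap XML may come back as None or spec defaults
    print(page.url, page.last_modified, page.priority)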

Handling Nested Sitemaps Automatically

Many sites don't keep all URLs in one place. They break them down into index sitemaps — think product pages separate from category pages or blog posts. Without the right tool, you’d have to write extra code to dig through each one.

But usp? It just works. It finds those nested sitemaps, fetches them all recursively, and extracts every single URL — no extra work from you.
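
If you're curious what that nested structure looks like, you can walk the sitemap tree itself instead of just its pages. Here's a minimal sketch, assuming index sitemaps expose a sub_sitemaps attribute as they do in recent usp releases:

from usp.tree import sitemap_tree_for_homepage

def walk(sitemap, depth=0):
    # Print each sitemap's URL, indented by how deeply it is nested
    print("  " * depth + sitemap.url)
    # Index sitemaps hold child sitemaps; page-level sitemaps lack this attribute
    for child in getattr(sitemap, "sub_sitemaps", []):
        walk(child, depth + 1)

tree = sitemap_tree_for_homepage("https://www.asos.com/")
walk(tree)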

Filtering URLs by Type

Want only product pages? Easy. If product URLs contain /product/, just filter them:

# Keep only the URLs whose path contains "/product/"
product_urls = [page.url for page in tree.all_pages() if "/product/" in page.url]

for product_url in product_urls:
    print(product_url)

Instantly narrow your crawl to what matters.

Saving URLs for Later Use

Printing URLs to your screen is great for quick checks, but storing them for analysis? Even better.
Here's how to save those URLs into a CSV file:

import csv
from usp.tree import sitemap_tree_for_homepage

url = "https://www.asos.com/"
tree = sitemap_tree_for_homepage(url)

# Collect every URL from the sitemap tree
urls = [page.url for page in tree.all_pages()]

# Write the URLs to a CSV file, one per row, with a header
csv_filename = "asos_sitemap_urls.csv"
with open(csv_filename, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["URL"])
    for page_url in urls:
        writer.writerow([page_url])

print(f"Extracted {len(urls)} URLs and saved to {csv_filename}")

Now you have a neat CSV ready for your next steps.

Wrapping Up

Parsing sitemaps doesn't have to be complicated. With ultimate-sitemap-parser, the entire process — from fetching nested sitemaps to filtering and saving URLs — is streamlined and straightforward. No more XML headaches or manual digging.

Whether you're building a scraper, conducting SEO analysis, or auditing a website, usp is a powerhouse tool to add to your Python arsenal.

About the author

Martin Koenig
Head of Commerce
Martin Koenig is an accomplished commercial strategist with over a decade of experience in the technology, telecommunications, and consulting industries. As Head of Commerce, he combines cross-sector expertise with a data-driven mindset to unlock growth opportunities and deliver measurable business impact.
The content provided on the Swiftproxy Blog is intended solely for informational purposes and is presented without warranty of any kind. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information contained herein, nor does it assume any responsibility for content on third-party websites referenced in the blog. Prior to engaging in any web scraping or automated data collection activities, readers are strongly advised to consult with qualified legal counsel and to review the applicable terms of service of the target website. In certain cases, explicit authorization or a scraping permit may be required.