Websites can contain tens of thousands or even hundreds of thousands of pages. Crawling them manually is a nightmare. Sitemaps offer a smarter and faster solution because they act as a website's blueprint, showing exactly which pages exist.
Sitemaps can save you hours or even days of scraping work. Instead of jumping from link to link, you can collect every URL in an organized way. However, there is a complication. Many websites use index sitemaps that reference other sitemaps. Parsing these manually is both tedious and prone to errors.
Enter ultimate-sitemap-parser (usp). This Python library takes the hassle out of sitemap crawling. Let's walk through how to use usp to crawl the ASOS sitemap and extract every available URL in minutes.
Before diving in, make sure you have the basics in place:
You'll need Python installed. If you don't have it yet:
Download and install the latest version from python.org.
Verify the installation:
python3 --version
Next, grab the usp library:
pip install ultimate-sitemap-parser
With usp installed, you're ready to extract URLs from ASOS—or any site. Here's how.
Parsing XML manually is a pain. With usp, it's just a few lines:
from usp.tree import sitemap_tree_for_homepage

# Build the full sitemap tree for the site, following any index sitemaps.
url = "https://www.asos.com/"
tree = sitemap_tree_for_homepage(url)

# Iterate over every page found across all of the site's sitemaps.
for page in tree.all_pages():
    print(page.url)
Boom. That's it. All URLs, fetched and ready to use.
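One detail worth knowing: all_pages() yields page objects, not bare strings. Here's a quick sketch, assuming those objects expose priority and last_modified attributes as in current usp versions (either may be None when a sitemap omits the field):

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage("https://www.asos.com/")

# Assumption: page objects expose .priority and .last_modified (may be None).
for page in tree.all_pages():
    print(page.url, page.priority, page.last_modified)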
Many sites, such as ASOS, divide their sitemaps into different sections for products, categories, and blogs. Normally, you'd have to crawl each one individually. Not with usp.
It will:
Detect index sitemaps.
Fetch child sitemaps automatically.
Return every URL across the site.
No extra loops. No messy recursion. Just results.
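Curious what that nesting actually looks like? You can walk the tree yourself before flattening it. A minimal sketch, assuming index sitemaps expose their children via a sub_sitemaps attribute (leaf sitemaps simply lack it, which getattr handles):

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage("https://www.asos.com/")

def walk(sitemap, depth=0):
    # Print each sitemap's URL, indented by its depth in the tree.
    print("  " * depth + sitemap.url)
    # Assumption: index sitemaps carry their children in .sub_sitemaps.
    for child in getattr(sitemap, "sub_sitemaps", []):
        walk(child, depth + 1)

walk(tree)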
Want just product pages? Easy. Filter by URL patterns:
# Keep only URLs whose path contains the product marker.
product_urls = [page.url for page in tree.all_pages() if "/product/" in page.url]

for url in product_urls:
    print(url)
Targeted extraction. Minimal effort. Maximum efficiency.
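The same trick scales to several sections at once. A sketch with hypothetical URL markers ("/cat/" and "/blog/" are placeholders; substitute whatever patterns the real sitemap uses):

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage("https://www.asos.com/")

# "/cat/" and "/blog/" are hypothetical markers; adjust to the real URL patterns.
patterns = ("/product/", "/cat/", "/blog/")
matched = [page.url for page in tree.all_pages() if any(p in page.url for p in patterns)]
print(f"Matched {len(matched)} URLs")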
Instead of printing URLs, store them for analysis:
import csv
from usp.tree import sitemap_tree_for_homepage

url = "https://www.asos.com/"
tree = sitemap_tree_for_homepage(url)

# Collect every URL from the sitemap tree.
urls = [page.url for page in tree.all_pages()]

csv_filename = "asos_sitemap_urls.csv"
with open(csv_filename, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["URL"])  # header row
    for page_url in urls:  # named page_url to avoid shadowing the homepage url above
        writer.writerow([page_url])

print(f"Extracted {len(urls)} URLs and saved to {csv_filename}")
Now you've got a complete, ready-to-analyze CSV of every page.
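Since page objects carry more than the URL (see the metadata sketch earlier), the same loop can widen the CSV with extra columns, still assuming the priority and last_modified attribute names hold in your installed usp version:

import csv
from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage("https://www.asos.com/")

with open("asos_sitemap_pages.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["URL", "Priority", "Last modified"])
    for page in tree.all_pages():
        # Assumption: .priority and .last_modified exist on page objects (may be None).
        writer.writerow([page.url, page.priority, page.last_modified])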
With ultimate-sitemap-parser, crawling sitemaps becomes effortless. It extracts every URL quickly, handles nested sitemaps automatically, and makes it easy to filter and save exactly the content you need. Whether it's for SEO audits, competitive analysis, or large-scale website scraping, usp turns a tedious task into an efficient, predictable one.