How to Crawl Sitemaps with Python

SwiftProxy
By Martin Koenig
2025-07-04 15:03:50

Finding every URL on a website by clicking through page after page? That's yesterday's approach. When you want to grab a full list fast, sitemaps are your shortcut. These neat files map out exactly which pages a site wants indexed. Instead of slow, clunky crawling, sitemaps give you a direct route to all the URLs you need.

However, parsing sitemaps manually isn't always smooth sailing. Many sites use index sitemaps: large files that point to smaller sitemaps, sometimes nested several levels deep. That's extra work, and each of those sitemaps can contain thousands of URLs. Without the right tools, it quickly becomes a slog.

Enter ultimate-sitemap-parser (usp) — a Python library built to take that headache away. It fetches sitemaps, handles complex nested structures, and pulls out every URL with just a simple call. No fuss. No heavy lifting.

Today, we'll walk you through using usp to crawl the ASOS sitemap. By the end, you'll know exactly how to extract every URL quickly and efficiently.

What You Need Before You Start

1. Python installed

Not installed yet? Grab the latest version from python.org. Check your install by running this command in your terminal:

python3 --version

2. ultimate-sitemap-parser library

Install it with pip:

pip install ultimate-sitemap-parser

Grabbing URLs from the ASOS Homepage Sitemap

Let's jump in. Here's how to grab all URLs from the ASOS homepage sitemap in a snap:

from usp.tree import sitemap_tree_for_homepage

url = "https://www.asos.com/"

# Discover, download, and parse every sitemap reachable from the
# homepage (usp checks robots.txt and common sitemap locations).
tree = sitemap_tree_for_homepage(url)

# Iterate over every page found across all of the site's sitemaps.
for page in tree.all_pages():
    print(page.url)

That's it. The library does the heavy lifting: fetching the sitemaps, parsing the XML, and listing every URL.
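
Because all_pages() yields pages lazily (it's a generator in current usp versions), you can also count a site's URLs without building a full list in memory. Reusing the tree from above:

# Count URLs without materializing them all at once;
# all_pages() yields one SitemapPage object at a time.
page_count = sum(1 for _ in tree.all_pages())
print(f"Found {page_count} URLs on {url}")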

Handling Nested Sitemaps Automatically

Many sites don't keep all URLs in one place. They break them down into index sitemaps, with product pages separate from category pages or blog posts. Without the right tool, you'd have to write extra code to dig through each one.

But usp? It just works. It finds those nested sitemaps, fetches them all recursively, and extracts every single URL — no extra work from you.
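
If you want to see that structure for yourself, you can walk the tree directly. Here's a minimal sketch, assuming index sitemaps expose their children through a sub_sitemaps attribute (as current usp versions do) while leaf sitemaps don't:

from usp.tree import sitemap_tree_for_homepage

def walk(sitemap, depth=0):
    # Print each sitemap's URL, indented by nesting depth.
    print("  " * depth + sitemap.url)
    # Index sitemaps hold their children in sub_sitemaps; leaf
    # sitemaps lack the attribute, so fall back to an empty list.
    for child in getattr(sitemap, "sub_sitemaps", []):
        walk(child, depth + 1)

tree = sitemap_tree_for_homepage("https://www.asos.com/")
walk(tree)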

Filtering URLs by Type

Want only product pages? Easy. If product URLs contain /product/, just filter them:

# Keep only pages whose URL contains the product path segment.
product_urls = [page.url for page in tree.all_pages() if "/product/" in page.url]

for product_url in product_urls:
    print(product_url)

Instantly narrow your crawl to what matters.
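
You can filter on metadata, too. usp parses each page's <lastmod> tag into a last_modified datetime when the site provides one. Here's a sketch that keeps only recently updated pages; it assumes timezone-aware lastmod values, since many sites omit the field or the timezone (hence the guards):

from datetime import datetime, timedelta, timezone

cutoff = datetime.now(timezone.utc) - timedelta(days=30)

# Keep pages whose <lastmod> is present, timezone-aware, and recent.
# Comparing naive and aware datetimes raises a TypeError, so naive
# values are skipped rather than guessing their timezone.
recent_urls = [
    page.url
    for page in tree.all_pages()
    if page.last_modified
    and page.last_modified.tzinfo
    and page.last_modified >= cutoff
]
print(f"{len(recent_urls)} pages updated in the last 30 days")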

Saving URLs for Later Use

Printing URLs to your screen is great for quick checks, but storing them for analysis? Even better.
Here's how to save those URLs into a CSV file:

import csv
from usp.tree import sitemap_tree_for_homepage

url = "https://www.asos.com/"
tree = sitemap_tree_for_homepage(url)

# Collect every URL from the sitemap tree into a list.
urls = [page.url for page in tree.all_pages()]

# Write the URLs to a single-column CSV file.
csv_filename = "asos_sitemap_urls.csv"
with open(csv_filename, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["URL"])
    for page_url in urls:
        writer.writerow([page_url])

print(f"Extracted {len(urls)} URLs and saved to {csv_filename}")

Now you have a neat CSV ready for your next steps.
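
If you need more than bare URLs, each page object also carries whatever metadata usp found in the sitemap, such as last_modified and priority. A sketch that writes those alongside each URL (values may be None or a default when the sitemap omits them):

import csv
from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage("https://www.asos.com/")

with open("asos_sitemap_pages.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["URL", "Last Modified", "Priority"])
    for page in tree.all_pages():
        # last_modified is None when the sitemap has no <lastmod>;
        # priority falls back to the sitemap default when omitted.
        writer.writerow([page.url, page.last_modified or "", page.priority])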

Wrapping Up

Parsing sitemaps doesn't have to be complicated. With ultimate-sitemap-parser, the entire process — from fetching nested sitemaps to filtering and saving URLs — is streamlined and straightforward. No more XML headaches or manual digging.

Whether you're building a scraper, conducting SEO analysis, or auditing a website, usp is a powerhouse tool to add to your Python arsenal.

About the Author

SwiftProxy
Martin Koenig
Head of Commerce
Martin Koenig is an accomplished commercial strategist with more than a decade of experience across the technology, telecommunications, and consulting industries. As Head of Commerce, he combines cross-industry expertise with a data-driven approach to identify growth opportunities and deliver measurable business impact.
The content provided on the Swiftproxy blog is for informational purposes only and is presented without any warranty. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information it contains, nor does it assume responsibility for the content of third-party sites referenced in the blog. Before engaging in any web scraping or automated data collection, readers are strongly advised to consult a qualified legal advisor and review the target site's applicable terms of service. In some cases, explicit authorization or a scraping permit may be required.