Scrape Booking.com Data Efficiently Using Python

By Martin Koenig

2025-03-19 15:59:14

Imagine extracting hundreds of hotel details (prices, ratings, descriptions, and more) with just a few lines of Python code. Whether you're a developer, a data analyst, or a business looking to gather insights, scraping Booking.com can give you a large amount of useful information.
In this article, we'll walk you through how to scrape Booking.com data, including names, locations, ratings, and price ranges. We'll use Python libraries to extract the JSON data embedded in hotel pages and save it in a structured CSV file for analysis.

Setting Up Your Python Environment

Before diving into scraping, you need to install a few essential Python libraries. It's a straightforward process.

· Requests: Sends HTTP requests to Booking.com and fetches the HTML.

· lxml: Parses the HTML content and extracts data using XPath.

· json: A built-in Python module for handling structured JSON data.

· csv: A built-in Python module for saving the data to a CSV file.

Only Requests and lxml need to be installed, since json and csv ship with Python:

pip install requests lxml

Now, you're ready to scrape.

Understanding the Structure of Booking.com

To scrape effectively, you need to understand the page structure and where the data lives. Each Booking.com hotel page embeds structured data in JSON-LD format inside a script tag of type application/ld+json. This JSON contains the details we need: hotel name, price range, address, rating, and more.
So, we'll target that script tag in our extraction process.
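
To make the target concrete, here is a minimal sketch of what such a JSON-LD payload may look like and how Python reads it. The sample values are invented for illustration, and the set of fields is an assumption based on the keys used later in this article; real pages may add, omit, or rename fields.

```python
import json

# Hypothetical JSON-LD payload, shaped like the schema.org "Hotel" type.
# All values are made up for illustration.
SAMPLE_JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "Hotel",
  "name": "Example Hotel",
  "url": "https://www.booking.com/hotel/xyz",
  "priceRange": "Prices start at $120 per night",
  "address": {"@type": "PostalAddress", "streetAddress": "1 Example Street"},
  "aggregateRating": {"@type": "AggregateRating", "ratingValue": 8.4, "reviewCount": 1250}
}
"""

# json.loads turns the script body into nested dicts and lists.
hotel = json.loads(SAMPLE_JSON_LD)
print(hotel["@type"], "-", hotel["name"])
print(hotel["aggregateRating"]["ratingValue"])
```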

Step-by-Step Guide to the Scraping Process

Configuring Headers and Proxies

Booking.com is no stranger to anti-scraping measures. To keep things smooth and avoid getting blocked, we must mimic a legitimate user session. This is where custom headers come into play. Plus, proxies help prevent detection by distributing requests across multiple IP addresses.
Here's the code for sending an HTTP request with headers:

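import requests
from lxml.html import fromstring  # used in the parsing step below

urls = ["https://www.booking.com/hotel/xyz"]  # placeholder hotel URL

# A browser-like User-Agent, defined once and reused for every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

for url in urls:
    # A timeout keeps the script from hanging on a stalled connection
    response = requests.get(url, headers=headers, timeout=10)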
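
Even with browser-like headers, a request can still time out or come back with a block or rate-limit status. One way to handle that is sketched here: a small retry helper with exponential backoff. The helper name `fetch_with_retries`, the set of retry statuses, and the delay values are assumptions for illustration, not something Booking.com documents. The helper accepts any session-like object with a `get` method, so a `requests.Session()` fits.

```python
import time

# Statuses treated as "try again later"; this set is an assumption.
RETRY_STATUSES = {403, 429, 500, 502, 503}

def fetch_with_retries(session, url, headers, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Return the first response whose status is not in RETRY_STATUSES.

    `session` is any object with a requests-style get(url, headers=..., timeout=...).
    Returns None if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            response = session.get(url, headers=headers, timeout=10)
            if response.status_code not in RETRY_STATUSES:
                return response
        except OSError:
            # Network-level failure; requests' exceptions derive from OSError.
            pass
        if attempt < attempts - 1:
            # Wait base_delay, then twice that, and so on, between attempts.
            sleep(base_delay * 2 ** attempt)
    return None
```

With Requests installed, a call would look like `fetch_with_retries(requests.Session(), url, headers)`. The `sleep` parameter exists only so the delay can be swapped out when testing.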

The Power of Proxies

When scraping Booking.com, you have to deal with rate limits and IP tracking, and proxies are the usual way to handle both. You can use free proxies or opt for a paid service that authenticates you by IP address. Below is how to route requests through a proxy:

# Most proxies accept plain HTTP connections even when tunnelling HTTPS traffic,
# so both entries typically use the http:// scheme. Check your provider's docs.
proxies = {
    'http': 'http://your_proxy',
    'https': 'http://your_proxy',
}
response = requests.get(url, headers=headers, proxies=proxies)
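
Distributing requests across several IP addresses, as described above, usually means rotating through a pool. A minimal sketch follows. The proxy hosts, port, and credentials are placeholders, and `build_proxies` and `proxy_pool` are hypothetical names. It assumes a provider that accepts a username and password embedded in the proxy URL.

```python
from itertools import cycle
from urllib.parse import quote

def build_proxies(host, port, username=None, password=None):
    """Build a requests-style proxies dict for one HTTP proxy endpoint."""
    auth = ""
    if username and password:
        # Percent-encode credentials so characters like '@' or ':' stay valid.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"http://{auth}{host}:{port}"
    # The same proxy URL is used for both plain and TLS traffic.
    return {"http": proxy_url, "https": proxy_url}

# Placeholder endpoints; replace with the ones your provider gives you.
proxy_pool = cycle([
    build_proxies("proxy1.example.com", 8000, "user", "p@ss"),
    build_proxies("proxy2.example.com", 8000, "user", "p@ss"),
])

first = next(proxy_pool)
second = next(proxy_pool)
print(first["https"])
print(second["https"])
```

Inside the scraping loop, each request would then pass `proxies=next(proxy_pool)` to `requests.get`, so consecutive requests leave from different endpoints.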

Extracting JSON-LD Data

Once the request is sent, it's time to parse the HTML and locate the embedded JSON-LD script that holds the valuable hotel data. We’ll use XPath to extract it.

import json

parser = fromstring(response.text)
# xpath() returns a list of strings; take the first JSON-LD script on the page
json_data = json.loads(parser.xpath('//script[@type="application/ld+json"]/text()')[0])
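
Taking the first match with `[0]` assumes the page has at least one JSON-LD script and that the first one describes the hotel. A more defensive sketch is shown here: it takes the list of strings that the XPath query returns and picks the first object whose `@type` is `Hotel`. The function name `find_hotel_json` is hypothetical, and the assumption that the hotel object is typed `Hotel` comes from schema.org conventions rather than from anything the page guarantees.

```python
import json

def find_hotel_json(script_texts, wanted_type="Hotel"):
    """Return the first JSON-LD object of the wanted type, or None."""
    for text in script_texts:
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            continue  # Skip malformed or empty script bodies.
        # A script may hold a single object or a list of objects.
        candidates = payload if isinstance(payload, list) else [payload]
        for item in candidates:
            if not isinstance(item, dict):
                continue
            # "@type" may be a single string or a list of strings.
            item_type = item.get("@type")
            types = item_type if isinstance(item_type, list) else [item_type]
            if wanted_type in types:
                return item
    return None

# Made-up script bodies standing in for the XPath result.
scripts = [
    "not json",
    '{"@type": "BreadcrumbList"}',
    '[{"@type": "Organization"}, {"@type": "Hotel", "name": "Example Hotel"}]',
]
print(find_hotel_json(scripts))
```

Returning `None` instead of raising lets the caller log the URL and move on when a page carries no usable hotel data.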

Extracting Hotel Details

Now that we have the JSON data, it's time to extract specific details like hotel name, location, price range, and more.
Here's how to pull out some of the most critical data points:

name = json_data['name']
location = json_data['hasMap']  # schema.org hasMap: a link to a map of the place
price_range = json_data['priceRange']
rating = json_data['aggregateRating']['ratingValue']
review_count = json_data['aggregateRating']['reviewCount']
address = json_data['address']['streetAddress']
url = json_data['url']  # the hotel's page URL as stated in the JSON-LD
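
Direct indexing raises `KeyError` as soon as a listing lacks one field, for example a property with no reviews yet. The sketch below reads the same keys with `dict.get` so that missing values become `None`. `extract_hotel_fields` is a hypothetical helper name, and which fields can actually be absent on a real page is an assumption.

```python
def extract_hotel_fields(json_data):
    """Map a JSON-LD hotel object to a flat dict, tolerating missing keys."""
    # Fall back to empty dicts so the nested lookups below never fail.
    rating = json_data.get("aggregateRating") or {}
    address = json_data.get("address") or {}
    return {
        "Name": json_data.get("name"),
        "Location": json_data.get("hasMap"),
        "Price Range": json_data.get("priceRange"),
        "Rating": rating.get("ratingValue"),
        "Review Count": rating.get("reviewCount"),
        "Address": address.get("streetAddress"),
        "URL": json_data.get("url"),
    }

# A listing without reviews or a map link (made-up values).
row = extract_hotel_fields({
    "name": "Example Hotel",
    "address": {"streetAddress": "1 Example Street"},
})
print(row)
```

The returned keys match the CSV column names used in the next step, so each result can be appended to the list of rows as is.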

Storing the Data in a CSV

Finally, after scraping all the data, let's save it into a CSV file for easy analysis.

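import csv

# all_data is the list of row dictionaries collected while looping over the URLs
fieldnames = ["Name", "Location", "Price Range", "Rating", "Review Count", "Address", "URL"]
# UTF-8 keeps accented hotel names and addresses intact
with open('booking_data.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(all_data)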
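
`csv.DictWriter` matches each row's keys against the declared field names. A quick way to check the output format without touching the disk is to write to an in-memory buffer, as in this sketch. The rows are invented sample data, the column list is shortened for brevity, and `restval` supplies the value for any column a row lacks.

```python
import csv
import io

fieldnames = ["Name", "Rating", "URL"]
rows = [
    {"Name": "Example Hotel", "Rating": 8.4, "URL": "https://www.booking.com/hotel/xyz"},
    {"Name": "Hotel, With Comma"},  # Missing columns are filled by restval.
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)

# Values containing commas are quoted automatically by the csv module.
csv_text = buffer.getvalue()
print(csv_text)
```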

Complete Script

Below is the complete code for your convenience. Copy and run it to start scraping.

import csv
import json

import requests
from lxml.html import fromstring

urls = ["https://www.booking.com/hotel/xyz"]  # placeholder hotel URL

# Browser-like headers, shared by every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

all_data = []

for url in urls:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on a 4xx/5xx response

    # Parse the HTML and load the first embedded JSON-LD script
    parser = fromstring(response.text)
    json_data = json.loads(parser.xpath('//script[@type="application/ld+json"]/text()')[0])

    data = {
        "Name": json_data['name'],
        "Location": json_data['hasMap'],
        "Price Range": json_data['priceRange'],
        "Rating": json_data['aggregateRating']['ratingValue'],
        "Review Count": json_data['aggregateRating']['reviewCount'],
        "Address": json_data['address']['streetAddress'],
        "URL": json_data['url']
    }

    all_data.append(data)

# Save to CSV
fieldnames = ["Name", "Location", "Price Range", "Rating", "Review Count", "Address", "URL"]
with open('booking_data.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(all_data)

print("Data successfully saved to booking_data.csv")

Wrapping Up

In this guide, we explored how to scrape valuable hotel data from Booking.com using Python. We covered the installation of essential libraries, proper header configuration to avoid blocks, and techniques for extracting and saving data.
By using this method, you can gather critical insights about hotel listings, helping you make data-driven decisions whether you're analyzing market trends, creating a travel website, or simply automating data collection for business needs.

About the author

Martin Koenig

Commercial Manager

Martin Koenig is an accomplished commercial strategist with more than ten years of experience in the technology, telecommunications, and consulting industries. As Commercial Manager, he combines cross-sector expertise with a data-driven approach to identify growth opportunities and deliver measurable business impact.

The content provided on the Swiftproxy blog is intended for informational purposes only and is presented without any warranty. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information it contains, and assumes no responsibility for the content of third-party sites referenced in the blog. Before engaging in any web scraping or automated data collection, readers are strongly advised to consult a qualified legal advisor and to review the target site's applicable terms of use. In some cases, explicit authorization or a scraping permit may be required.
