How to Scrape Instagram Data Efficiently with Python

SwiftProxy
By Emily Chan
2025-02-28 15:29:45

Instagram's data is a treasure trove for researchers, marketers, and developers—but getting it? That's a different story. With sophisticated anti-bot systems, login hurdles, and rate limits, scraping Instagram can feel like trying to break into a vault. But don't worry, it's not impossible.
In this guide, we'll walk you through scraping Instagram user data with Python. By sending API requests, parsing JSON responses, and using a few clever tools, you'll be able to collect valuable insights from public profiles. Let's dive in.

The Tools You Need

Before we get into the code, let's set you up with the right tools. To scrape Instagram efficiently, you'll need a couple of Python libraries. Make sure you have these installed:
pip install requests python-box

· requests: Used for making HTTP requests to Instagram's backend.

· python-box: Simplifies navigating deeply nested JSON by letting you access it with dot notation.

Step 1: Sending the API Request

Instagram's frontend is locked down tight. But here's the trick: Instagram has an exposed backend API that's not as heavily protected. With the right headers, you can pull public data without authentication.
We'll target the user profile endpoint to grab key data like follower count, bio, and post details. Here's how we make the request:

import requests

headers = {
    "x-ig-app-id": "936619743392459",  # Essential to mimic the Instagram app
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "*/*",
}

username = 'testtest'  # Replace with the username you want

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers)
response_json = response.json()  # Parse the response to JSON

Explanation:

· Headers: These mimic a real browser request. Instagram looks at headers to detect bots—so using valid headers like x-ig-app-id and User-Agent makes your requests appear legitimate.

· Backend API: We're hitting the web_profile_info endpoint, which pulls detailed user profile data.
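
The request can fail quietly if Instagram blocks it or the account doesn't exist, so it's worth guarding the parse step. Here's a minimal sketch built on the request above (the data -> user path matches the structure used throughout this guide):

# Guard against blocked or failed requests before parsing.
if response.status_code != 200:
    raise RuntimeError(f'Request failed with status {response.status_code}')

response_json = response.json()

# A successful response keeps the profile under data -> user.
user = response_json.get('data', {}).get('user')
if user is None:
    raise RuntimeError('No user data returned; the request may have been blocked')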

Step 2: Configuring Proxies to Prevent Rate Limiting

Instagram's rate limiting can be a challenge. If you're scraping multiple profiles or sending a lot of requests from a single address, you may get blocked. Proxies help here.
Proxies route your requests through different IPs, spreading the traffic so no single address draws Instagram's attention.
Here's how to integrate proxies into your requests:

proxies = {
    'http': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
    'https': 'https://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
}

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers, proxies=proxies)
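
If you're scraping more than a handful of profiles, you can take this further and rotate through a pool of proxies while pacing your requests. The sketch below is illustrative only: the proxy URLs and usernames are placeholders, and the delay is a conservative guess rather than a documented limit.

import itertools
import time

# Hypothetical proxy pool; replace with your own proxy endpoints.
proxy_pool = itertools.cycle([
    'http://<proxy_username>:<proxy_password>@<proxy_ip_1>:<proxy_port>',
    'http://<proxy_username>:<proxy_password>@<proxy_ip_2>:<proxy_port>',
])

usernames = ['testtest', 'another_account']  # placeholder usernames

for name in usernames:
    proxy_url = next(proxy_pool)
    current_proxies = {'http': proxy_url, 'https': proxy_url}
    response = requests.get(
        f'https://i.instagram.com/api/v1/users/web_profile_info/?username={name}',
        headers=headers,
        proxies=current_proxies,
    )
    print(name, response.status_code)
    time.sleep(5)  # pause between requests to stay well away from rate limits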

Step 3: Streamlining JSON Parsing with Box

Instagram's API returns a deep, nested JSON structure. Navigating this with standard dictionary syntax can be a hassle. This is where Box comes in.
Box wraps the parsed JSON in an object you can access with dot notation. It's cleaner and far more readable.
Here's how to use it:

from box import Box

response_json = Box(response.json())

user_data = {
    'full_name': response_json.data.user.full_name,
    'followers': response_json.data.user.edge_followed_by.count,
    'bio': response_json.data.user.biography,
    'is_verified': response_json.data.user.is_verified,
    'profile_pic': response_json.data.user.profile_pic_url_hd,
}

Instead of accessing data with response_json['data']['user']['full_name'], you can just write response_json.data.user.full_name. Much easier, right?
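
One caveat: with dot notation, a missing field raises an error just as a missing dictionary key would. If you'd rather have lookups fail softly, python-box offers a default_box mode that returns an empty Box for absent keys. A small sketch (category_name is just an example of a field that may or may not be present on a given profile):

# default_box=True makes missing keys return an empty Box instead of raising.
safe_json = Box(response.json(), default_box=True)

category = safe_json.data.user.category_name  # empty Box if the field is absent
if not category:
    category = 'unknown'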

Step 4: Collecting Video and Post Data

Once you have the profile data, you can dig deeper into a user's posts and videos. Instagram gives you a goldmine of insights here: view counts, like counts, and more.
For videos, here's the extraction method:

profile_video_data = []
for element in response_json.data.user.edge_felix_video_timeline.edges:
    video_data = {
        'id': element.node.id,
        'video_url': element.node.video_url,
        'views': element.node.video_view_count,
        'likes': element.node.edge_liked_by.count,
    }
    profile_video_data.append(video_data)

Similarly, you can extract regular timeline posts (photos and videos) and pull data like media URL, comment counts, and like counts.
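
As a sketch, regular posts sit under an edge_owner_to_timeline_media edge in the same response. The field names below follow the same GraphQL-style structure as the video example, but Instagram can change them, so treat this as a starting point rather than a guarantee:

# Timeline posts (photos and videos) from the same profile response.
profile_post_data = []
for element in response_json.data.user.edge_owner_to_timeline_media.edges:
    post_data = {
        'id': element.node.id,
        'media_url': element.node.display_url,
        'is_video': element.node.is_video,
        'likes': element.node.edge_liked_by.count,
        'comments': element.node.edge_media_to_comment.count,
    }
    profile_post_data.append(post_data)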

Step 5: Saving Data for Later Use

Once you've scraped the data, you'll likely want to save it. Use Python's built-in json library to export the data into readable JSON files for later analysis.

import json

# Save the profile data
with open(f'{username}_profile_data.json', 'w') as file:
    json.dump(user_data, file, indent=4)

# Save video data
with open(f'{username}_video_data.json', 'w') as file:
    json.dump(profile_video_data, file, indent=4)

Now, you have a neatly structured JSON file with all the data you need. Easy to read, easy to process.
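
When it's time to analyze, loading a file back in is just as simple:

# Load the saved profile data back for analysis.
with open(f'{username}_profile_data.json') as file:
    saved_profile = json.load(file)

print(saved_profile['followers'])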

Complete Code for Scraping Instagram Data

Here's the full Python script for your convenience:

import requests
from box import Box
import json

headers = {
    "x-ig-app-id": "936619743392459", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
}

proxies = {
    'http': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
    'https': 'https://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
}

username = 'testtest'

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers, proxies=proxies)
response_json = Box(response.json())

user_data = {
    'full_name': response_json.data.user.full_name,
    'followers': response_json.data.user.edge_followed_by.count,
}

# Save the data
with open(f'{username}_profile_data.json', 'w') as file:
    json.dump(user_data, file, indent=4)

Wrapping Up

Scraping Instagram data can seem daunting, but with the right tools, it's entirely doable. By leveraging Instagram's backend API, using headers to mimic a real browser, and applying proxies to avoid detection, you can scrape Instagram data from public profiles and gain valuable insights. Always ensure you're respecting Instagram's terms of service. Stay ethical and avoid overloading their servers with requests.

About the Author

SwiftProxy
Emily Chan
Editor-in-Chief at Swiftproxy
Emily Chan is the Editor-in-Chief at Swiftproxy, with over ten years of experience in technology, digital infrastructure, and strategic communication. Based in Hong Kong, she combines deep regional knowledge with a clear, practical voice to help businesses navigate the evolving world of proxy solutions and data-driven growth.
The content provided on the Swiftproxy blog is for informational purposes only and is presented without any warranty. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information it contains, nor does it assume responsibility for the content of third-party sites referenced in the blog. Before engaging in any web scraping or automated data collection, readers are strongly advised to consult qualified legal counsel and review the applicable terms of service of the target site. In some cases, explicit authorization or a scraping permit may be required.