How to Scrape Instagram Data Efficiently with Python

SwiftProxy
By Emily Chan
2025-02-28 15:29:45

Instagram's data is a treasure trove for researchers, marketers, and developers—but getting it? That's a different story. With sophisticated anti-bot systems, login hurdles, and rate limits, scraping Instagram can feel like trying to break into a vault. But don't worry, it's not impossible.
In this guide, we'll walk you through scraping Instagram user data with Python. By sending API requests, parsing JSON responses, and using a few clever tools, you'll be able to collect valuable insights from public profiles. Let's dive in.

The Tools You Need

Before we get into the code, let's set you up with the right tools. To scrape Instagram efficiently, you'll need a couple of Python libraries. Make sure you have these installed:
pip install requests python-box

· requests: Used for making HTTP requests to Instagram's backend.

· python-box: Simplifies navigating deeply nested JSON by letting you access it with dot notation.

Step 1: Sending the API Request

Instagram's frontend is locked down tight. But here's the trick: Instagram has an exposed backend API that's not as heavily protected. With the right headers, you can pull public data without authentication.
We'll target the user profile endpoint to grab key data like follower count, bio, and post details. Here's how we make the request:

import requests

headers = {
    "x-ig-app-id": "936619743392459",  # Essential to mimic the Instagram app
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "*/*",
}

username = 'testtest'  # Replace with the username you want

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers)
response_json = response.json()  # Parse the response to JSON

Explanation:

· Headers: These mimic a real browser request. Instagram looks at headers to detect bots—so using valid headers like x-ig-app-id and User-Agent makes your requests appear legitimate.

· Backend API: We're hitting the web_profile_info endpoint, which pulls detailed user profile data.
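
The request can fail quietly if Instagram blocks it or the account doesn't exist, so it's worth guarding the parse step. Here's a minimal sketch built on the request above (the data -> user path matches the structure used throughout this guide):

# Guard against blocked or failed requests before parsing.
if response.status_code != 200:
    raise RuntimeError(f'Request failed with status {response.status_code}')

response_json = response.json()

# A successful response keeps the profile under data -> user.
user = response_json.get('data', {}).get('user')
if user is None:
    raise RuntimeError('No user data returned; the request may have been blocked')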

Step 2: Configuring Proxies to Prevent Rate Limiting

Instagram's rate limiting can be a challenge. If you're scraping multiple profiles or sending a lot of requests from a single address, you may get blocked. Proxies help here.
Proxies route your requests through different IPs, spreading the traffic so no single address draws Instagram's attention.
Here's how to integrate proxies into your requests:

proxies = {
    'http': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
    'https': 'https://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
}

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers, proxies=proxies)
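
If you're scraping more than a handful of profiles, you can take this further and rotate through a pool of proxies while pacing your requests. The sketch below is illustrative only: the proxy URLs and usernames are placeholders, and the delay is a conservative guess rather than a documented limit.

import itertools
import time

# Hypothetical proxy pool; replace with your own proxy endpoints.
proxy_pool = itertools.cycle([
    'http://<proxy_username>:<proxy_password>@<proxy_ip_1>:<proxy_port>',
    'http://<proxy_username>:<proxy_password>@<proxy_ip_2>:<proxy_port>',
])

usernames = ['testtest', 'another_account']  # placeholder usernames

for name in usernames:
    proxy_url = next(proxy_pool)
    current_proxies = {'http': proxy_url, 'https': proxy_url}
    response = requests.get(
        f'https://i.instagram.com/api/v1/users/web_profile_info/?username={name}',
        headers=headers,
        proxies=current_proxies,
    )
    print(name, response.status_code)
    time.sleep(5)  # pause between requests to stay well away from rate limits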

Step 3: Streamlining JSON Parsing with Box

Instagram's API returns a deep, nested JSON structure. Navigating this with standard dictionary syntax can be a hassle. This is where Box comes in.
Box wraps the parsed JSON in an object you can access with dot notation. It's cleaner and far more readable.
Here's how to use it:

from box import Box

response_json = Box(response.json())

user_data = {
    'full_name': response_json.data.user.full_name,
    'followers': response_json.data.user.edge_followed_by.count,
    'bio': response_json.data.user.biography,
    'is_verified': response_json.data.user.is_verified,
    'profile_pic': response_json.data.user.profile_pic_url_hd,
}

Instead of accessing data with response_json['data']['user']['full_name'], you can just write response_json.data.user.full_name. Much easier, right?
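
One caveat: with dot notation, a missing field raises an error just as a missing dictionary key would. If you'd rather have lookups fail softly, python-box offers a default_box mode that returns an empty Box for absent keys. A small sketch (category_name is just an example of a field that may or may not be present on a given profile):

# default_box=True makes missing keys return an empty Box instead of raising.
safe_json = Box(response.json(), default_box=True)

category = safe_json.data.user.category_name  # empty Box if the field is absent
if not category:
    category = 'unknown'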

Step 4: Collecting Video and Post Data

Once you have the profile data, you can dig deeper into a user's posts and videos. Instagram gives you a goldmine of insights here: view counts, like counts, and more.
For videos, here's the extraction method:

profile_video_data = []
for element in response_json.data.user.edge_felix_video_timeline.edges:
    video_data = {
        'id': element.node.id,
        'video_url': element.node.video_url,
        'views': element.node.video_view_count,
        'likes': element.node.edge_liked_by.count,
    }
    profile_video_data.append(video_data)

Similarly, you can extract regular timeline posts (photos and videos) and pull data like media URL, comment counts, and like counts.
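
As a sketch, regular posts sit under an edge_owner_to_timeline_media edge in the same response. The field names below follow the same GraphQL-style structure as the video example, but Instagram can change them, so treat this as a starting point rather than a guarantee:

# Timeline posts (photos and videos) from the same profile response.
profile_post_data = []
for element in response_json.data.user.edge_owner_to_timeline_media.edges:
    post_data = {
        'id': element.node.id,
        'media_url': element.node.display_url,
        'is_video': element.node.is_video,
        'likes': element.node.edge_liked_by.count,
        'comments': element.node.edge_media_to_comment.count,
    }
    profile_post_data.append(post_data)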

Step 5: Saving Data for Later Use

Once you've scraped the data, you'll likely want to save it. Use Python's built-in json library to export the data into readable JSON files for later analysis.

import json

# Save the profile data
with open(f'{username}_profile_data.json', 'w') as file:
    json.dump(user_data, file, indent=4)

# Save video data
with open(f'{username}_video_data.json', 'w') as file:
    json.dump(profile_video_data, file, indent=4)

Now, you have a neatly structured JSON file with all the data you need. Easy to read, easy to process.
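
When it's time to analyze, loading a file back in is just as simple:

# Load the saved profile data back for analysis.
with open(f'{username}_profile_data.json') as file:
    saved_profile = json.load(file)

print(saved_profile['followers'])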

Complete Code for Scraping Instagram Data

Here's the full Python script for your convenience:

import requests
from box import Box
import json

headers = {
    "x-ig-app-id": "936619743392459", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
}

proxies = {
    'http': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
    'https': 'https://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
}

username = 'testtest'

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers, proxies=proxies)
response_json = Box(response.json())

user_data = {
    'full_name': response_json.data.user.full_name,
    'followers': response_json.data.user.edge_followed_by.count,
}

# Save the data
with open(f'{username}_profile_data.json', 'w') as file:
    json.dump(user_data, file, indent=4)

Wrapping Up

Scraping Instagram data can seem daunting, but with the right tools, it's entirely doable. By leveraging Instagram's backend API, using headers to mimic a real browser, and applying proxies to avoid detection, you can scrape Instagram data from public profiles and gain valuable insights. Always ensure you're respecting Instagram's terms of service. Stay ethical and avoid overloading their servers with requests.

About the Author

SwiftProxy
Emily Chan
Editor-in-Chief at Swiftproxy
Emily Chan is the Editor-in-Chief at Swiftproxy, with over ten years of experience in technology, digital infrastructure, and strategic communication. Based in Hong Kong, she combines deep regional knowledge with a clear, practical voice to help businesses navigate the evolving world of proxy solutions and data-driven growth.
The content provided on the Swiftproxy blog is for informational purposes only and is presented without any warranty. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information it contains, nor does it assume responsibility for the content of third-party sites referenced in the blog. Before engaging in any web scraping or automated data collection, readers are strongly advised to consult qualified legal counsel and review the applicable terms of service of the target site. In some cases, explicit authorization or a scraping permit may be required.