How to Scrape Instagram Data Efficiently with Python

SwiftProxy
By Emily Chan
2025-02-28 15:29:45

Instagram's data is a treasure trove for researchers, marketers, and developers—but getting it? That's a different story. With sophisticated anti-bot systems, login hurdles, and rate limits, scraping Instagram can feel like trying to break into a vault. But don't worry, it's not impossible.
In this guide, we'll walk you through scraping Instagram user data with Python. By sending API requests, parsing JSON responses, and using a few clever tools, you'll be able to collect valuable insights from public profiles. Let's dive in.

The Tools You Need

Before we get into the code, let's set you up with the right tools. To scrape Instagram efficiently, you'll need a couple of Python libraries. Make sure you have these installed:
pip install requests python-box

· requests: Used for making HTTP requests to Instagram's backend.

· python-box: This simplifies navigating and accessing JSON data.

Step 1: Sending the API Request

Instagram's frontend is locked down tight. But here's the trick: Instagram has an exposed backend API that's not as heavily protected. With the right headers, you can pull public data without authentication.
We'll target the user profile endpoint to grab key data like follower count, bio, and post details. Here's how we make the request:

import requests

headers = {
    "x-ig-app-id": "936619743392459",  # Essential to mimic the Instagram app
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "*/*",
}

username = 'testtest'  # Replace with the username you want

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers)
response_json = response.json()  # Parse the response to JSON

Explanation:

· Headers: These mimic a real browser request. Instagram looks at headers to detect bots—so using valid headers like x-ig-app-id and User-Agent makes your requests appear legitimate.

· Backend API: We're hitting the web_profile_info endpoint, which pulls detailed user profile data.
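
Before parsing the response, it's worth confirming that the request actually succeeded; Instagram returns a non-200 status code (or an empty body) when it blocks or rate-limits you. A minimal guard, using only what we've set up so far, might look like this:

# Bail out early if Instagram rejected or rate-limited the request
if response.status_code != 200:
    raise RuntimeError(f'Request failed with status {response.status_code}')

response_json = response.json()

# The profile sits under the "data" -> "user" keys; a missing user usually
# means the username does not exist or the request was blocked
if not response_json.get('data', {}).get('user'):
    raise RuntimeError(f'No user data returned for {username}')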

Step 2: Configuring Proxies to Prevent Rate Limiting

Instagram's rate limiting can be a challenge. If you're scraping multiple profiles or firing off a lot of requests from a single IP, you'll eventually get blocked.
Proxies help here: they route your requests through different IP addresses, masking your real location and making it harder for Instagram to flag your traffic.
Here's how to integrate proxies into your requests:

proxies = {
    'http': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
    'https': 'https://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
}

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers, proxies=proxies)
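
If you're scraping many profiles, you can go one step further and rotate through a pool of proxies so that no single IP carries all the traffic. A minimal sketch, where the proxy addresses and the usernames list are placeholders you'd swap for your own:

import itertools

# Placeholder proxy pool - swap in your provider's endpoints
proxy_pool = itertools.cycle([
    'http://<proxy_username>:<proxy_password>@<proxy_ip_1>:<proxy_port>',
    'http://<proxy_username>:<proxy_password>@<proxy_ip_2>:<proxy_port>',
])

usernames = ['testtest', 'another_account']  # Accounts you want to scrape

for name in usernames:
    proxy_url = next(proxy_pool)  # Pick the next proxy in the rotation
    response = requests.get(
        f'https://i.instagram.com/api/v1/users/web_profile_info/?username={name}',
        headers=headers,
        proxies={'http': proxy_url, 'https': proxy_url},
    )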

Step 3: Streamlining JSON Parsing with Box

Instagram's API returns a deep, nested JSON structure. Navigating this with standard dictionary syntax can be a hassle. This is where Box comes in.
Box turns JSON into an object that you can access with dot notation. It's cleaner, faster, and more intuitive.
Here's how to use it:

from box import Box

response_json = Box(response.json())

user_data = {
    'full_name': response_json.data.user.full_name,
    'followers': response_json.data.user.edge_followed_by.count,
    'bio': response_json.data.user.biography,
    'is_verified': response_json.data.user.is_verified,
    'profile_pic': response_json.data.user.profile_pic_url_hd,
}

Instead of accessing data with response_json['data']['user']['full_name'], you can just write response_json.data.user.full_name. Much easier, right?
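
One caveat: if a field is missing from the response, dot access raises an error just as a normal dictionary lookup would. Box's default_box option softens this by returning an empty (falsy) Box for missing keys, which is handy for optional fields like the bio:

# default_box=True returns an empty Box instead of raising when a key is missing
response_json = Box(response.json(), default_box=True)

bio = response_json.data.user.biography or 'No bio provided'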

Step 4: Collecting Video and Post Data

Once you have the profile data, you can dig deeper into a user's posts and videos. Instagram gives you a goldmine of insights here: view counts, like counts, and more.
For videos, here's the extraction method:

profile_video_data = []
for element in response_json.data.user.edge_felix_video_timeline.edges:
    video_data = {
        'id': element.node.id,
        'video_url': element.node.video_url,
        'views': element.node.video_view_count,
        'likes': element.node.edge_liked_by.count,
    }
    profile_video_data.append(video_data)

Similarly, you can extract regular timeline posts (photos and videos) and pull data like media URL, comment counts, and like counts.
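
The sketch below shows what that extraction could look like. It assumes the timeline posts sit under an edge_owner_to_timeline_media field with display_url, edge_media_to_comment, and edge_liked_by nodes; Instagram shuffles these field names from time to time, so inspect the raw JSON if a key comes back empty:

profile_post_data = []
for element in response_json.data.user.edge_owner_to_timeline_media.edges:
    post_data = {
        'id': element.node.id,
        'media_url': element.node.display_url,  # Photo (or video thumbnail) URL
        'comments': element.node.edge_media_to_comment.count,
        'likes': element.node.edge_liked_by.count,
    }
    profile_post_data.append(post_data)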

Step 5: Saving Data for Later Use

Once you've scraped the data, you'll likely want to save it. Use Python's built-in json library to export the data into readable JSON files for later analysis.

import json

# Save the profile data
with open(f'{username}_profile_data.json', 'w') as file:
    json.dump(user_data, file, indent=4)

# Save video data
with open(f'{username}_video_data.json', 'w') as file:
    json.dump(profile_video_data, file, indent=4)

Now, you have a neatly structured JSON file with all the data you need. Easy to read, easy to process.
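
If you'd rather open the results in a spreadsheet, the same records export cleanly to CSV with Python's built-in csv module. Shown here for the video data collected in Step 4:

import csv

# Write the video records to a CSV file, one row per video
with open(f'{username}_video_data.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['id', 'video_url', 'views', 'likes'])
    writer.writeheader()
    writer.writerows(profile_video_data)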

Complete Code for Scraping Instagram Data

Here's a condensed end-to-end script for your convenience. It covers the profile request, parsing, and export; bolt on the video and post extraction from Step 4 as needed:

import requests
from box import Box
import json

headers = {
    "x-ig-app-id": "936619743392459", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
}

proxies = {
    'http': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
    'https': 'https://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
}

username = 'testtest'

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers, proxies=proxies)
response_json = Box(response.json())

user_data = {
    'full_name': response_json.data.user.full_name,
    'followers': response_json.data.user.edge_followed_by.count,
}

# Save the data
with open(f'{username}_profile_data.json', 'w') as file:
    json.dump(user_data, file, indent=4)
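
If you want to run the script for several accounts, add a short pause between requests so you stay well under the rate limits discussed earlier. A minimal sketch building on the code above (the two-second delay and the usernames list are arbitrary placeholders):

import time

usernames = ['testtest', 'another_account']  # Accounts to scrape

for name in usernames:
    response = requests.get(
        f'https://i.instagram.com/api/v1/users/web_profile_info/?username={name}',
        headers=headers,
        proxies=proxies,
    )
    data = Box(response.json())

    with open(f'{name}_profile_data.json', 'w') as file:
        json.dump({'full_name': data.data.user.full_name,
                   'followers': data.data.user.edge_followed_by.count}, file, indent=4)

    time.sleep(2)  # Pause between requests to avoid hammering the endpoint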

Wrapping Up

Scraping Instagram data can seem daunting, but with the right tools, it's entirely doable. By leveraging Instagram's backend API, using headers to mimic a real browser, and applying proxies to avoid detection, you can scrape Instagram data from public profiles and gain valuable insights. Always ensure you're respecting Instagram's terms of service. Stay ethical and avoid overloading their servers with requests.

About the author

Emily Chan
Lead Writer at Swiftproxy
Emily Chan is the lead writer at Swiftproxy, bringing over a decade of experience in technology, digital infrastructure, and strategic communications. Based in Hong Kong, she combines regional insight with a clear, practical voice to help businesses navigate the evolving world of proxy solutions and data-driven growth.
The content provided on the Swiftproxy Blog is intended solely for informational purposes and is presented without warranty of any kind. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information contained herein, nor does it assume any responsibility for content on third-party websites referenced in the blog. Prior to engaging in any web scraping or automated data collection activities, readers are strongly advised to consult with qualified legal counsel and to review the applicable terms of service of the target website. In certain cases, explicit authorization or a scraping permit may be required.