How to Scrape Instagram Data Efficiently with Python

SwiftProxy
By - Emily Chan
2025-02-28 15:29:45

Instagram's data is a treasure trove for researchers, marketers, and developers—but getting it? That's a different story. With sophisticated anti-bot systems, login hurdles, and rate limits, scraping Instagram can feel like trying to break into a vault. But don't worry, it's not impossible.
In this guide, we'll walk you through scraping Instagram user data with Python. By sending API requests, parsing JSON responses, and using a few clever tools, you'll be able to collect valuable insights from public profiles. Let's dive in.

The Tools You Need

Before we get into the code, let's set you up with the right tools. To scrape Instagram efficiently, you'll need a couple of Python libraries. Make sure you have these installed:
pip install requests python-box

· requests: Used for making HTTP requests to Instagram's backend.

· python-box: This simplifies navigating and accessing JSON data.

Step 1: Sending the API Request

Instagram's frontend is locked down tight. But here's the trick: the web client talks to a backend API, and the profile endpoint is not as heavily protected. With the right headers, it can return public profile data without a login, although Instagram can change or restrict this behavior at any time.
We'll target the user profile endpoint to grab key data like follower count, bio, and post details. Here's how we make the request:

import requests

headers = {
    "x-ig-app-id": "936619743392459",  # App ID the Instagram web client sends with its requests
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",  # Leave out "br" unless a Brotli decoder is installed
    "Accept": "*/*",
}

username = 'testtest'  # Replace with the username you want

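response = requests.get(
    f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}',
    headers=headers,
    timeout=10,  # Avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # Stop early on a 4xx/5xx status instead of parsing an error page
response_json = response.json()  # Parse the response body as JSON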

Explanation:

· Headers: These mimic a real browser request. Instagram inspects headers to detect bots, so sending x-ig-app-id together with a realistic User-Agent makes the request look like it came from the web client.

· Backend API: We're hitting the web_profile_info endpoint, which pulls detailed user profile data.
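
The request above assumes the call succeeds. In practice the endpoint may answer with an error status or an HTML page rather than JSON, so it can be worth checking the response before reading fields from it. Here is a minimal sketch using only the standard library. The helper name parse_profile_body is hypothetical, and the idea that a missing profile appears as a null data.user is an assumption, not documented behavior.

```python
import json


def parse_profile_body(status_code, body_text):
    """Return the parsed 'user' dict, or None if the response is unusable.

    Hypothetical helper: pass in response.status_code and response.text.
    """
    if status_code != 200:
        # Blocked, rate limited, or not found: nothing to parse.
        return None
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # The body was not JSON (for example an HTML page).
        return None
    if not isinstance(payload, dict):
        return None
    data = payload.get("data")
    if not isinstance(data, dict):
        return None
    user = data.get("user")
    # Assumption: an unknown profile comes back with "user" set to null.
    return user if isinstance(user, dict) else None
```

With the request from this step, that would be called as parse_profile_body(response.status_code, response.text), and a None result means the profile should be skipped or retried later.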

Step 2: Configuring Proxies to Prevent Rate Limiting

Instagram's rate limiting can be a challenge. If you're scraping multiple profiles or making a lot of requests from one IP address, you might get blocked. Proxies can help with this.
Proxies let you send requests through different IPs, so no single address carries all of your traffic. That lowers the chance of Instagram flagging you, though it does not remove it.
Here's how to integrate proxies into your requests:

proxies = {
    'http': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
    'https': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',  # Most proxies are reached over http, even for HTTPS targets; check your provider
}

response = requests.get(f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers, proxies=proxies)
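
A single proxy only moves the rate limit to another IP. If you have several proxies, one possible approach is to try them in turn and wait a little longer after each rejected request. The sketch below is illustrative rather than a definitive implementation: fetch_with_rotation and its parameters are hypothetical names, fetch is a function you supply yourself, and treating anything other than a 200 status as a failure is an assumption.

```python
import time


def fetch_with_rotation(url, proxy_pool, fetch, max_attempts=4,
                        base_delay=1.0, sleep=time.sleep):
    """Try proxies in round-robin order until one request succeeds.

    fetch(url, proxy) is a caller-supplied function returning
    (status_code, body). All names here are illustrative.
    """
    if not proxy_pool:
        raise ValueError("proxy_pool must not be empty")
    for attempt in range(max_attempts):
        proxy = proxy_pool[attempt % len(proxy_pool)]
        try:
            status, body = fetch(url, proxy)
        except OSError:
            # Connection-level failure: treat it like a rejected request.
            status, body = None, None
        if status == 200:
            return proxy, body
        if attempt < max_attempts - 1:
            # Exponential backoff before moving to the next proxy.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {max_attempts} attempts failed for {url}")
```

With requests, fetch could be a small wrapper that calls requests.get(url, headers=headers, proxies={'http': proxy, 'https': proxy}, timeout=10) and returns (response.status_code, response.text). Passing sleep as a parameter keeps the function easy to test without real waiting.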

Step 3: Streamlining JSON Parsing with Box

Instagram's API returns a deeply nested JSON structure. Navigating this with standard dictionary syntax can be a hassle. This is where Box comes in.
Box wraps the parsed JSON in an object that you can access with dot notation. It doesn't make lookups faster, but the code is cleaner and easier to read.
Here's how to use it:

from box import Box

response_json = Box(response.json())

user_data = {
    'full_name': response_json.data.user.full_name,
    'followers': response_json.data.user.edge_followed_by.count,
    'bio': response_json.data.user.biography,
    'is_verified': response_json.data.user.is_verified,
    'profile_pic': response_json.data.user.profile_pic_url_hd,
}

Instead of accessing data with response_json['data']['user']['full_name'], you can just write response_json.data.user.full_name. Much easier, right?
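
Dot access, like bracket access, still fails when part of the path is missing, for example if user comes back as null. A small lookup helper with a default value is one way to guard against that. This is a minimal sketch that works on the plain dictionary from response.json(). The name dig is hypothetical and the sample data is made up for illustration.

```python
def dig(data, path, default=None):
    """Follow a dotted path through nested dicts; return default on any miss."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up sample shaped like the fields used in this step.
sample = {"data": {"user": {"full_name": "Test User",
                            "edge_followed_by": {"count": 42}}}}

print(dig(sample, "data.user.full_name"))               # Test User
print(dig(sample, "data.user.edge_followed_by.count"))  # 42
print(repr(dig(sample, "data.user.biography", default="")))  # ''
```

The trade-off is that a typo in the path silently returns the default, so this suits optional fields better than required ones.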

Step 4: Collecting Video and Post Data

Once you have the profile data, you can dig deeper into a user's posts and videos. Instagram gives you a goldmine of insights here: view counts, like counts, and more.
For videos, here's the extraction method:

profile_video_data = []
for element in response_json.data.user.edge_felix_video_timeline.edges:
    video_data = {
        'id': element.node.id,
        'video_url': element.node.video_url,
        'views': element.node.video_view_count,
        'likes': element.node.edge_liked_by.count,
    }
    profile_video_data.append(video_data)

Similarly, you can extract regular timeline posts (photos and videos) and pull data like media URL, comment counts, and like counts.
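
For timeline posts, a sketch along the same lines might look like the following. The field names used here (edge_owner_to_timeline_media, display_url, is_video, edge_media_to_comment) are assumptions about the response layout rather than something confirmed above, so compare them with a real response before relying on them. The function takes the plain dictionary from response.json() and fills in None for anything missing.

```python
def extract_timeline_posts(profile_json):
    """Collect basic fields for each timeline post.

    Field names are assumed, not guaranteed; missing values become None.
    """
    user = (profile_json.get("data") or {}).get("user") or {}
    timeline = user.get("edge_owner_to_timeline_media") or {}
    posts = []
    for edge in timeline.get("edges") or []:
        node = edge.get("node") or {}
        posts.append({
            "id": node.get("id"),
            "media_url": node.get("display_url"),
            "is_video": node.get("is_video"),
            "comments": (node.get("edge_media_to_comment") or {}).get("count"),
            "likes": (node.get("edge_liked_by") or {}).get("count"),
        })
    return posts
```

Calling extract_timeline_posts(response.json()) returns a list of flat dictionaries, or an empty list if the profile has no timeline data in the response.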

Step 5: Saving Data for Later Use

Once you've scraped the data, you'll likely want to save it. Use Python's built-in json library to export the data into readable JSON files for later analysis.

import json

# Save the profile data
with open(f'{username}_profile_data.json', 'w') as file:
    json.dump(user_data, file, indent=4)

# Save video data
with open(f'{username}_video_data.json', 'w') as file:
    json.dump(profile_video_data, file, indent=4)

Now you have two neatly structured JSON files, one for the profile and one for its videos. Easy to read, easy to process.
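
If the data is headed for a spreadsheet, CSV may be more convenient than JSON for flat records such as the video list. Here is a minimal sketch with Python's built-in csv module. The helper name save_rows_as_csv is hypothetical, and it assumes every row is a flat dictionary with the same keys as the first one.

```python
import csv


def save_rows_as_csv(rows, path):
    """Write a list of flat dicts to a CSV file; return the number of rows written."""
    if not rows:
        return 0
    # Column order follows the keys of the first row.
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, save_rows_as_csv(profile_video_data, f'{username}_video_data.csv') would write one line per video with the columns id, video_url, views, and likes.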

Complete Code for Scraping Instagram Data

Here's the full Python script for your convenience:

import requests
from box import Box
import json

headers = {
    "x-ig-app-id": "936619743392459", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
}

proxies = {
    'http': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',
    'https': 'http://<proxy_username>:<proxy_password>@<proxy_ip>:<proxy_port>',  # Most proxies are reached over http, even for HTTPS targets; check your provider
}

username = 'testtest'

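response = requests.get(
    f'https://i.instagram.com/api/v1/users/web_profile_info/?username={username}',
    headers=headers,
    proxies=proxies,
    timeout=10,  # Avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # Stop early on a 4xx/5xx status
response_json = Box(response.json())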

user_data = {
    'full_name': response_json.data.user.full_name,
    'followers': response_json.data.user.edge_followed_by.count,
}

# Save the data
with open(f'{username}_profile_data.json', 'w') as file:
    json.dump(user_data, file, indent=4)
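
When the script is run over several usernames, a pause between requests helps keep the load on Instagram's servers low. One way to do that is a small generator that waits a random interval between items. This is a sketch only: paced is a hypothetical name and the delay values are arbitrary placeholders, not recommended limits.

```python
import random
import time


def paced(items, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them."""
    for index, item in enumerate(items):
        if index > 0:
            # Random jitter keeps the request pattern from being perfectly regular.
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

A loop such as for username in paced(['user_a', 'user_b']): would then run the request and save steps once per username, with a pause before every username after the first.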

Wrapping Up

Scraping Instagram data can seem daunting, but with the right tools, it's entirely doable. By leveraging Instagram's backend API, using headers to mimic a real browser, and applying proxies to avoid detection, you can scrape Instagram data from public profiles and gain valuable insights. Always ensure you're respecting Instagram's terms of service. Stay ethical and avoid overloading their servers with requests.

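About the Author

SwiftProxy
Emily Chan
Lead Writer at Swiftproxy
Emily Chan is the lead writer at Swiftproxy, with more than ten years of experience in technology, digital infrastructure, and strategic communications. Based in Hong Kong, she combines regional insight with clear, practical writing to help businesses navigate evolving proxy IP solutions and data-driven growth.
The content provided on the Swiftproxy blog is for reference only and comes with no warranty of any kind. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information it contains, and accepts no responsibility for the content of third-party websites referenced in the blog. Before engaging in any web scraping or automated data collection, readers are strongly advised to consult qualified legal counsel and to read the target website's terms of service carefully. In some cases, explicit authorization or a scraping permit may be required.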