
More than 500 hours of video are uploaded to YouTube every minute. For creators, analyzing their own content's performance, along with competitor videos, can be an overwhelming task, and manually sifting through all that data is tedious. That's where automation steps in, especially with a well-crafted YouTube scraping script. Let's dive into building one from scratch.
To begin scraping, we need the right tools. Python's Selenium is a go-to for automating web browsers, but we'll need a few extra packages to make it all run smoothly. First up:
selenium-wire: A Selenium extension that lets us configure authenticated proxies (crucial for avoiding IP bans).
selenium: The standard library for browser automation.
blinker: Pinned to 1.7.0, because selenium-wire relies on an internal blinker module that newer releases removed, which causes an import error at runtime.
Install them using the command below:
pip install selenium-wire selenium blinker==1.7.0
Next, let's import the libraries that will drive our script.
from selenium.webdriver.chrome.options import Options
from seleniumwire import webdriver as wiredriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import json
import time
Here's what each one does:
selenium.webdriver (with By, Keys, WebDriverWait, and expected_conditions): Locates page elements, waits for them to load, and interacts with them.
json: Serializes the scraped data into a clean, readable format.
time: Adds delays between actions so the browsing pattern doesn't look robotic.
ActionChains: Mimics real human scrolling and clicking behavior.
YouTube's robots.txt file makes it clear—they're not fond of scraping. So, to avoid triggering anti-scraping measures, we need to route our requests through a proxy. Here's how:
Set up proxy credentials.
Pass them to selenium-wire's proxy options (Chrome itself can't take proxy credentials as a command-line flag).
Launch the browser through selenium-wire.
proxy_address = "your.proxy.address"
proxy_username = "your-username"
proxy_password = "your-password"
chrome_options = Options()
# Chrome has no flag for proxy credentials, so selenium-wire handles the authenticated proxy
proxy_options = {
    'proxy': {
        'http': f'http://{proxy_username}:{proxy_password}@{proxy_address}',
        'https': f'https://{proxy_username}:{proxy_password}@{proxy_address}',
    }
}
driver = wiredriver.Chrome(options=chrome_options, seleniumwire_options=proxy_options)
With selenium-wire routing every request through the authenticated proxy, repeated visits no longer come from your own IP, so the script stays stealthy and doesn't attract unwanted attention.
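Before pointing the driver at YouTube, it's worth confirming that traffic really flows through the proxy. A minimal sanity check, assuming the driver created above and that an IP echo service such as httpbin.org is reachable:

# The IP printed here should belong to the proxy, not your own machine
driver.get("https://httpbin.org/ip")
print(driver.find_element(By.TAG_NAME, "body").text)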
With your proxy in place, we can now focus on extracting data. Here's the flow:
Open the Video URL: We'll fetch the video page URL and load it into the driver.
Wait for Page Elements: We'll use WebDriverWait to ensure that the elements we need are fully loaded before extracting any data.
Scroll and Load More: To get all comments, we simulate a user scrolling through the page.
youtube_url_to_scrape = "your_video_url"
driver.get(youtube_url_to_scrape)
def extract_information() -> dict:
    try:
        # Expand the description so the full text and metadata become visible
        element = WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="expand"]'))
        )
        element.click()

        time.sleep(10)
        actions = ActionChains(driver)
        actions.send_keys(Keys.END).perform()  # Scroll down to trigger comment loading
        time.sleep(10)
        actions.send_keys(Keys.END).perform()  # Scroll again to load more comments
        time.sleep(10)

        # Video and channel details
        video_title = driver.find_elements(By.XPATH, '//*[@id="title"]/h1')[0].text
        owner = driver.find_elements(By.XPATH, '//*[@id="text"]/a')[0].text
        total_number_of_subscribers = driver.find_elements(By.XPATH, "//div[@id='upload-info']//yt-formatted-string[@id='owner-sub-count']")[0].text
        description = ''.join([i.text for i in driver.find_elements(By.XPATH, '//*[@id="description-inline-expander"]/yt-attributed-string/span/span')])

        # Additional details
        publish_date = driver.find_elements(By.XPATH, '//*[@id="info"]/span')[2].text
        total_views = driver.find_elements(By.XPATH, '//*[@id="info"]/span')[0].text
        number_of_likes = driver.find_elements(By.XPATH, '//*[@id="top-level-buttons-computed"]/segmented-like-dislike-button-view-model/yt-smartimation/div/div/like-button-view-model/toggle-button-view-model/button-view-model/button/div')[1].text

        # Scrape comments (author names and comment text are returned in matching order)
        comment_names = driver.find_elements(By.XPATH, '//*[@id="author-text"]/span')
        comment_content = driver.find_elements(By.XPATH, '//*[@id="content-text"]/span')
        comments = [
            {"name": name.text, "comment": content.text}
            for name, content in zip(comment_names, comment_content)
        ]

        data = {
            'owner': owner,
            'subscribers': total_number_of_subscribers,
            'video_title': video_title,
            'description': description,
            'date': publish_date,
            'views': total_views,
            'likes': number_of_likes,
            'comments': comments
        }
        return data
    except Exception as err:
        print(f"Error: {err}")
This function does all the heavy lifting—pulling video details, the owner’s stats, likes, views, and a collection of comments.
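Two fixed presses of the END key are enough for videos with a handful of comments, but busier comment sections load in batches as you scroll. A possible refinement, sketched here as a hypothetical helper rather than part of the original function, is to keep scrolling until the number of loaded comments stops growing:

# Sketch: scroll until the comment count stops growing
# Assumes the driver is already on the video page
def scroll_until_comments_loaded(max_rounds: int = 20):
    actions = ActionChains(driver)
    previous_count = -1
    for _ in range(max_rounds):
        actions.send_keys(Keys.END).perform()
        time.sleep(5)  # give YouTube time to fetch the next batch
        current_count = len(driver.find_elements(By.XPATH, '//*[@id="content-text"]/span'))
        if current_count == previous_count:
            break  # nothing new appeared, so we've reached the end
        previous_count = current_count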
Once we've gathered the data, the next step is saving it in an easy-to-use format: JSON.
def organize_write_data(data: dict):
    # Serialize to pretty-printed JSON, keeping non-ASCII characters intact
    output = json.dumps(data, indent=2, ensure_ascii=False)
    try:
        with open("output.json", 'w', encoding='utf-8') as file:
            file.write(output)
    except Exception as err:
        print(f"Error encountered: {err}")
This function neatly stores everything in an output.json file, ready for analysis or further processing.
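As a quick example of that further processing, here's a sketch that loads output.json back in and prints a short summary (it assumes the file was written by the function above):

# Load the saved JSON and summarize it
with open("output.json", encoding="utf-8") as file:
    video = json.load(file)

print(f"{video['video_title']} by {video['owner']}")
print(f"Views: {video['views']}, likes: {video['likes']}")
print(f"Comments scraped: {len(video['comments'])}")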
Here's how everything ties together:
# Importing necessary packages
from selenium.webdriver.chrome.options import Options
from seleniumwire import webdriver as wiredriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import json
import time
# Proxy setup
proxy_address = "your.proxy.address"
proxy_username = "your-username"
proxy_password = "your-password"
chrome_options = Options()
# Chrome has no flag for proxy credentials, so selenium-wire handles the authenticated proxy
proxy_options = {
    'proxy': {
        'http': f'http://{proxy_username}:{proxy_password}@{proxy_address}',
        'https': f'https://{proxy_username}:{proxy_password}@{proxy_address}',
    }
}
driver = wiredriver.Chrome(options=chrome_options, seleniumwire_options=proxy_options)
# Target URL
youtube_url_to_scrape = "your_video_url"
driver.get(youtube_url_to_scrape)
def extract_information() -> dict:
    # Scraping logic (paste the full function body from above)
    ...

def organize_write_data(data: dict):
    # Save scraped data to JSON (paste the full function body from above)
    ...
organize_write_data(extract_information())
driver.quit()
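One optional refinement, not shown in the script above: wrap the last two calls in try/finally so the browser always closes, even if an XPath changes and extraction raises an error.

# Sketch: make sure the browser closes even if scraping fails partway through
try:
    organize_write_data(extract_information())
finally:
    driver.quit()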
Scraping vital YouTube data can be an incredibly powerful way to gain insights into what works—and what doesn't—in your content strategy. With the approach outlined above, you can build a tool that pulls in relevant data on video views, likes, comments, and more—all while staying under the radar. The key? Always use a proxy, respect YouTube's policies, and automate wisely.