How to Scrape Yandex Search Results Efficiently

SwiftProxy
By Martin Koenig
2025-06-03 15:30:19


Scraping search engines like Yandex is no small feat. It's a task that comes with hurdles, especially if you're trying to do it at scale. But if you know how to handle the challenge, the rewards are immense. You get to unlock valuable data for SEO analysis, competitive research, and more. Ready to dive in?

In this guide, we'll walk you through building a custom Yandex scraper using proxies, and show you how to use a scraper API to extract Yandex search results with ease. No fluff, just practical steps that you can apply right away.

Exploring Yandex SERP

Yandex, like other major search engines, displays results based on relevance, quality, location, and personalization. The Yandex SERP is split into two main sections: Advertisements and Organic Results.

Let's imagine you searched for "iPhone". Here's how the results look:

Advertisements: These are clearly marked as "Sponsored" or "Advertisement" and show product details like prices and links.

Organic results: These are the pages that appear because they're most relevant to the query.

While ads are straightforward to identify, scraping organic results is trickier: Yandex is notorious for its anti-bot protection, especially the dreaded CAPTCHA. So, let's talk about how you can get around it.

Why Scraping Yandex is Tough

Yandex doesn't make it easy. Their CAPTCHA and anti-bot system are designed to stop scrapers dead in their tracks. If you're not careful, you’ll find your IP blocked in no time. To make matters worse, Yandex continuously updates its anti-bot measures, forcing scrapers to constantly adapt.

But don't sweat it: there's a solution. Proxies and a scraper API are your best friends here. Proxies hide your real IP, making it look like the requests are coming from different users. The API takes this a step further, handling all the proxy and CAPTCHA issues for you.

Now, let's jump into the meat of the tutorial.

Configuring Your Environment

Before we get into scraping, let's make sure your environment is ready. You'll need Python installed on your system. If you haven't done that yet, head to the official Python website and grab the latest version.

Next, let's install the Python libraries we’ll use for this project: requests, BeautifulSoup, and pandas. Open your terminal and run this command:

pip install requests pandas beautifulsoup4

These libraries are the building blocks:

Requests: Makes network requests.

BeautifulSoup: Extracts the data you need from raw HTML.

Pandas: Saves the scraped data into a clean CSV file.

Scraping Yandex Using Proxies

This part is where the fun begins. We’ll build a basic scraper that uses residential proxies to bypass Yandex's CAPTCHA and IP blocks.

Step 1: Set Up Proxies and Headers

Here's how to configure your proxies and headers to mimic a real user.

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Proxies and Authentication Details
USERNAME = 'PROXY_USERNAME'
PASSWORD = 'PROXY_PASSWORD'

proxies = {
    'http': f'https://{USERNAME}:{PASSWORD}@pr.swiftproxy.net:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@pr.swiftproxy.net:7777'
}

# Request headers to mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:137.0) Gecko/20100101 Firefox/137.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
    'Connection': 'keep-alive'
}
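A single fixed User-Agent is easier for anti-bot systems to fingerprint over many requests, so it can help to rotate among a small pool. The helper below is a sketch; the browser strings in the pool are illustrative examples I've added (keep yours current), not part of the original setup.

```python
import random

USER_AGENTS = [
    # Illustrative desktop browser strings; refresh this pool periodically
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:137.0) Gecko/20100101 Firefox/137.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15',
]

def rotated_headers(base_headers: dict) -> dict:
    """Return a copy of base_headers with a randomly chosen User-Agent."""
    headers = dict(base_headers)
    headers['User-Agent'] = random.choice(USER_AGENTS)
    return headers
```

You would then pass `rotated_headers(headers)` instead of `headers` on each request, so consecutive requests don't all present the same browser signature.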

Step 2: Send the GET Request

Next, we send a GET request to Yandex using the proxies and headers. This will fetch the search results.

response = requests.get(
    'https://yandex.com/search/?text=what%20is%20web%20scraping',
    proxies=proxies,
    headers=headers
)
response.raise_for_status()  # Raise an error on 4xx/5xx responses
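Even with residential proxies, some requests will land on Yandex's CAPTCHA interstitial instead of the SERP. A small check lets you detect that and retry with a fresh proxy session rather than parsing an empty page. The marker strings below are assumptions based on Yandex's typical `showcaptcha` redirect, so adjust them to match the responses you actually receive.

```python
def looks_like_captcha(final_url: str, html: str) -> bool:
    """Heuristic check for Yandex's CAPTCHA interstitial.

    The 'showcaptcha' path and 'SmartCaptcha' marker are assumptions
    drawn from commonly observed Yandex behavior; verify them against
    real blocked responses.
    """
    markers = ('showcaptcha', 'SmartCaptcha', 'captcha')
    return 'showcaptcha' in final_url or any(m in html for m in markers)

# Usage sketch, after the GET request above:
# if looks_like_captcha(response.url, response.text):
#     raise RuntimeError('Blocked by CAPTCHA - rotate proxy and retry')
```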

Step 3: Parse the Data

Now, let's parse the raw HTML response to extract the search results. We’ll use BeautifulSoup to grab the title and link for each result.

soup = BeautifulSoup(response.text, 'html.parser')

data = []
# Note: Yandex's markup changes often; update these selectors if they stop matching
for listing in soup.select('li.serp-item_card'):
    title_el = listing.select_one('h2 > span')
    title = title_el.text if title_el else None
    link_el = listing.select_one('.organic__url')
    link = link_el.get('href') if link_el else None

    data.append({'Title': title, 'Link': link})
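Selector misses leave `None` entries, and retried requests can yield the same page twice, so a quick cleanup pass before exporting is worthwhile. This stdlib-only helper (the function name is mine, not part of the original script) drops rows without a link and deduplicates by URL while preserving order:

```python
def clean_results(rows):
    """Drop entries with no link and deduplicate by URL, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        link = row.get('Link')
        if not link or link in seen:
            continue
        seen.add(link)
        cleaned.append(row)
    return cleaned
```

Run `data = clean_results(data)` before handing the list to pandas in the next step.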

Step 4: Export Results to CSV

Once you’ve extracted the data, it's time to save it to a CSV file. This step is easy with pandas.

df = pd.DataFrame(data)
df.to_csv('yandex_results.csv', index=False)
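One page of results is rarely enough. Yandex appears to paginate via a `p` query parameter (zero-indexed); that convention is an observation, so confirm it against live SERP URLs before relying on it. A small URL builder keeps the query encoding correct instead of concatenating strings:

```python
from urllib.parse import urlencode

def yandex_search_url(query: str, page: int = 0) -> str:
    """Build a Yandex SERP URL; 'p' is the (assumed zero-indexed) page param."""
    params = {'text': query}
    if page > 0:
        params['p'] = page
    return 'https://yandex.com/search/?' + urlencode(params)

# Usage sketch, reusing the proxies/headers from above:
# for page in range(3):
#     url = yandex_search_url('what is web scraping', page)
#     response = requests.get(url, proxies=proxies, headers=headers)
```

Remember to add a delay between page requests; hammering consecutive pages is a fast way to trigger the CAPTCHA.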

Scraping Yandex Using an API

Building your own scraper works, but it can become a hassle when you need to scale. This is where a scraper API shines.

Step 1: Prepare Your Payload

You'll need to define the search parameters for the API. Here's a simple setup for scraping Yandex.

import requests
import pandas as pd

payload = {
    'source': 'universal',
    'url': 'https://yandex.com/search/?text=what%20is%20web%20scraping',
}

Step 2: Define Parsing Logic

The API lets you define your parsing logic with CSS or XPath selectors. Let's extract the titles and links from the Yandex search results.

payload['parsing_instructions'] = {
    'listings': {
        '_fns': [{'_fn': 'css', '_args': ['li.serp-item_card']}],
        '_items': {
            'title': {'_fns': [{'_fn': 'css_one', '_args': ['h2 > span']}, {'_fn': 'element_text'}]},
            'link': {'_fns': [{'_fn': 'xpath_one', '_args': ['.//a[contains(@class, "organic__url")]/@href']}]}
        }
    }
}

Step 3: Send the POST Request

Send the request to the API.

response = requests.post(
    'https://realtime.swiftproxy.net/v1/queries',
    auth=('API_USERNAME', 'API_PASSWORD'),
    json=payload
)
response.raise_for_status()
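API endpoints can return transient errors under load, so it helps to wrap the POST in a retry with exponential backoff. The helper below is a generic sketch of that pattern (the name and defaults are mine); pass it a zero-argument callable that performs the request and raises on failure:

```python
import time

def with_retries(make_request, attempts=3, base_delay=1.0):
    """Call make_request(), retrying on exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return make_request()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch with the request above:
# response = with_retries(lambda: requests.post(
#     'https://realtime.swiftproxy.net/v1/queries',
#     auth=('API_USERNAME', 'API_PASSWORD'),
#     json=payload,
#     timeout=60,
# ))
```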

Step 4: Export Data to CSV

Once the response is received, extract the data and save it to a CSV file.

data = response.json()['results'][0]['content']['listings']

df = pd.DataFrame(data)
df.to_csv('yandex_results_API.csv', index=False)

Choosing Your Scraping Approach

| Approach | Advantages | Disadvantages |
|---|---|---|
| No Proxies | Simple setup, no proxy costs | IP blocks, CAPTCHA, scaling issues |
| With Proxies | Avoids IP blocks, access to geo-specific data | Proxy service costs, maintenance |
| API | Scalable, automatic CAPTCHA bypass, no setup hassle | Recurring subscription costs, vendor lock-in |
| Custom Solutions | Full control, ideal for JavaScript-heavy sites | Requires technical expertise, can be slow |

Conclusion

Scraping Yandex may seem like a challenge, but with the right tools and techniques, it's a breeze. Whether you're using proxies, API, or a custom scraper, you can bypass Yandex's anti-bot measures and extract the valuable data you need.

About the Author

SwiftProxy
Martin Koenig
Head of Commerce
Martin Koenig is an accomplished commercial strategist with more than a decade of experience across the technology, telecommunications, and consulting industries. As Head of Commerce, he combines cross-industry expertise with a data-driven approach to identify growth opportunities and deliver measurable business impact.
The content provided on the Swiftproxy blog is for informational purposes only and is presented without any warranty. Swiftproxy does not guarantee the accuracy, completeness, or legal compliance of the information it contains, nor does it assume responsibility for the content of third-party sites referenced in the blog. Before engaging in any web scraping or automated data collection, readers are strongly advised to consult a qualified legal advisor and review the applicable terms of service of the target site. In some cases, explicit authorization or a scraping permit may be required.