Scraping search engines like Yandex is no small feat. It's a task that comes with hurdles, especially if you're trying to do it at scale. But if you know how to handle the challenge, the rewards are immense. You get to unlock valuable data for SEO analysis, competitive research, and more. Ready to dive in?
In this guide, we'll walk you through building a custom Yandex scraper using proxies, then show you how to leverage a scraper API to extract Yandex search results with far less effort. No fluff, just practical steps you can apply right away.
Yandex, like other major search engines, displays results based on relevance, quality, location, and personalization. The Yandex SERP is split into two main sections: Advertisements and Organic Results.
Let's imagine you searched for "iPhone". Here's how the results look:
Advertisements: These are clearly marked as "Sponsored" or "Advertisement" and show product details like prices and links.
Organic results: These are the pages that appear because they're most relevant to the query.
While ads are easy to identify, scraping organic results is trickier: Yandex is notorious for its anti-bot protection, especially the dreaded CAPTCHA. So, let's talk about how you can get past it.
Yandex doesn't make it easy. Their CAPTCHA and anti-bot system are designed to stop scrapers dead in their tracks. If you're not careful, you’ll find your IP blocked in no time. To make matters worse, Yandex continuously updates its anti-bot measures, forcing scrapers to constantly adapt.
But don't sweat it: there's a solution. Proxies and a scraper API are your best friends here. Proxies hide your real IP, making requests look like they come from many different users. A scraper API takes this a step further, handling proxy rotation and CAPTCHAs for you.
Now, let's jump into the meat of the tutorial.
Before we get into scraping, let's make sure your environment is ready. You'll need Python installed on your system. If you haven't done that yet, head to the official Python website and grab the latest version.
Next, let's install the Python libraries we’ll use for this project: requests, BeautifulSoup, and pandas. Open your terminal and run this command:
pip install requests pandas beautifulsoup4
These libraries are the building blocks:
Requests: Sends the HTTP requests (with proxy support).
BeautifulSoup: Extracts the data you need from raw HTML.
Pandas: Saves the scraped data into a clean CSV file.
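To confirm the installation worked, you can run a quick sanity check. This is a minimal sketch that just imports each library and prints its version:

# Sanity check: if these imports succeed, the environment is ready
import requests
import bs4
import pandas as pd

print(requests.__version__, bs4.__version__, pd.__version__)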
This part is where the fun begins. We’ll build a basic scraper that uses residential proxies to bypass Yandex's CAPTCHA and IP blocks.
Here's how to configure your proxies and headers to mimic a real user.
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Proxies and Authentication Details
USERNAME = 'PROXY_USERNAME'
PASSWORD = 'PROXY_PASSWORD'
proxies = {
    'http': f'https://{USERNAME}:{PASSWORD}@pr.swiftproxy.net:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@pr.swiftproxy.net:7777'
}

# Request headers to mimic a real browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:137.0) Gecko/20100101 Firefox/137.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
    'Connection': 'keep-alive'
}
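Before pointing the scraper at Yandex, it's worth confirming that traffic actually flows through the proxy. A minimal sketch, assuming the pr.swiftproxy.net endpoint above and using httpbin.org purely as a neutral IP-echo service:

# Ask an IP-echo service which IP it sees; it should be the proxy's exit IP,
# not your own. httpbin.org is used here only for illustration.
check = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10)
check.raise_for_status()
print('Requests exit through:', check.json()['origin'])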
Next, we send a GET request to Yandex using the proxies and headers. This will fetch the search results.
response = requests.get(
    'https://yandex.com/search/?text=what%20is%20web%20scraping',
    proxies=proxies,
    headers=headers
)
response.raise_for_status() # Ensure we get a successful response
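Even with residential proxies, the occasional request can still get challenged. One defensive pattern is to retry with exponential backoff when the response looks like a CAPTCHA page. This is a sketch, not part of the original flow, and the check for a "showcaptcha" redirect is an assumption about how Yandex reroutes suspected bots, so verify it against real responses:

import time

def fetch_with_retries(url, retries=3):
    # Retry a few times with exponential backoff; on a rotating residential
    # proxy, each attempt typically exits through a different IP.
    for attempt in range(retries):
        resp = requests.get(url, proxies=proxies, headers=headers, timeout=15)
        # Heuristic (assumption): Yandex redirects suspected bots to a CAPTCHA URL.
        if resp.ok and 'showcaptcha' not in resp.url:
            return resp
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s between attempts
    resp.raise_for_status()
    return resp

You could then swap the plain requests.get call above for fetch_with_retries and keep the rest of the flow unchanged.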
Now, let's parse the raw HTML response to extract the search results. We’ll use BeautifulSoup to grab the title and link for each result.
soup = BeautifulSoup(response.text, 'html.parser')
data = []
for listing in soup.select('li.serp-item_card'):
    title_el = listing.select_one('h2 > span')
    title = title_el.text if title_el else None
    link_el = listing.select_one('.organic__url')
    link = link_el.get('href') if link_el else None
    data.append({'Title': title, 'Link': link})
Once you’ve extracted the data, it's time to save it to a CSV file. This step is easy with pandas.
df = pd.DataFrame(data)
df.to_csv('yandex_results.csv', index=False)
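One page is rarely enough. Yandex paginates results with a p query parameter (zero-indexed, as far as we can tell; treat that as an assumption and confirm it in your browser). Here's a sketch that loops over the first few pages and reuses the same parsing logic:

all_data = []
for page in range(3):  # first three result pages
    resp = requests.get(
        f'https://yandex.com/search/?text=what%20is%20web%20scraping&p={page}',
        proxies=proxies,
        headers=headers
    )
    resp.raise_for_status()
    page_soup = BeautifulSoup(resp.text, 'html.parser')
    for listing in page_soup.select('li.serp-item_card'):
        title_el = listing.select_one('h2 > span')
        link_el = listing.select_one('.organic__url')
        all_data.append({
            'Title': title_el.text if title_el else None,
            'Link': link_el.get('href') if link_el else None,
        })

pd.DataFrame(all_data).to_csv('yandex_results_paginated.csv', index=False)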
Building your own scraper works, but it can become a hassle when you need to scale. This is where a scraper API shines.
You'll need to define the search parameters for the API. Here's a simple setup for scraping Yandex.
import requests
import pandas as pd
payload = {
    'source': 'universal',
    'url': 'https://yandex.com/search/?text=what%20is%20web%20scraping',
}
The API lets you define your parsing logic with CSS or XPath selectors. Let's extract the titles and links from the Yandex search results.
payload['parsing_instructions'] = {
    'listings': {
        '_fns': [{'_fn': 'css', '_args': ['li.serp-item_card']}],
        '_items': {
            'title': {'_fns': [{'_fn': 'css_one', '_args': ['h2 > span']}, {'_fn': 'element_text'}]},
            'link': {'_fns': [{'_fn': 'xpath_one', '_args': ['.//a[contains(@class, "organic__url")]/@href']}]}
        }
    }
}
Send the request to the API.
response = requests.post(
    'https://realtime.swiftproxy.net/v1/queries',
    auth=('API_USERNAME', 'API_PASSWORD'),
    json=payload
)
response.raise_for_status()
Once the response is received, extract the data and save it to a CSV file.
data = response.json()['results'][0]['content']['listings']
df = pd.DataFrame(data)
df.to_csv('yandex_results_API.csv', index=False)
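If the listings come back empty, the parsing instructions are the usual suspect. A quick way to debug is to pretty-print what the API actually returned before blaming the selectors; this sketch assumes the results[0]['content'] response shape used above:

import json

result = response.json()
# Pretty-print the parsed content so you can see what the selectors matched.
print(json.dumps(result['results'][0]['content'], indent=2))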
Approach         | Advantages                                          | Disadvantages
No Proxies       | Simple setup, no proxy costs                        | IP blocks, CAPTCHA, scaling issues
With Proxies     | Avoids IP blocks, access to geo-specific data       | Proxy service costs, maintenance
Scraper API      | Scalable, automatic CAPTCHA bypass, no setup hassle | Recurring subscription costs, vendor lock-in
Custom Solutions | Full control, ideal for JavaScript-heavy sites      | Requires technical expertise, can be slow
Scraping Yandex may seem daunting, but with the right tools and techniques it's entirely manageable. Whether you rely on proxies, a scraper API, or a fully custom scraper, you can get past Yandex's anti-bot measures and extract the valuable data you need.