Images aren’t just decoration—they’re data. They power machine learning models, enhance research, and bring projects to life. But collecting them manually? A tedious, time-sucking nightmare. What if you could automate the whole process, fetching hundreds of images in minutes instead of hours? That’s exactly what we’ll cover. We’ll show you how to scrape Google Images with Python—step by step. By the end, you’ll have a repeatable, scalable way to collect high-quality visuals without breaking a sweat.

Before diving into code, let's get real about Google Images. It's not a static gallery; it's a dynamic beast. When you search, only a few thumbnails appear. Scroll down, and more images load—but behind the scenes via JavaScript.
That means a simple requests.get() call won't cut it. To grab everything, you need tools that can handle JavaScript: think Selenium or Playwright.
Install the tools:
pip install requests beautifulsoup4 selenium pandas
If you go the Playwright route:
pip install playwright
playwright install
And don't forget a web driver for Selenium. Using Chrome? Grab a ChromeDriver build that matches your browser version.
Even without JavaScript, you can grab thumbnails. Start small.
import requests
from bs4 import BeautifulSoup

query = "golden retriever puppy"
url = "https://www.google.com/search"
params = {"q": query, "tbm": "isch"}  # tbm=isch switches the search to the Images tab
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

response = requests.get(url, params=params, headers=headers, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

images = soup.find_all("img")
for i, img in enumerate(images[:5]):
    # Some <img> tags have no src (or a base64 data URI), so use .get() instead of ['src']
    print(f"{i+1}: {img.get('src')}")
You'll mostly get thumbnails or base64 images—but it's a starting point.
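Some of those src values are base64 data URIs rather than URLs. A quick, illustrative way to separate the two kinds, reusing the images list from above:

thumbnail_urls = []
inline_thumbs = []
for img in images:
    src = img.get("src", "")
    if src.startswith("http"):
        thumbnail_urls.append(src)   # real thumbnail URLs you can download later
    elif src.startswith("data:image"):
        inline_thumbs.append(src)    # base64 thumbnails embedded directly in the page
print(f"{len(thumbnail_urls)} URL thumbnails, {len(inline_thumbs)} inline thumbnails")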
For higher-quality images, you need to mimic human scrolling.
from selenium import webdriver
from selenium.webdriver.common.by import By
from urllib.parse import quote_plus
import time

query = "golden retriever puppy"
url = f"https://www.google.com/search?q={quote_plus(query)}&tbm=isch"

driver = webdriver.Chrome()
driver.get(url)

# Scroll a few times so Google's JavaScript loads more results
for _ in range(3):
    driver.execute_script("window.scrollBy(0, document.body.scrollHeight);")
    time.sleep(2)

images = driver.find_elements(By.TAG_NAME, "img")
for i, img in enumerate(images[:10]):
    print(f"{i+1}: {img.get_attribute('src')}")

driver.quit()
Now you're capturing the real visuals as they load dynamically.
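Prefer Playwright? A minimal sketch of the same scroll-and-collect flow with its sync API (the scroll count and waits here are illustrative, not tuned):

from playwright.sync_api import sync_playwright

query = "golden retriever puppy"
url = f"https://www.google.com/search?q={query}&tbm=isch"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(url)
    # Scroll a few times so lazy-loaded thumbnails appear
    for _ in range(3):
        page.mouse.wheel(0, 10000)
        page.wait_for_timeout(2000)
    # Collect whatever <img> sources are currently on the page
    srcs = page.eval_on_selector_all("img", "els => els.map(e => e.src)")
    print(f"Found {len(srcs)} image elements")
    browser.close()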
Once you have URLs, saving them is straightforward:
import os
import requests

# Pull usable URLs out of the Selenium elements collected above
image_urls = [img.get_attribute("src") for img in images]
image_urls = [u for u in image_urls if u and u.startswith("http")]

save_dir = "images"
os.makedirs(save_dir, exist_ok=True)

for i, img_url in enumerate(image_urls[:10]):
    try:
        img_data = requests.get(img_url, timeout=10).content
        with open(os.path.join(save_dir, f"img_{i}.jpg"), "wb") as f:
            f.write(img_data)
        print(f"Saved img_{i}.jpg")
    except Exception as e:
        print(f"Could not save image {i}: {e}")
Boom. Images stored locally and ready to use.
If you scrape too aggressively, Google notices. IP blocks and CAPTCHAs appear fast. Stay safe:
Add random delays between requests.
Rotate headers and user agents.
Use proxy servers for IP rotation.
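The first two points take only a few lines. A sketch that reuses the image_urls list from the download step (the User-Agent strings below are just examples to rotate through):

import random
import time
import requests

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

for img_url in image_urls[:10]:
    headers = {"User-Agent": random.choice(user_agents)}  # fresh UA for each request
    response = requests.get(img_url, headers=headers, timeout=10)
    time.sleep(random.uniform(1.0, 3.0))                  # random pause between requests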
For the proxy piece, here's an example with requests:
proxies = {
    "http": "http://username:password@proxy_host:proxy_port",
    "https": "http://username:password@proxy_host:proxy_port",
}

response = requests.get(url, headers=headers, proxies=proxies)
Services like Swiftproxy handle proxy rotation automatically. No headache, no downtime.
Google detects bots quickly, and CAPTCHAs follow; solving them by hand kills the automation. Mitigation? Slow your requests, rotate headers, run headless browsers, and rotate IPs.
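Headless mode in Selenium is a one-flag change; a minimal sketch (the --headless=new flag applies to recent Chrome builds, older ones use plain --headless):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")          # run Chrome without a visible window
options.add_argument("--window-size=1920,1080") # give the page a realistic viewport

driver = webdriver.Chrome(options=options)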
Thumbnails often aren't enough. Scrolling with Selenium, clicking each thumbnail, and waiting for the full-resolution image to load is how you get the originals.
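Google changes its markup often, so any selector hard-coded here goes stale quickly. The sketch below only illustrates the pattern, reusing the driver from the Selenium section: click a thumbnail, wait for a larger preview, grab its src. The img.full-size selector is a placeholder you'd replace after inspecting the current page.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

thumbnails = driver.find_elements(By.CSS_SELECTOR, "img")  # placeholder selector
full_res_urls = []
for thumb in thumbnails[:10]:
    try:
        thumb.click()
        # Wait for a larger preview image to appear; "img.full-size" is a placeholder,
        # not Google's real class name, so inspect the page to find the current one
        big = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "img.full-size"))
        )
        src = big.get_attribute("src")
        if src and src.startswith("http"):
            full_res_urls.append(src)
    except Exception:
        continue  # skip thumbnails that refuse to open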
Automation is key. Retry failed requests, save metadata to avoid duplicates, and use residential proxies for large datasets.
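Retrying failed requests is worth wrapping in a small helper. A sketch with exponential backoff (the attempt count and delays are arbitrary defaults):

import time
import requests

def fetch_with_retry(url, headers=None, attempts=3, timeout=10):
    """Return response content, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            response = requests.get(url, headers=headers, timeout=timeout)
            response.raise_for_status()
            return response.content
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller handle it
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...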
Organize by query to simplify workflows, especially for ML:
import os

def save_image(content, folder, filename):
    # Create the per-query folder on first use, then write the raw bytes
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, filename), "wb") as f:
        f.write(content)
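For example, inside the download loop from earlier, each query gets its own folder:

folder = os.path.join("images", query.replace(" ", "_"))  # e.g. images/golden_retriever_puppy
save_image(img_data, folder, f"img_{i}.jpg")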
Keep URLs, file paths, and timestamps in a CSV or database:
import pandas as pd
from datetime import datetime, timezone

scraped_at = datetime.now(timezone.utc).isoformat()
data = {"url": image_urls,
        "filename": [f"img_{i}.jpg" for i in range(len(image_urls))],
        "timestamp": [scraped_at] * len(image_urls)}
pd.DataFrame(data).to_csv("images_metadata.csv", index=False)
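The same CSV lets you skip what you've already downloaded on the next run. A sketch of de-duplicating against it:

import os
import pandas as pd

seen = set()
if os.path.exists("images_metadata.csv"):
    seen = set(pd.read_csv("images_metadata.csv")["url"])

new_urls = [u for u in image_urls if u not in seen]  # only download what's new
print(f"{len(new_urls)} new URLs, {len(image_urls) - len(new_urls)} already scraped")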
For massive datasets, think AWS S3 or Google Cloud Storage. Combine with DVC to version-control updates efficiently.
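For S3, boto3 keeps the upload simple. A sketch, assuming AWS credentials are already configured and with a placeholder bucket name:

import os
import boto3

s3 = boto3.client("s3")
bucket = "my-image-dataset"  # placeholder bucket name

for filename in os.listdir("images"):
    local_path = os.path.join("images", filename)
    # Mirror the local folder under an "images/" prefix in the bucket
    s3.upload_file(local_path, bucket, f"images/{filename}")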
Scraping Google Images is simple for small projects but tricky at scale. Request throttling, user-agent rotation, headless browsers, and proxies are necessary. Master these, and you'll have a reliable, automated pipeline to build datasets like a pro.