
Music streaming giants like Spotify host a goldmine of data. Imagine unlocking insights from millions of playlists — track names, artists, durations — all at your fingertips. You can. With Python.
Scraping Spotify playlists isn't just for hobbyists. Analysts, developers, and music app creators can leverage this to build smarter apps, spot trends, or power data-driven features. However, you need to do it right. Legally. Efficiently.
This guide walks you through everything — from installing the right tools to extracting playlists, handling authentication, and saving your data for analysis. Ready? Let's dive in.
First, grab the essentials. Open your terminal and run:
pip install beautifulsoup4 selenium requests
Here's the deal:
BeautifulSoup is your go-to for parsing static HTML pages. It slices through the code to find exactly what you want—like track names or artist info.
Selenium handles the dynamic stuff. Spotify's playlist pages load content as you scroll, and Selenium mimics user behavior: clicking, scrolling, waiting. Without it, you'd miss loads of data.
Requests is a lightweight way to talk to Spotify's official API. It handles your GET and POST calls seamlessly when you just need the data without page interaction.
Selenium can't do much without a browser driver. Think of ChromeDriver as the remote control for your browser.
Download the ChromeDriver build that matches your installed Chrome version from its official site.
Extract it.
Note the path to the driver executable — you’ll need it in your script.
Here's a quick test snippet to check it works:
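from selenium import webdriver
from selenium.webdriver.chrome.service import Service
driver_path = "C:/webdriver/chromedriver.exe"  # Update with your path
# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(executable_path=driver_path))
driver.get("https://google.com")
print("Browser launched successfully!")
driver.quit()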
If Chrome opens and hits Google, you're good to go.
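Spotify's web pages wrap each track in identifiable HTML elements. Hit F12 in your browser and inspect a track row. The real class names are auto-generated and change often, but the structure looks something like this simplified example: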
<div class="tracklist-row">
  <span class="track-name">Song Title</span>
  <span class="artist-name">Artist Name</span>
  <span class="track-duration">3:45</span>
</div>
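To try the parsing step before touching a live page, the simplified markup above can be fed straight to BeautifulSoup. This is only a sketch: the class names `tracklist-row`, `track-name`, `artist-name`, and `track-duration` come from the illustrative snippet rather than from Spotify's live pages, and `parse_tracks` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Simplified markup from the example above (not Spotify's real class names)
SAMPLE_HTML = """
<div class="tracklist-row">
  <span class="track-name">Song Title</span>
  <span class="artist-name">Artist Name</span>
  <span class="track-duration">3:45</span>
</div>
"""

def parse_tracks(html):
    """Return one dict per tracklist row found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install
    tracks = []
    for row in soup.find_all("div", class_="tracklist-row"):
        tracks.append({
            "track title": row.find(class_="track-name").get_text(strip=True),
            "artist": row.find(class_="artist-name").get_text(strip=True),
            "duration": row.find(class_="track-duration").get_text(strip=True),
        })
    return tracks

print(parse_tracks(SAMPLE_HTML))
```

The same pattern carries over to the real page once you swap in the class names you see in your browser's inspector.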
To scrape:
Load the playlist with Selenium.
Scroll down to ensure all tracks load dynamically.
Grab the HTML source.
Parse with BeautifulSoup.
Extract the track title, artist, and duration.
Here's a streamlined Python function to do just that:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
def get_spotify_playlist_data(playlist_url):
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # Run without UI for speed
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(playlist_url)
        time.sleep(5)  # Let the page load fully
        # Scroll to the bottom once to trigger lazy loading
        # (long playlists may need repeated scrolling)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # Allow new content to load
        html = driver.page_source
    finally:
        driver.quit()  # Always close the browser, even if loading fails
    # "html.parser" ships with Python; "lxml" also works if you install it separately
    soup = BeautifulSoup(html, "html.parser")
    tracks = []
    # Note: update these class names if Spotify changes their site
    for track in soup.find_all(class_="IjYxRc5luMiDPhKhZVUH UpiE7J6vPrJIa59qxts4"):
        name = track.find(class_="e-9541-text encore-text-body-medium encore-internal-color-text-base btE2c3IKaOXZ4VNAb8WQ standalone-ellipsis-one-line").text
        artist = track.find(class_="e-9541-text encore-text-body-small").find('a').text
        duration = track.find(class_="e-9541-text encore-text-body-small encore-internal-color-text-subdued l5CmSxiQaap8rWOOpEpk").text
        tracks.append({"track title": name, "artist": artist, "duration": duration})
    return tracks
Pass a Spotify playlist URL to this function, and you’ll get a neat list of dictionaries with all the juicy details.
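A single scroll may not be enough for a long playlist, since new rows can keep appearing as you go. One common approach, sketched here, is to keep scrolling until the page height stops growing. `scroll_until_stable`, `pause`, and `max_rounds` are hypothetical names, and the sketch assumes that scrolling the window (rather than an inner container) is what triggers loading, which you should verify against the live page.

```python
import time

def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded rows time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared, so assume we reached the end
        last_height = new_height
    return rounds
```

You would call `scroll_until_stable(driver)` in place of the single scroll, just before reading `driver.page_source`. The `max_rounds` cap keeps the loop from running forever on a page that never settles.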
If you want cleaner data and more reliable access, use Spotify's API. It requires authentication, though. Here's the gist:
Register your app on the Spotify Developer Dashboard.
Get your Client ID and Client Secret.
Use them to request an access token.
Example Python snippet for getting the token:
import requests
import base64
CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"
credentials = f"{CLIENT_ID}:{CLIENT_SECRET}"
encoded_credentials = base64.b64encode(credentials.encode()).decode()
url = "https://accounts.spotify.com/api/token"
headers = {
    "Authorization": f"Basic {encoded_credentials}",
    "Content-Type": "application/x-www-form-urlencoded"
}
data = {"grant_type": "client_credentials"}
response = requests.post(url, headers=headers, data=data, timeout=10)
response.raise_for_status()  # Stop early if the credentials are rejected
token = response.json().get("access_token")
print("Access Token:", token)
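The token response also carries an `expires_in` field (in seconds), so a long-running script will eventually need a fresh token. Here is a minimal sketch of a cache that reuses a token until shortly before it expires. `TokenCache`, `fetch_token`, and `margin` are hypothetical names, and the 3600-second fallback is an assumption used only when the response omits `expires_in`.

```python
import time

class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token, margin=60):
        self._fetch_token = fetch_token  # callable returning the token JSON as a dict
        self._margin = margin            # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            # Assumed fallback of 3600 seconds if the field is missing
            self._expires_at = now + payload.get("expires_in", 3600) - self._margin
        return self._token
```

To use it, wrap the POST request above in a function that returns `response.json()` and pass that function as `fetch_token`. Each API call can then ask the cache for `get()` instead of holding on to a single token string.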
With this token, you can query Spotify's API endpoints directly:
artist_id = "6qqNVTkY8uBg9cP3Jd7DAH" # Example artist: Billie Eilish
url = f"https://api.spotify.com/v1/artists/{artist_id}"
headers = {"Authorization": f"Bearer {token}"}
response = requests.get(url, headers=headers)
artist_data = response.json()
print(artist_data)
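The artist lookup shows the request pattern, but for playlists you would more likely call the playlist tracks endpoint and follow its pagination. The sketch below assumes the `GET /v1/playlists/{playlist_id}/tracks` endpoint and its paged response (`items`, `next`) as documented in the Spotify Web API reference; check the current docs before relying on it. `track_to_row`, `requests_fetch`, and `get_playlist_tracks` are hypothetical helper names, and the output dicts mirror the shape returned by the scraper.

```python
def track_to_row(item):
    """Convert one playlist item from the API into the dict shape used by the scraper."""
    track = item.get("track") or {}  # guard against a missing track object
    minutes, seconds = divmod(track.get("duration_ms", 0) // 1000, 60)
    return {
        "track title": track.get("name"),
        "artist": ", ".join(a["name"] for a in track.get("artists", [])),
        "duration": f"{minutes}:{seconds:02d}",
    }

def requests_fetch(url, token):
    """Default fetcher: GET a URL with the bearer token and return parsed JSON."""
    import requests  # imported here so the other helpers work without it installed
    response = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=10)
    response.raise_for_status()
    return response.json()

def get_playlist_tracks(playlist_id, token, fetch_json=requests_fetch):
    """Follow the API's `next` links until every page of the playlist is read."""
    url = f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
    rows = []
    while url:
        page = fetch_json(url, token)
        rows.extend(track_to_row(item) for item in page.get("items", []))
        url = page.get("next")  # None on the last page
    return rows
```

Passing the fetcher in as a parameter is a design choice for testability: you can exercise the paging logic with canned responses before spending real API calls.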
Don't lose your hard-earned data. Save it in JSON or CSV for analysis or integration into other apps.
Here's saving scraped tracks to JSON:
import json
# Note: this sample URL points to an album page; for a playlist, use a URL of the
# form https://open.spotify.com/playlist/<playlist_id>
playlist_url = "https://open.spotify.com/album/7aJuG4TFXa2hmE4z1yxc3n?si=W7c1b1nNR3C7akuySGq_7g"
data = get_spotify_playlist_data(playlist_url)
with open('tracks.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
print("Saved playlist data to tracks.json")
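CSV is the other format mentioned above, and the standard library handles it without extra installs. A minimal sketch, assuming the track dicts use the same three keys as the scraper's output; `save_tracks_csv` and the default file name `tracks.csv` are hypothetical.

```python
import csv

def save_tracks_csv(tracks, path="tracks.csv"):
    """Write a list of track dicts to a CSV file with a header row."""
    fieldnames = ["track title", "artist", "duration"]
    # newline="" stops the csv module from writing blank lines on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(tracks)
```

Calling `save_tracks_csv(data)` after the scrape gives you a file that opens directly in a spreadsheet or loads into an analysis tool.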
Use the API when you can. It's official, stable, and respects Spotify's terms.
Throttle your requests. Don't bombard Spotify's servers — add delays to avoid getting blocked.
Check robots.txt. It lists the paths a site asks automated clients to stay away from.
Avoid excessive scraping. If data is behind login or restricted, respect the rules.
Use proxies sparingly to prevent IP bans if scraping is absolutely necessary.
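Two of the practices above, checking robots.txt and throttling, are easy to put into code with the standard library. The sketch below assumes you have already downloaded the robots.txt text yourself; `is_allowed` and `Throttle` are hypothetical names, and the one-second default gap is an arbitrary starting point rather than a documented Spotify limit.

```python
import time
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt rules that were already downloaded as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)  # pause only for the time still owed
        self._last = time.monotonic()
```

In a scraping loop you would call `throttle.wait()` before each page load and skip any URL for which `is_allowed` returns `False`.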
Spotify data scraping is powerful but requires finesse. Use BeautifulSoup for static parsing, Selenium for dynamic loading, and the Spotify API for official, structured access. Combine these tools thoughtfully and you'll turn raw Spotify playlists into actionable, analyzable data in no time.