Introduction
Spotify has a fantastic API that lets you connect to its huge music database. With Spotify’s API, you can get insights about the music you listen to, use a powerful search engine, tap into Spotify’s recommendation system, and integrate Spotify’s features into your apps.
Prerequisites
First, we’ll have to sign up at the official Spotify Developer Portal, head over to the Dashboard, and get our Client ID and Client Secret. We’ll also have to register a redirect URI for our app (navigate to your application and then [Edit Settings]). The redirect URI can be any valid URI (it does not need to be accessible), such as http://example.com, but it has to match the redirect URI we pass to Spotipy later. We’ll use Spotipy, a lightweight Python library for the Spotify Web API.
If this is the first time you use it, you’ll have to install it with
pip install spotipy
Next, let’s import some necessary packages:
from spotipy.oauth2 import SpotifyOAuth
import spotipy
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
We should declare the important details for the API:
client_id = "********"
client_secret = "*********8"
redirect_uri = "http://localhost:1234"
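Hard-coding the secret in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, assuming the credentials are exported under the environment variable names that Spotipy itself looks for (SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, SPOTIPY_REDIRECT_URI), a small helper could read them instead. The helper name load_spotify_credentials is hypothetical and not part of Spotipy:

```python
import os


def load_spotify_credentials(env=os.environ):
    """Read the three OAuth settings from a mapping of environment variables.

    Hypothetical helper for illustration; raises if any value is missing.
    """
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["SPOTIPY_CLIENT_ID"],
        "client_secret": env["SPOTIPY_CLIENT_SECRET"],
        "redirect_uri": env["SPOTIPY_REDIRECT_URI"],
    }


# Demo with made-up values; in practice call load_spotify_credentials() with no argument
demo_env = {
    "SPOTIPY_CLIENT_ID": "demo-id",
    "SPOTIPY_CLIENT_SECRET": "demo-secret",
    "SPOTIPY_REDIRECT_URI": "http://localhost:1234",
}
print(sorted(load_spotify_credentials(demo_env)))
```

The returned dictionary can be unpacked straight into the constructor, for example SpotifyOAuth(scope="user-top-read", **load_spotify_credentials()).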
Getting our top songs
Now we can write the API code that will fetch us our top songs and their attributes:
oauth_scopes = "user-top-read"
t_range = "long_term"

auth_manager = SpotifyOAuth(client_id=client_id,
                            client_secret=client_secret,
                            redirect_uri=redirect_uri,
                            scope=oauth_scopes)
sp = spotipy.Spotify(auth_manager=auth_manager)

user_top_tracks = sp.current_user_top_tracks(limit=50, time_range=t_range)
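A single call returns at most 50 items, which is the cap on limit. If we ever want to go past the first page, a sketch along these lines could walk the offset parameter of current_user_top_tracks. The helper name fetch_top_tracks and the max_items safety cap are assumptions made for illustration, and the sketch does not rely on how many tracks Spotify exposes beyond the first page:

```python
def fetch_top_tracks(client, time_range="long_term", page_size=50, max_items=200):
    """Collect top-track items page by page.

    Stops when the API returns an empty page, when the paging object has no
    "next" link, or when max_items have been collected.
    """
    items = []
    offset = 0
    while len(items) < max_items:
        limit = min(page_size, max_items - len(items))
        page = client.current_user_top_tracks(
            limit=limit, offset=offset, time_range=time_range
        )
        batch = page.get("items", [])
        if not batch:
            break
        items.extend(batch)
        offset += len(batch)
        if not page.get("next"):
            break
    return items
```

With the client created above, fetch_top_tracks(sp, time_range=t_range) returns a plain list of track objects, the same shape as user_top_tracks["items"].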
This fetches us A LOT of data. We can organize the important details in a dictionary:
top_tracks = {"track": [], "album": [], "artist": [], "ID": [], "popularity": [],
              "release_date": [], "duration_ms": [], "artist_id": []}

for i in user_top_tracks["items"]:
    top_tracks["track"].append(i['name'])
    top_tracks["album"].append(i['album']['name'])
    top_tracks["artist"].append(i['artists'][0]['name'])
    top_tracks["ID"].append(i['id'])
    top_tracks["popularity"].append(i['popularity'])
    top_tracks["release_date"].append(i['album']['release_date'])
    top_tracks["duration_ms"].append(i['duration_ms'])
    top_tracks["artist_id"].append(i['artists'][0]['id'])
and convert the dictionary into a pandas dataframe:
df = pd.DataFrame.from_dict(top_tracks)
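An alternative to growing eight parallel lists is to flatten each track object into one row and hand the list of rows to pandas. The sketch below assumes the same fields as the loop above; the helper name track_to_row and the sample item are made up for illustration:

```python
def track_to_row(item):
    """Flatten one track object into the columns used for the dataframe."""
    artists = item.get("artists") or [{}]
    album = item.get("album") or {}
    return {
        "track": item.get("name"),
        "album": album.get("name"),
        "artist": artists[0].get("name"),  # first listed artist only
        "ID": item.get("id"),
        "popularity": item.get("popularity"),
        "release_date": album.get("release_date"),
        "duration_ms": item.get("duration_ms"),
        "artist_id": artists[0].get("id"),
    }


# Made-up item with the same shape as one entry of user_top_tracks["items"]
sample_item = {
    "name": "Example Song",
    "id": "track123",
    "popularity": 64,
    "duration_ms": 201000,
    "album": {"name": "Example Album", "release_date": "2019-05-17"},
    "artists": [{"name": "Example Artist", "id": "artist456"}],
}
row = track_to_row(sample_item)
print(row["track"], "by", row["artist"])
```

pd.DataFrame([track_to_row(i) for i in user_top_tracks["items"]]) would then give the same columns as the dictionary approach.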
Let’s send another API call to get the attributes of our top tracks:
audio_analysis = sp.audio_features(df['ID'].tolist())
and convert it to a dataframe:
df2 = pd.DataFrame.from_dict(audio_analysis)
which we will merge with the previous dataframe:
df_full = pd.merge(df, df2, how='inner', left_on='ID', right_on='id')
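The audio-features endpoint accepts up to 100 track IDs per request, so our 50 tracks fit in a single call. For longer lists, and to guard against tracks for which no features come back (those slots may be None), a sketch like the following could batch the IDs. The helper names chunked and fetch_audio_features are hypothetical:

```python
def chunked(ids, size=100):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_audio_features(client, track_ids, batch_size=100):
    """Call client.audio_features once per batch and drop empty results."""
    features = []
    for batch in chunked(track_ids, batch_size):
        result = client.audio_features(batch) or []
        features.extend(entry for entry in result if entry is not None)
    return features
```

Dropping the None entries before building the dataframe is safe here because the merge is an inner join on the track ID, so tracks without features simply do not appear in df_full.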
By now we have a nice dataframe of our top tracks and their musical attributes. Let’s send another API call to get the genres of our favorite artists, and combine them into our dataframe:
genres = []
for artist in df_full['artist_id']:
    genres.append(sp.artist(artist)['genres'])
df_full['genres'] = genres
df_full.head()
df_full.head() displays the first five rows of the combined dataframe (with your own music, of course).
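The genre loop sends one request per row, even when the same artist appears several times. A possible refinement, sketched under the assumption that Spotipy’s artists method is used with batches of up to 50 IDs, fetches each unique artist once. The helper name fetch_artist_genres is hypothetical:

```python
def fetch_artist_genres(client, artist_ids, batch_size=50):
    """Return {artist_id: genres}, requesting each unique artist only once."""
    unique_ids = list(dict.fromkeys(artist_ids))  # de-duplicate, keep order
    genres_by_artist = {}
    for start in range(0, len(unique_ids), batch_size):
        batch = unique_ids[start:start + batch_size]
        response = client.artists(batch)
        for artist in response["artists"]:
            if artist:  # skip empty slots for unknown IDs
                genres_by_artist[artist["id"]] = artist.get("genres", [])
    return genres_by_artist
```

The result can be mapped back onto the rows with df_full['artist_id'].map(fetch_artist_genres(sp, df_full['artist_id'].tolist())).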
That’s a lot of data. Let’s plot our top ten tracks by popularity:
ax = df_full.iloc[:10].sort_values(by='popularity').plot(x='track', y='popularity', kind="barh", figsize=(5, 5))
ax.get_xaxis().set_visible(False)
ax.get_legend().remove()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.ylabel("")
plt.title('My top10 tracks by Popularity')
plt.bar_label(ax.containers[0])
plt.show()
These are my top 10. Yeah…. I know 😂
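One detail worth spelling out: iloc[:10] keeps the first ten rows in the order the API returned them, and only then sorts those ten by popularity for the chart. That is different from picking the ten most popular tracks out of all 50. A small pure-Python sketch with made-up data (the helper names are hypothetical) shows the difference:

```python
def first_n_sorted(rows, n, key):
    """Keep the first n rows as returned, then order them by `key`."""
    return sorted(rows[:n], key=lambda row: row[key])


def n_largest(rows, n, key):
    """Pick the n rows with the highest `key` from the whole list."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]


# Made-up tracks, listed in the order the API returned them
tracks = [
    {"track": "A", "popularity": 40},
    {"track": "B", "popularity": 75},
    {"track": "C", "popularity": 10},
    {"track": "D", "popularity": 90},
]
print([t["track"] for t in first_n_sorted(tracks, 2, "popularity")])  # ['A', 'B']
print([t["track"] for t in n_largest(tracks, 2, "popularity")])       # ['D', 'B']
```

In pandas, the second variant would be df_full.nlargest(10, 'popularity').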
We can also plot our top artists:
data = df_full['artist'].value_counts(normalize=True) * 100
ax = data.sort_values().plot(kind="barh", figsize=(7, 3))
ax.get_xaxis().set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.title('My top Artists (%)')
plt.bar_label(ax.containers[0], fmt='%.0f%%')
plt.show()
In my case, it looks like this:
Plotting our songs’ attributes is a little tricky:
# .copy() so the normalization below does not touch df_full
df_attributes = df_full[['danceability', 'energy', 'key', 'loudness', 'mode',
                         'speechiness', 'acousticness', 'instrumentalness', 'liveness',
                         'valence']].copy()
# Scale every column by its largest absolute value
for column in df_attributes.columns:
    df_attributes[column] = df_attributes[column] / df_attributes[column].abs().max()
ax = df_attributes.mean().sort_values(ascending=True).plot(kind="barh", figsize=(20, 10))
ax.get_xaxis().set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.title('My top songs attributes (normalized)').set_size(25)
plt.bar_label(ax.containers[0])
plt.show()
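The loop above divides each column by its largest absolute value. To make the effect concrete, here is a tiny pure-Python version of the same scaling, applied to made-up numbers. The function name max_abs_scale is hypothetical:

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the list."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]  # avoid dividing by zero
    return [v / peak for v in values]


# Made-up loudness values in dB (negative) and key values (0 to 11)
print(max_abs_scale([-12.0, -6.0, -3.0]))  # [-1.0, -0.5, -0.25]
print(max_abs_scale([0, 5, 10]))           # [0.0, 0.5, 1.0]
```

Note that a column with only negative values, such as loudness, ends up between -1 and 0 after this scaling, so its bar points the other way from the rest.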
We normalize the data to be able to compare the different attributes. We can also plot a heatmap of the attributes of our top songs with Seaborn:
df_attributes['track'] = df_full['track']
plt.figure(figsize=(22, 22))
sns.heatmap(df_attributes.set_index('track'), annot=True, linewidths=.5, cmap="RdYlGn")
plt.title("Attributes of my top tracks").set_size(40)
The heatmap helps us understand our musical taste:
It looks like my popular songs are more “danceable” and have higher “valence”. They also have almost zero “instrumentalness” and little “liveness”. Let’s see what our favorite genres are:
flat_list = [item for sublist in df_full['genres'].tolist() for item in sublist]
x = pd.Series(flat_list)
data = x.value_counts(normalize=True) * 100
plot = data.plot.pie(figsize=(5, 5), autopct='%.2f')
plt.ylabel("")
plt.title('My top Genres (%)')
plt.show()
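The percentages in the pie chart come from flattening the per-track genre lists and counting every occurrence. The same computation in plain Python, with made-up genres and a hypothetical helper name genre_shares, looks roughly like this:

```python
from collections import Counter


def genre_shares(genre_lists):
    """Flatten lists of genres and return each genre's share in percent."""
    flat = [genre for genres in genre_lists for genre in genres]
    counts = Counter(flat)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders genres from most to least frequent
    return {genre: 100 * n / total for genre, n in counts.most_common()}


# Made-up genre lists, one per track
shares = genre_shares([["pop", "rock"], ["pop"], [], ["pop", "indie"]])
for genre, share in shares.items():
    print(genre, round(share, 1))
```

Because the lists are per track, an artist with several top tracks contributes their genres once per track, and an artist tagged with many genres weighs more than one tagged with a single genre.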
We can plot it using a wordcloud:
from wordcloud import WordCloud
wc = WordCloud(width=800, height=400, max_words=200).generate_from_frequencies(data)
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.title('My top Genres')
plt.show()
Conclusion
In this post, I gave a brief overview of the Spotify Web API’s methods using Spotipy and showed how the data from Spotify can be investigated and visualized. My next step will be to build my own recommendation system using data from Spotify. Stay tuned 😊
The notebook with code from this article is available here.