
You Wanted A Hit?

And so you wanted a hit

Well this is how we do hits

You wanted a hit

But that’s not what we do

“You Wanted a Hit” – LCD Soundsystem

A couple of years ago I went to Germany. I wandered around various cities, ran the Berlin Marathon, backpacked in the Alps, and went to Oktoberfest. All through the trip I had LCD Soundsystem’s “You Wanted a Hit” bopping around in my head. In a sonic coincidence, on my last night in Germany I heard the band in the Old Wiesn tent playing my song of the trip. There’s something special about hearing an earworm out in the wild, but it’s something else entirely to hear a fairly deep cut from an American act played by a German brass band dressed in traditional Bavarian attire.

Each day, Spotify releases a list of the top 200 most-streamed songs – in the spirit of James Murphy, I wanted to see if there were any specific trends in the songs that consistently dominate the Spotify global charts. More specifically: Is there a recipe to making a hit? What are the key ingredients? How diverse is our musical exposure on Top 40 radio? What types of songs become hits? Can we arrange these songs into any coherent groups?

To try to answer these questions, I pulled the global top 200 songs every day from January 1, 2017 to March 28, 2021. In this nearly 51-month period, there were 10,149 unique songs. Some of these songs are newer to the list, like “Montero (Call Me By Your Name)” by Lil Nas X or Bad Bunny’s “DÁKITI”, and some have been on the charts nearly every day of this timeframe – James Arthur’s “Say You Won’t Let Go” and Ed Sheeran’s “Shape of You” reached the list on all but about 50 days.

One of the ways that Spotify keeps track of its song library and makes recommendations is through the use of track “features.” Some of these features are assigned using an algorithm that analyzes a song and gives it a rating in seven categories. These categories are:

“acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)”

https://developer.spotify.com/documentation/web-api/reference/#object-audiofeaturesobject

The hope is that these attributes can provide a quantitative means of determining the main types of hit songs and what ingredients are commonly present in each. For example, what’s more important to a song being a hit: being easy to dance to (high danceability) or sounding happier (high valence)? What’s the sweet spot for how acoustic a hit song should be? Are some types of hits more common than others?

Genres?

To take a first crack at seeing whether any dominant themes or types of songs were consistently showing up on the charts, I wanted to see if different genres of songs had starkly different scores for the Spotify attributes mentioned above.

In addition to song attributes, Spotify labels artists with up to 5 genres. To assign a genre to each song, I pulled the genre information assigned to each artist on a track and combined it with my list of top songs. This isn’t perfect, since the genre is defined by the artist, not the particular song, but I think this gets us close enough.

In looking at these genres, two major ones emerge: pop and rap. However, because Spotify has so many variations of the same thing (Dua Lipa is classified as “dance pop”, “pop”, “pop dance”, and “uk pop” and Drake is “canadian hip hop”, “canadian pop”, “hip hop”, “pop rap”, and “rap”) I ran a search to identify the presence of just “pop” and “rap” in any of the artist’s genres.

Of the 10,149 unique songs, 5,514 have artists categorized as pop and 4,563 have artists categorized as rap. As seen in Drake’s example above, some artists qualify for both genres. Additionally, 2,262 tracks have artists that aren’t categorized as either pop or rap. Furthermore, the genres assigned to each artist aren’t perfect: Pop Smoke’s sole genre is “brooklyn drill”, which wouldn’t be counted as rap; as a result, we’re missing all of his songs in the rap category. Other genres that appear frequently but would be excluded are variations of “rock”, “reggaeton”, “hip hop”, and “house”. Still, with roughly 5,000 songs in each of the pop and rap sets, we should have plenty of tracks to give us a reasonable representation of what attributes are typical of these genres.
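The substring check described above can be sketched like this, using a tiny hypothetical artist-to-genre mapping in place of the real data pulled from the Spotify API:

```python
# Hypothetical artist-genre mapping standing in for the real Spotify
# artist data; genre labels belong to artists, not individual songs.
artist_genres = {
    "Dua Lipa": ["dance pop", "pop", "pop dance", "uk pop"],
    "Drake": ["canadian hip hop", "canadian pop", "hip hop", "pop rap", "rap"],
    "Pop Smoke": ["brooklyn drill"],
}

def has_genre(genres, keyword):
    # True if the keyword appears inside any of the genre labels.
    return any(keyword in g for g in genres)

pop_artists = [a for a, g in artist_genres.items() if has_genre(g, "pop")]
rap_artists = [a for a, g in artist_genres.items() if has_genre(g, "rap")]
```

Drake lands in both sets via “pop rap”, while Pop Smoke lands in neither – which is exactly how his tracks fall through the cracks of the rap category.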

Because we have seven attributes describing each song, it can be very difficult to represent a song graphically. Since we live in a 3D world and are analyzing data on a 2D screen, plotting this data in 7-dimensional space is impossible. However, I still wanted to represent each song’s attributes on a single chart. To accomplish this, I used a polar plot with each of the seven attributes represented by its own axis. To make viewing the data easier, I also scaled the attribute scores up by a factor of 100 so that they’d run from 0-100 instead of 0-1. The end result is a plot like the one shown below, for Ed Sheeran’s “Shape of You.” The song has high (>80) scores for danceability and valence, medium-high (~60) scores for acousticness and energy, and low (<10) instrumentalness, liveness, and speechiness. As usual, I’ll be using the Python packages seaborn/matplotlib to generate these figures.

Polar plot of “Shape of You” attributes
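A polar plot like the one above can be sketched in matplotlib roughly as follows. The scores here are approximate values read off the chart, not exact API output:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

attributes = ["acousticness", "danceability", "energy",
              "instrumentalness", "liveness", "speechiness", "valence"]
# Approximate 0-100 scores for "Shape of You" (eyeballed from the
# chart, not taken directly from the Spotify API).
scores = [58, 83, 65, 0, 9, 8, 93]

# One axis per attribute, evenly spaced around the circle; repeat the
# first point at the end so the polygon closes.
angles = np.linspace(0, 2 * np.pi, len(attributes), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(attributes)
ax.set_ylim(0, 100)
fig.savefig("shape_of_you_polar.png")
```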

When I take the average attributes of all pop and rap songs on the charts and plot them the same way, we get the chart shown below. It’s clear right away that the average pop and rap songs aren’t that different. Both score ~70 for danceability, ~60 for energy, ~40 for valence, and at or below 20 in all other attributes. The only difference that seems to be worth mentioning is that rap songs have speechiness scores that are about 10 points higher, which supports the commentary in the documentation.

Polar plot of pop and rap mean attributes

All in all, there didn’t appear to be much of a difference between the average pop and rap song. This means that as far as Spotify attributes are concerned, artist genre isn’t an effective means of differentiating between different groups of hit songs.

Cluster Analysis?

With genres appearing to be a dead end, I decided to turn to cluster analysis. There are some very good resources that I used to better understand what this analysis is and how to perform it. The first thing I did was plot all seven attributes in a pair plot. A pair plot is a grid of plots “such that each numeric variable in data will be shared across the y-axes across a single row and the x-axes across a single column. The diagonal plots are treated differently: a univariate distribution plot is drawn to show the marginal distribution of the data in each column.” For example, acousticness is the y-axis for the entire top row of plots, and the six remaining attributes serve as the x-axes for columns 2-7, creating 2-D scatter plots. In the first column, the x-axis would be acousticness as well, so a histogram of all of the songs’ acousticness scores is shown instead. This continues throughout the grid, with histograms of the seven attributes along the diagonal and scatter plots comparing pairs of attributes symmetrically filling in the off-diagonal positions. There are no obvious clusters or groupings of points in any of these scatter plots, but that only indicates that clusters may not exist when considering just two of the attributes at a time – more may reveal themselves when more dimensions are considered. For all of the following cluster analyses, I’ll be using the Python package scikit-learn.

Pair plot with no clustering

Next, I needed to determine the right number of clusters to best represent the data. Keep in mind that each cluster ideally represents a grouping of songs that have similar attributes and hopefully, therefore, sound alike or share musical themes. So, in effect, we’re asking the computer to tell us how many different groups of songs exist within this top-hits dataset. In the elbow plot below, we plot the Within-Cluster Sum of Squared Errors (WSS) vs. the number of clusters. The lower the WSS, the more accurately the clusters capture the data. There is a noticeable kink in the elbow plot at around 5 clusters.

Elbow plot to determine number of clusters
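The elbow sweep can be sketched with scikit-learn, again with random stand-in data in place of the real song attributes. WSS is what scikit-learn exposes as `inertia_`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Random stand-in for the 7-attribute song data.
rng = np.random.default_rng(0)
X = rng.random((300, 7))

# inertia_ is the sum of squared distances from each point to its
# assigned cluster center (the WSS plotted in the elbow chart).
wss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                random_state=0).fit(X)
    wss.append(km.inertia_)
# Plotting k vs. wss reveals the "elbow" where the curve flattens.
```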

The “silhouette score” can also be used as a means of finding the ideal number of clusters for an analysis. According to Sewell and Rousseau, a score below 0.25 indicates that no meaningful clustering has been found. Unfortunately, the results fall below that mark, indicating that there are no clear clusters in the data. Despite this, we’ll continue on to see how these clusters end up looking.

Silhouette scores for different numbers of clusters
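The silhouette sweep works much like the elbow one, again shown here on random stand-in data. Scores near 1 mean tight, well-separated clusters; scores under ~0.25 mean no real structure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Random stand-in for the song attribute data.
rng = np.random.default_rng(0)
X = rng.random((300, 7))

# Mean silhouette score for each candidate number of clusters.
sil = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=0).fit_predict(X)
    sil[k] = silhouette_score(X, labels)
```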

We can see that the silhouette score is highest with 3 clusters, and second highest with 5. Because the silhouette scores for 3 and 5 clusters are about the same and the elbow plot above seems to prefer 5 clusters to 3, I’ll proceed with 5. The k-means++ clustering technique was used to group hits in this analysis. We can plot the song attribute data on a pair plot just like the one shown before. This is the exact same data as above, but now each song is color-coded based on the cluster it belongs to.

Pair plot with color-coded clusters

Aside from looking really cool, the above figure highlights some interesting trends. For example, cluster 2 (shown in green) is consistently on the upper region of liveness scores. This can be seen on the scatter plots in the fifth column or in its histogram. Additionally, it is clear that a majority of the songs have low instrumentalness scores, indicating that vocals are present in most of these hits.

Another way to compare differences between clusters is to find the centroid of each cluster and plot its attributes on the polar plot from before. The centroid is the center point of a cluster – its coordinates are essentially the average value of each attribute across all songs in the cluster. The cluster analysis seems to have done a better job of grouping the songs than the genre study, showing much clearer differences between song groups; this is evident from comparing the differences in attributes between pop and rap songs to the differences between the 5 clusters. This could also be due to the fact that the cluster analysis was allowed to create 5 different groups while the genre check could only assign songs to one of two groups. Still, the cluster analysis seems to get us closer to our goal of grouping hits into certain themes.

Polar plot of cluster centroid attributes
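The fitted model exposes its centroids directly as `cluster_centers_`, so overlaying them on one polar plot goes roughly like this (random stand-in data again):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

attributes = ["acousticness", "danceability", "energy",
              "instrumentalness", "liveness", "speechiness", "valence"]
# Random stand-in for the real song attribute data.
rng = np.random.default_rng(0)
X = rng.random((300, 7))

km = KMeans(n_clusters=5, init="k-means++", n_init=10,
            random_state=0).fit(X)
centroids = km.cluster_centers_ * 100  # scale to 0-100 like the charts

angles = np.linspace(0, 2 * np.pi, len(attributes), endpoint=False).tolist()
angles += angles[:1]

# One closed polygon per cluster centroid, all on the same axes.
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for i, c in enumerate(centroids):
    ax.plot(angles, c.tolist() + [c[0]], label=f"cluster {i}")
ax.set_xticks(angles[:-1])
ax.set_xticklabels(attributes)
ax.legend()
fig.savefig("centroid_polar.png")
```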

Looking at the chart above, we can see that there is one cluster that is quite different from the others – cluster 1. Where all the other clusters have danceability and energy scores at or above 60, cluster 1 is lower, with danceability around 50 and energy below 40. Additionally, cluster 1 has the highest acousticness score, nearly reaching 80. Its speechiness and instrumentalness scores are not markedly different from the other clusters’, but cluster 1’s valence score is a good deal lower than all clusters except cluster 4. So, to sum all of that up, cluster 1 consists of songs that are slower, sadder, less danceable, and more acoustic – we’ll call these the ballads. This is confirmed when looking at some of the most-streamed songs in this cluster, including Billie Eilish’s “everything i wanted”, Olivia Rodrigo’s “drivers license” (i love gen z), and John Legend’s “All of Me”.

As mentioned before, the remaining four clusters are much more similar to each other than they are to cluster 1. In general, these clusters are almost identical, with the exception of one attribute that is higher or lower than the rest. Going through the attributes alphabetically: clusters 0, 2, and 4 all have acousticness scores around 10, while cluster 3 is higher, reaching about 50. There’s less difference in the clusters’ danceability scores, with all above 60, but cluster 0 tops the others, reaching nearly 80. All four remaining clusters have higher energy scores than cluster 1, but cluster 2 has the highest, scoring 75; cluster 3, the most acoustic of the remaining clusters, has the lowest energy (~60). All of the clusters are very low in instrumentalness, which makes sense since most popular songs today feature significant singing. In addition to having the highest energy, cluster 2 also has the highest liveness, scoring in the upper 40s while all the others are in the teens. Finally, cluster 0 has the highest valence score (70) while cluster 4 has the lowest (30).

With all of this in mind, the remaining four clusters can be labeled: cluster 0 = happy, cluster 2 = lively, cluster 3 = acoustic, and cluster 4 = sad.

Looking at these results, I think that the cluster analysis was very successful in identifying one clear song type and did a so-so job on grouping the others. Cluster 1 (the ballads) stands in stark contrast to the other clusters in terms of its attribute scores. These scores are significantly different than those of the other clusters for multiple attributes and when I reviewed the top 20 most-streamed songs in this group, all passed the sniff test as true ballads.

The analysis breaks down a bit with the remaining clusters. As mentioned before, the analysis mostly just sorted songs based on a single attribute for clusters 0, 2, 3, and 4. Furthermore, not all of the top 20 most-streamed songs in these groups matched a clear trend to my ears. For example, there is a big difference between “Watermelon Sugar”, “God’s Plan”, and “Don’t Stop Me Now”, all of which are top tracks in cluster 2. Additionally, the variance in attribute scores within these clusters was much greater than within cluster 1. Still, there were some success stories. Cluster 0, the group with the highest valence scores (and therefore supposedly the happiest-sounding songs), includes Pharrell Williams’ “Happy.” Likewise, cluster 4, the low-valence/sad group, features XXXTENTACION’s “SAD!”

cluster        acousticness  danceability  energy  instrumentalness  liveness  valence  speechiness
0 = happy      11            75            72      1                 15        70       14
1 = ballads    75            51            35      3                 16        29       9
2 = lively     12            63            75      1                 48        51       15
3 = acoustic   51            70            59      1                 16        59       14
4 = sad        11            67            62      1                 15        30       13
Table of cluster centroid attributes

Conclusions

In general, today’s top hits tend to have higher energy and danceability, almost always feature vocals, have very little spoken word, are usually not recorded (or don’t sound like they’re recorded) live, and skew towards being less acoustic. As seen in the unclustered pair plot shown earlier, only valence produces anything close to a normal distribution, meaning that the only real variance within the top charts is whether a song will sound more or less happy.

To circle back to the original questions I set out to answer, it seems that in today’s hits there are ballads and then there’s everything else. There is little difference between pop and rap songs and they tend to evenly fill out the non-ballads; there are about the same number of pop and rap songs in each of the clusters. To call these all pop songs seems like a cop-out, but these are the most-streamed songs of the last 3+ years – they are, by definition, popular music. Rap has risen in prominence to become the dominant genre in pop culture, and there are constant collaborations, cross-pollinations, and straight-up ripoffs between today’s artists. The non-ballads all have mostly the same features, with the difference being a little more or less of some attribute. There are happier pop songs and sadder ones, tracks that are more acoustic hits and tracks that are more electronic, but aside from these slight differences they’re mostly the same in the eyes of the algorithm. Maybe there’s chicken instead of steak here, maybe this one’s spicier than that one, but at the end of the day, we’re all just eating tacos.

To be honest, at first I was pretty disappointed in these results. I was hoping that there would be clear delineations between the major types of songs that I hear in popular music. I expected the cluster analysis on song attributes to yield clear groupings of pop, rap, reggaetón, EDM, and indie tracks, and to provide a quantitative definition of what each of these songs tends to look like. Ultimately, I don’t think this is the fault of the cluster analysis; the blame lies more with the song attribute scores. The clustering is only as effective as the data it is trying to group, and if there aren’t major differences within the data, it will struggle to yield sensible groupings.

I’m not bashing the Spotify attribute scores either. The algorithm assigning these values has to work for every song on the platform, an unbelievably wide range of music. It can’t be developed with the granularity necessary to tell the difference between the drill or trap beats heard in today’s rap and the dem bow rhythm that is ubiquitous in top 40 reggaetón. The algorithm doesn’t seem to be perfectly consistent either; I found that it assigned slightly different attribute scores to two identical versions of Post Malone’s “Sunflower” – the only difference between the tracks is their ID numbers; otherwise they are exactly the same song. However, the fact that it can effectively distinguish between drastically different songs is still quite impressive – see below for a polar plot comparing Erik Satie’s “Gymnopédie No. 1”, Cannibal Corpse’s “Hammer Smashed Face”, and the inspiration for this post.

Polar plot comparing the attributes of three very different tracks

It is easy to hear the difference between these three tracks, but to rank the level of instrumentalness between “Gymnopédie No. 1”, which features no vocals at all, and Corpsegrinder‘s guttural croaking on “Hammer Smashed Face” is impossible. As a tool, Spotify features seem to be useful for categorizing wildly different types of songs, but are not refined enough to accurately separate today’s pop songs. I’m still happy with the ballads cluster that the analysis yielded and I can appreciate the difficulty in finding other meaningful groupings on today’s top charts. However, I don’t have much to report back to Mr. Murphy and his band about making hits, even if he insists “that’s not what we do.”


No one asked, but here they are: my favorite 10 songs so far this year

Big Bang – Cherry Glazerr

Blouse – Clairo

Cartwheel – Lucy Dacus

Cloudy Shoes – Skullcrusher

Crying Wolf – Julien Baker

Do I Ever Cross Your Mind – Justin Townes Earle

From The Back of a Cab – Rostam

Olympus – Sufjan Stevens

Stay in the Car – Bachelor

SUN GOES DOWN – Lil Nas X

WILSHIRE – Tyler, The Creator