What’s in a genre?

September 07 2012

I’m taking a course on data mining this semester. Our first assignment: mine some data. The dataset and techniques don’t matter; the point is to extract meaning in any way possible. I’m greenhorn data miner; hopefully I’ll be able to look back at this post and laugh at my own naivete.

For my dataset I chose my own Google Music library. It’s unique, big enough (7600+ songs), and well organized. Plus, it’s a cinch to access with my Google Music api.

My analysis was simple: I investigated the occurrences of words in genres. I figured the most frequent words would be genres themselves (eg ‘metal’ in ‘power metal’), but there was also the chance of exposing common adjectives (eg ‘post’ in ‘post-rock’ and ‘post-metal’).

A few lines of Python later, and I had my results. The first thing I noticed: I listen to a lot of metal. A third of my songs are some kind of metal. If you put all the genre words into a hat, you’d pick ‘metal’ almost a quarter of the time. Next up: ‘rock’ and ‘jazz’. Rounding out the top six are two adjectives - ‘alternative’ and ‘progressive’ - as well as ‘accompaniment’ (as in Jamey Aebersold).

Metal bands also claim the longest genres in my library. Novembre is the champion, boasting this mouthful: ‘Progressive atmospheric doom metal’.

Now that we’ve figured out I’m a jazz-playing metalhead, let’s take a look at the least common words. ‘Country’ appears only once (and at the risk of sounding one-sided, it labels the fantastic Slaughter of the Bluegrass). There’s a bunch of mispellings, too, like ‘reggaer’ and ‘sountrack’.

Here’s a quick chart of all the words I found:

This assignment turned out to be a surprising amount of fun. For any other music lovers who want to take a dive into their libraries, I’ve got the source on GitHub.


Subscribe to future posts via email or rss, or view all posts.