Analyze

Inspired by Spotify’s year in music feature (I wrote a post on it as well), I decided to analyze music-related data that I had at my disposal. The data that I chose was the list of all the artists that I’ve seen live (78 at the time of doing this analysis).

There were two things that I wanted to surface from this data:

  1. Which genres of music have I seen the most live?
  2. Which artists should I see next, based on the artists I’ve already seen?

To answer both these questions I decided to use the Echo Nest API. And Python. All the code I wrote to analyze the data can be found here. I wrote this code when I should have been sleeping so the quality is not the best. Oh well.

About halfway through writing the code I decided that generating a word cloud for #1 would be cooler than simply listing the top genres. After failing miserably to get word_cloud working on my machine I decided to use an online word cloud generator instead. Here’s the resulting word cloud:

[Image: word cloud of the genres I’ve seen live]

The technique I used to answer #2 was to get the list of similar artists for each artist I’ve seen live, remove artists that I’ve already seen, and keep track of how many times each unseen artist is listed as a similar artist. Here are the top recommendations generated by my algorithm (format: <artist, number of times listed as similar artist>):

  1. Swedish House Mafia, 5
  2. The Raconteurs, 4
  3. Cut Copy, 3
  4. Beach Fossils, 3
  5. Kaiser Chiefs, 3
  6. Iron Maiden, 3
  7. Dio, 3
  8. Ellie Goulding, 2
  9. Black Sabbath, 2 (seeing them in September)
  10. Animals as Leaders, 2

My recommendation algorithm is extremely simple but produced surprisingly good results.
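
The counting technique described above can be sketched roughly like this. It is a minimal sketch, not the code from the post: the function name `recommend`, the input shape (a dict mapping each seen artist to its list of similar artists), and the sample data are all assumptions made for illustration, and fetching the similar artists from the Echo Nest API is left out.

```python
from collections import Counter


def recommend(similar_by_artist, top_n=10):
    """Rank unseen artists by how often they are listed as a similar artist.

    similar_by_artist is a hypothetical input: it maps each artist seen live
    to the list of artists reported as similar to it.
    """
    seen = set(similar_by_artist)
    counts = Counter()
    for similar in similar_by_artist.values():
        # Drop artists already seen, and count each unseen artist
        # once per seen artist that lists it.
        counts.update(set(similar) - seen)
    return counts.most_common(top_n)


# Made-up sample data, only to show the shape of the input.
sample = {
    "Artist A": ["Artist X", "Artist B", "Artist Y"],
    "Artist B": ["Artist X", "Artist A"],
    "Artist C": ["Artist X", "Artist Y"],
}
print(recommend(sample, top_n=2))  # [('Artist X', 3), ('Artist Y', 2)]
```

A `Counter` keeps the bookkeeping short: `most_common` returns the pairs sorted by count, which matches the <artist, count> format of the list above.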

The Echo Nest API is incredible.

P.S. I tried using pyechonest but there didn’t seem to be a way to retrieve artist genre information, which is why I decided to use their API directly.

Listen

Spotify’s year in music feature is brilliant. They took something simple, namely the play count for an artist/song/genre, and interpreted it through different time filters to create a very fun product. I’m a fan of the design of that page, especially the colors.

Here are some of my statistics from 2015:

[Image: Spotify year in music screenshot]
I thought this number would be higher.

[Image: Spotify year in music screenshot]
No surprises here. I’ve seen 4/5 of these bands live!

[Image: Spotify year in music screenshot]
Again, not surprising given my top artists.

[Image: Spotify year in music screenshot]
I LOVE this song. I still listen to it at least once a day. I even wrote a blog post about it.

Quality

I loved Fred’s post on the Zen of Erlang. I decided to check out his blog on the bus ride back from work today and read a few of his other posts. Two posts stood out to me.

Lessons Learned while Working on Large-Scale Server Software is, in my mind, required reading for any software engineer working on backend and infrastructure systems. Knowledge of a lot of the concepts mentioned in this post (like the CAP Theorem, or the Fallacies of distributed computing) is essential in developing robust software systems. Fred’s style of writing is lots of fun to read, and I really like his views on computer networks in this post:

There’s nothing more dangerous than someone going for a stroll over the network without knowing what it entails.

The network owes you nothing, and it doesn’t care about your feelings. It doesn’t deserve your trust.

The network is a necessary evil, not a place to expand to for fun.

The second post that stood out to me was Queues Don’t Fix Overload. He explains in simple terms why queues (when used incorrectly) seem to solve your scaling problems in the short run while introducing a whole new class of problems of their own. As mentioned in the post, identifying bottlenecks in your system and scaling and fixing those is the correct way to deal with system overload.
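
To make the point concrete, here is a toy model of a queue sitting in front of a slow consumer. It is only a sketch under simplifying assumptions (fixed arrival and service rates per tick, an unbounded queue); the function name `queue_backlog` and the numbers are made up for illustration and are not taken from Fred’s post.

```python
def queue_backlog(arrivals_per_tick, served_per_tick, ticks):
    """Return the queue length after each tick of a very simple model."""
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += arrivals_per_tick
        # The consumer can only remove what it has capacity for.
        backlog -= min(backlog, served_per_tick)
        history.append(backlog)
    return history


# The consumer is the bottleneck: the queue absorbs the excess for a while,
# but the backlog grows on every tick.
print(queue_backlog(arrivals_per_tick=10, served_per_tick=8, ticks=5))
# [2, 4, 6, 8, 10]

# Bottleneck fixed: the consumer keeps up and the queue stays empty.
print(queue_backlog(arrivals_per_tick=10, served_per_tick=12, ticks=5))
# [0, 0, 0, 0, 0]
```

In the first run the queue only delays the failure. Nothing about it changes the fact that work arrives faster than it is processed.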

Crash

Fred’s post on the Zen of Erlang is delightful. Fred (the author of ‘Learn You Some Erlang for Great Good!’) does a fantastic job of explaining how Erlang embraces failure and crashes, and how it provides abstractions to deal with these so that the programmer can focus on core application logic. Even if you don’t use Erlang the post is full of good software architecture patterns and principles that can be applied to any programming language and software project.

This post is making me question my decision to focus solely on learning Rust this year.

 

Bisect

Have you ever been in a situation in which something has “gone wrong” (intentionally vague) between two git commits, say c1 and c2, and you’re trying to figure out which commit caused the issue? In other words, your code works fine at c1, but not at c2. Thus, a commit in the range (c1, c2] resulted in your code being in a “bad” (for some definition of “bad”) state.

One approach is to look at all the commits in (c1, c2] and see if any commit stands out as something that might have caused the issue. But there are times when looking at the changes is not enough, or it’s not clear why any of the changes would have broken anything, and you need to do some other work (run integration tests, run performance suites, run UI tests, etc.) in order to pinpoint the breaking commit.

“Why, this seems like a perfect opportunity to use binary search to figure out which commit caused a problem! All I need to do is a binary search in the range (c1, c2]. For a particular commit in this range (starting in the middle) I simply need to git checkout the code at that point, do whatever work I need to (explained above), and then make a decision on whether I need to search in the ‘upper half’ or ‘lower half’.”

Enter git bisect. It allows you to focus on what went wrong, without having to manage the git + binary search state. In our scenario we’d simply mark c1 as a good commit, and c2 as a bad one, and then let git bisect work its magic in enabling us to discover what went wrong between (c1, c2].
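
The search that git bisect manages for you can be sketched in Python. This is a simplified model, not how git implements it: the commit list, the `first_bad_commit` name, and the `is_bad` callback (standing in for “check out this commit and run whatever checks you need”) are all hypothetical, and it assumes that every commit after the first bad one is also bad. With the real tool the session is driven by `git bisect start`, `git bisect bad c2`, `git bisect good c1`, and then marking each commit it checks out as good or bad.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit.

    commits is ordered oldest to newest and covers the range (c1, c2],
    so the last entry (c2) is known to be bad.
    """
    low, high = 0, len(commits) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid       # the culprit is mid or something earlier
        else:
            low = mid + 1    # the culprit comes after mid
    return commits[low]


# Made-up commit ids; pretend the breakage was introduced in "d4".
commits = ["a1", "b2", "c3", "d4", "e5", "f6"]


def is_bad(commit):
    return commits.index(commit) >= commits.index("d4")


print(first_bad_commit(commits, is_bad))  # d4
```

Each step halves the range, so even a few hundred commits need fewer than ten rounds of checking out and testing.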

I love git.

What I’m currently listening to: Massive Attack and White Moth Black Butterfly

Massive Attack’s newest album Ritual Spirit came out recently and it is full of gorgeous tunes.

White Moth Black Butterfly is the side project of Daniel Tompkins, one of my favorite vocalists. His voice is astounding and never ceases to amaze me.

Goal Tracking: January Edition

At the beginning of the year I published a post outlining some of my goals for the year. In the spirit of being transparent, here is the progress I made on them over the course of January:

  1. Due to personal reasons I don’t think I made myself as available as I could have. I’m sorry. 
  2. Zero volunteering done in January. I’m disappointed that I missed this goal in the very first month of the year.
  3. Made good progress on keeping the procrastination down to a minimum (I’d say almost zero procrastination).
  4. Was completely honest and open throughout the course of January. Almost painfully so.
  5. I realized that my Rust skills were a bit…rusty. I re-read the first two chapters of the Rust book.
  6. I read 4 books over the course of January: Seveneves (I really enjoyed it. The second half of the book was fantastic), The Sandman, Vol. 2: The Doll’s House (excellent. This series is so good!), The Complete Maus (no book has made me cry so much. This is my favorite book of all time), and Kafka on the Shore (very enjoyable and probably my favorite story by Murakami so far. I added numerous lines from this book to my list).
  7. I read 3 research papers: Hekaton, f4, and Borg.
  8. I wrote 15 blog posts.
  9. Made good progress at playing the guitar. I started a separate website for tracking this.
  10. (a) Did not run an 8 min mile. In fact the best I did this month was 9m50s.
    (b) Made very good progress at doing a handstand. I’m fairly confident I should be able to do one by the end of February.
    (c) Managed to squat 285 lb. ~15 more lb to go!
    (d) Did not run any half marathons.
    (e) Managed to do one muscle up with +20 lb on me. Yay goal achieved!