Instead of reading two research papers in March, I’ve decided to read as much as possible (yes, I know that’s not a measurable quantity) of The Architecture of Open Source Applications: Volume 1. I’d read the LLVM and HDFS chapters in the past and was blown away by how well written they were. I’m pretty excited to read the rest of the book.
Tag: software engineering
2015 goal achieved – contributing to an open source project
This week I managed to achieve one of my goals for 2015 – contributing to an open source project.
The project I contributed to was a command line interface to GitHub Gist. I found this project on the trending Python repositories page and decided to check it out (pun intended).
I think this was a great first project to contribute to. The code base was relatively small and extremely well organized and structured. The author also had a list of issues that needed addressing that made it very easy for someone to jump in and contribute to the project. I found an issue I felt I would be able to resolve, and after some back and forth with the author, my pull request for the issue was merged.
Things I learned –
- docopt is great
- How to use Requests sessions
- How to package and distribute pip projects
- Python decorators are immensely powerful
This was a fantastic experience, and I can’t wait to find the next project to contribute to!
Paper: Read-Copy Update
With multicore machines now the norm, writing code that scales and performs well in multithreaded scenarios is becoming extremely important. I was thus delighted to discover the paper on Read-Copy Update – a technique to implement data structures with no locking on reads. Modifications (writes or deletes) still use some sort of locking mechanism, but the idea is that the ratio of reads to modifications is so large that by eliminating the overheads of synchronization on the read path we improve the performance of our code significantly.
This paper is extremely well written with lots of code samples. I specifically liked section 7 that compared read-copy update with other locking algorithms. Section 3 is also great because it allows one to answer the question “Can I use read-copy update for my data structure and its expected access pattern?”
Read-copy update is a simple (at least in terms of the general idea; the actual code implementation is tricky) and elegant solution for building high performance concurrent data structures (for certain usage patterns). It is definitely a topic I will be exploring further in the future.
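To make the general idea concrete, here is a minimal sketch in Go of the read-copy update pattern. It is an illustration under stated assumptions, not the paper's implementation: the type RCUMap and its methods are hypothetical names, the whole map is copied on every write (the paper works with finer-grained structures such as linked lists), and Go's garbage collector stands in for the paper's deferred reclamation of old versions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// RCUMap is a hypothetical read-mostly map used to illustrate the idea.
type RCUMap struct {
	mu      sync.Mutex                     // serializes writers only
	current atomic.Pointer[map[string]int] // readers load this without locking
}

// NewRCUMap publishes an empty initial version.
func NewRCUMap() *RCUMap {
	m := &RCUMap{}
	empty := map[string]int{}
	m.current.Store(&empty)
	return m
}

// Get is the read path: one atomic load and a lookup, no lock.
// A published version is never mutated, so concurrent reads are safe.
func (m *RCUMap) Get(key string) (int, bool) {
	snapshot := *m.current.Load()
	v, ok := snapshot[key]
	return v, ok
}

// Set is the update path: copy the current version, modify the copy,
// then publish it atomically. Readers that already hold the old
// version keep using it; the garbage collector frees it once they finish.
func (m *RCUMap) Set(key string, value int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	old := *m.current.Load()
	next := make(map[string]int, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	next[key] = value
	m.current.Store(&next)
}

func main() {
	m := NewRCUMap()
	m.Set("a", 1)

	// Several readers run concurrently with a writer.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				m.Get("a")
			}
		}()
	}
	m.Set("b", 2)
	wg.Wait()

	v, ok := m.Get("b")
	fmt.Println(v, ok) // prints: 2 true
}
```

Copying the whole map makes each write cost O(n), so this sketch only pays off when reads vastly outnumber modifications, which is the access pattern the technique targets.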
New goal for 2015 – contribute to an open source project
Every year I decide that I will contribute to an open source project, but somehow it always falls off the radar. This year I’m going to make a more determined push (pun very much intended) to make at least a small contribution to one of the many open source projects out there. GitHub Showcases is an excellent feature that should hopefully help me in this endeavor.
EDIT – By “contribute to an open source project” I meant something that I don’t work on during my job. So projects like Rest.li and the Rest.li API Hub don’t count 🙂 It has to be something I do in my free time.
The beginning
I was reading an essay by Paul Graham the other day and one line in it stood out to me –
Few people know so early or so certainly what they want to work on.
This got me thinking about what got me interested in programming and computer science when I was growing up.
As a kid, I was always interested in computers and I remember reading Digit cover to cover each month. My first foray into programming was when my mother enrolled me in a class to learn basic C and C++. I rebelled initially, before even attending a single session of the class – “I don’t want to learn programming! That’s not cool at all.” I decided to attend at least one session though, as I didn’t want to hurt her feelings. And I’m so glad I did that.
I think the phrase “love at first sight” accurately describes what I felt upon looking at my first program (a simple “Hello, World!” in C if my memory serves me right). The feeling was incredible and I knew almost right away that I had found my calling in life.
Thanks mom.
Paper Review – Paxos Made Live
I read an interesting article the other day that provided a great explanation, with sweet visualizations no less, of the Paxos consensus algorithm. One of the papers mentioned in the “More Resources” section was Paxos Made Live. This paper has been on my radar for some time now and seeing it mentioned here inspired me to go ahead and read it.
I think this is one of my favorite papers. What I really liked about it is that it details the problems of translating an algorithm from an abstract theoretical concept into an actual living, breathing system.
Sections 5 and 6 of the paper are ridiculously good. Even something simple like taking a snapshot of a data structure has many subtleties associated with it when paired with a replicated log and I quite enjoyed reading about the snapshot mechanism the team had engineered. The idea of using a state machine language to implement their algorithm was excellent as well. One open source tool that comes to mind that allows you to do this now is Apache Helix.
Distributed systems are extremely hard to test. This is why the section on testing was particularly enlightening for me. I think having explicit hooks in source code to inject failures is quite a powerful idea. I particularly loved one line from this section – “By their very nature, fault-tolerant systems try to mask problems.”
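As a rough illustration of what an explicit failure-injection hook can look like, here is a small Go sketch. Everything in it is hypothetical (the FaultInjector type, the "log.append" hook name, and the appendToLog stand-in); it does not reflect how the paper's system implements its hooks, only the general shape of the idea.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// FaultInjector is a hypothetical registry of named failure points.
type FaultInjector struct {
	mu     sync.Mutex
	faults map[string]error
}

func NewFaultInjector() *FaultInjector {
	return &FaultInjector{faults: make(map[string]error)}
}

// Arm makes the named hook fail with err until Disarm is called.
func (f *FaultInjector) Arm(point string, err error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.faults[point] = err
}

// Disarm restores normal behavior for the named hook.
func (f *FaultInjector) Disarm(point string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.faults, point)
}

// Check is called by production code at each hook. It returns the
// injected error if the hook is armed, and nil otherwise.
func (f *FaultInjector) Check(point string) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.faults[point]
}

var errDiskFull = errors.New("injected: disk full")

// appendToLog is a stand-in for appending an entry to a replicated log.
// The hook sits exactly where a real failure could occur.
func appendToLog(f *FaultInjector, entries *[]string, entry string) error {
	if err := f.Check("log.append"); err != nil {
		return err
	}
	*entries = append(*entries, entry)
	return nil
}

func main() {
	inj := NewFaultInjector()
	var entries []string

	fmt.Println(appendToLog(inj, &entries, "a")) // hook not armed

	inj.Arm("log.append", errDiskFull)
	fmt.Println(appendToLog(inj, &entries, "b")) // fails with the injected error

	inj.Disarm("log.append")
	fmt.Println(appendToLog(inj, &entries, "c"), entries)
}
```

A test can arm a hook, drive the system, and then assert that it recovered, which surfaces exactly the problems a fault-tolerant system would otherwise mask.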
I highly recommend reading this paper.
Jet lag, a book, research papers, and a dog
I reached Vadodara on the 23rd and was hit with jet lag that lasted about three days.
I’d bought Mira Jacob’s The Sleepwalker’s Guide to Dancing with the hope that it would last me a majority of my visit to India. I read ~100 pages of the book while in the U.S. On my first night in Vadodara I woke up around 3:30 am (because of the aforementioned jet lag) and finished the remaining ~400 pages in four hours. This is an incredible book, and I can see why it was a nominee for Goodreads’ Best Fiction books of 2014. Simply put, I loved it. And yes, I did appreciate the irony of the title given my battle with temporary-time-difference-induced sleeplessness.
With my reading plans in shambles (OK, not completely in shambles, since I am not yet done with Borges’ Labyrinths, which is also amazing), I decided to work on one of my 2014 goals and read some research papers.
Here is what I read:
- Scaling Memcache at Facebook – This paper talks about how Facebook leveraged memcached to build a distributed key-value store. I thought the usage of UDP for making get requests and flow control mechanisms to combat incast was particularly interesting. Overall this is a very good paper and I would highly recommend it to anyone interested in distributed systems and system architecture. My favorite line in the paper? “Simplicity is vital.” (section 9).
- MultiQueues: Simpler, Faster, and Better Relaxed Concurrent Priority Queues – this paper introduces MultiQueues, a concurrent priority queue with relaxed semantics, i.e. you are not always guaranteed to get a globally minimal item from the data structure. Most of the papers I read are systems papers, so it was refreshing to read a paper that dealt more with data structures. MultiQueues are conceptually quite simple and their performance (as shown in section 6 of the paper) is impressive.
- You’re Doing It Wrong – this article introduces a B-heap, which is a VM page-friendly implementation of a binary heap. Kamp writes really well, and this article is a joy to read. My main takeaway from this article was the reminder that one should also consider I/O and memory access patterns while analyzing algorithms. This idea was introduced to me in CS-232 at UIUC and it is something that I always try to keep in mind while looking at an algorithm or trying to improve performance.
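To show what “relaxed semantics” means for the MultiQueues entry in the list above, here is a small Go sketch of the core idea as I understand it: keep several lock-protected sequential heaps, insert into a random one, and delete from whichever of two randomly sampled heaps has the smaller minimum. The names (MultiQueue, lockedHeap, peek) are illustrative, and the sketch leaves out the tuning and engineering details from the paper.

```go
package main

import (
	"container/heap"
	"fmt"
	"math/rand"
	"sync"
)

// intHeap is a plain min-heap of ints for container/heap.
type intHeap []int

func (h intHeap) Len() int           { return len(h) }
func (h intHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h intHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *intHeap) Push(x any)        { *h = append(*h, x.(int)) }
func (h *intHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// lockedHeap is one sequential priority queue guarded by its own lock.
type lockedHeap struct {
	mu sync.Mutex
	h  intHeap
}

// MultiQueue spreads items over several independent heaps.
type MultiQueue struct {
	queues []lockedHeap
}

func NewMultiQueue(n int) *MultiQueue {
	return &MultiQueue{queues: make([]lockedHeap, n)}
}

// Insert pushes x onto one randomly chosen heap.
func (m *MultiQueue) Insert(x int) {
	q := &m.queues[rand.Intn(len(m.queues))]
	q.mu.Lock()
	heap.Push(&q.h, x)
	q.mu.Unlock()
}

// peek returns the current minimum of heap i, or false if it is empty.
func (m *MultiQueue) peek(i int) (int, bool) {
	q := &m.queues[i]
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.h) == 0 {
		return 0, false
	}
	return q.h[0], true
}

// DeleteMin samples two heaps and pops from the one whose minimum is
// smaller. The result may not be the global minimum. It returns false
// if both sampled heaps were empty (or were drained by another
// goroutine in the meantime); callers may simply retry.
func (m *MultiQueue) DeleteMin() (int, bool) {
	i, j := rand.Intn(len(m.queues)), rand.Intn(len(m.queues))
	a, okA := m.peek(i)
	b, okB := m.peek(j)
	if !okA && !okB {
		return 0, false
	}
	if !okA || (okB && b < a) {
		i = j
	}
	q := &m.queues[i]
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.h) == 0 {
		return 0, false
	}
	return heap.Pop(&q.h).(int), true
}

func main() {
	mq := NewMultiQueue(4)
	for i := 0; i < 20; i++ {
		mq.Insert(i)
	}
	// Drain the queue; the order is not necessarily strictly ascending.
	for got := 0; got < 20; {
		if v, ok := mq.DeleteMin(); ok {
			fmt.Print(v, " ")
			got++
		}
	}
	fmt.Println()
}
```

Because each operation touches at most two small heaps instead of one shared structure, threads rarely contend for the same lock, and that is the trade the relaxed ordering buys.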
Oh, and here is the dog!
Go pipelines
As mentioned in this talk, the main concurrency features Go provides are goroutines and channels. While these constructs might seem pretty simple, they are immensely powerful, and one can use them to build advanced systems. One such system is a pipeline, as described in this blog post. What I think is really awesome about the system described is that downstream functions can signal the upstream functions to stop producing values, while the code stays easy to understand and not horrendous.
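Here is a minimal sketch of that kind of pipeline, assuming the common approach of sharing a done channel that the consumer closes when it wants upstream stages to stop. The stage names gen and sq are illustrative, and this is a simplified two-stage example rather than a reproduction of the post's code.

```go
package main

import "fmt"

// gen emits nums on a channel until they run out or done is closed.
func gen(done <-chan struct{}, nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			select {
			case out <- n:
			case <-done:
				return // downstream asked us to stop
			}
		}
	}()
	return out
}

// sq squares each value it receives until in is closed or done is closed.
func sq(done <-chan struct{}, in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			select {
			case out <- n * n:
			case <-done:
				return // downstream asked us to stop
			}
		}
	}()
	return out
}

func main() {
	done := make(chan struct{})
	out := sq(done, gen(done, 1, 2, 3, 4))

	// Consume only the first two results.
	fmt.Println(<-out) // 1
	fmt.Println(<-out) // 4

	// Closing done tells every upstream stage to stop producing, so
	// their goroutines exit instead of blocking forever on a send.
	close(done)
}
```

Each stage only needs one extra select case to support cancellation, which is why the code stays readable even as more stages are added.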
Async servers and clients in Rest.li
Slightly modified slides for a talk I gave at LinkedIn on writing async servers and clients using Rest.li.
QCon SF 2014: Slides
QCon was an incredible conference, and I learned a lot. Can’t wait for QCon 2015.
