(ouvert means open in French)

Last week I contributed a small feature to clize. As before, I discovered this project on the GitHub page for trending Python repositories. The author had a list of open issues for the repository which made it easy to see what needed to be worked on and I picked one that caught my fancy.

Once I knew what needed to be done I had to figure out how to implement it. The first thing I did was see how the existing code handled unknown command line arguments. “Oh look, it printed ‘Unknown option’! That seems like a good place to start.” I ran an ack for the phrase “Unknown option” and found the relevant source code files. The next step was to figure out from where the parsed arguments lived inside the program. A well placed print statement that I added quickly solved that mystery.

With this knowledge in hand I began writing some code. The basic algorithm was pretty simple – in case the user enters a command line argument that is not one of the parsed arguments compute the Levenshtein distance between what the user entered and the known arguments and suggest one that has the lowest distance. This was more or less the initial pull request that I submitted. The author provided excellent feedback on my code and after a couple of iterations my commit was merged into the master branch.

Things I learnt along the way –


AOSA: The NoSQL Ecosystem Review

(this is a review of the chapter on The NoSQL Ecosystem in the Architecture of Open Source Applications)

Unlike the other chapters in the book (and as stated in the introduction to the chapter), this portion of the book doesn’t dive deep into the internals of one particular project. Rather, it gives readers an overview of the various algorithms and concepts that serve as the building blocks for NoSQL systems like Voldemort, Cassandra, HBase, etc. I think it does a great job at explaining what is out there once one moves away from the traditional relational model and SQL world. It also references several seminal papers, like the Google BigTable and Amazon Dynamo papers, which I urge people to read if they are interested in understanding more about the topics covered in this chapter.

Speaking as someone who has read numerous papers on distributed and NoSQL systems (as well as studied them in several courses at UIUC) I feel like I didn’t gain a whole lot from reading this chapter. It was still a very enjoyable read and I really liked the sections that talked about fsync, read repair, hinted handoff, and anti-entropy. The section on the differences between range-based and hash-based partitioning was excellent as well. One thing I particularly liked was the author’s use of examples to explain concepts like the relational model, range-based partitioning, hash rings, etc.

If you have zero or very little background in NoSQL systems I would highly recommend reading this chapter.

AOSA: Eclipse Review

(this is a review of the chapter on Eclipse in the Architecture of Open Source Applications)

Eclipse was the second IDE (the first was Turbo C++) I was ever introduced to. We used it during our first undergraduate computer science course to code in Java. I remember being blown away by how powerful it was, and how easy it made learning a new language. Even though I’ve switched to using IntelliJ IDEA for Java/Scala now, Eclipse still holds a special place in my heart.

The chapter on Eclipse is very well written and offers readers a glimpse into why Eclipse is the way it is, and why certain design decisions (like writing their own Java compiler) were made. The component and plugin based architecture used by Eclipse seems to be extremely flexible and easy to add new features to. The compatibility layer provided by the team for each new major release of Eclipse (so that plugins written against the older versions still work in the new version) is a great move by the team to iterate on the internals (and externals in the case of public APIs) while preserving the ecosystem of plugins that already exist. I particularly enjoyed the section on “Rich Client Platform (RCP)” that talks about how people used portions of Eclipse to build other, non-IDE, applications. Incremental builds is one of my favorite features of Eclipse and learning how that worked was quite satisfying as well.

Things I learned about –

  • OSGi
  • Native and emulated widget toolkits

Key takeaways –

  • Having a good API is paramount in others adopting or contributing to your software
  • Components are very powerful


I recently received a feature request for http://linkedin.github.io – one should be able to link to an external website hosting documentation for the project directly from the home page. I had some time tonight so I implemented this feature. It wasn’t hard to add at all, and it gave me a chance to exercise my Javascript muscles which have been dormant for some time now. I also identified sections in the codebase that I would like to refactor, and hope to get to doing that sometime within the next 2-3 months.

2015 goal achieved – contributing to an open source project

This week I managed to achieve one of my goals for 2015 – contributing to an open source project.

The project I contributed to was a command line interface to GitHub Gist. I found this project on the trending Python repositories page and decided to check it out (pun intended).

I think this was a great first project to contribute to. The code base was relatively small and extremely well organized and structured. The author also had a list of issues that needed addressing that made it very easy for someone to jump in and contribute to the project. I found an issue I felt I would be able to resolve, and after some back and forth with the author, my pull request for the issue was merged.

Things I learned –

This was a fantastic experience, and I can’t wait to find the next project to contribute to!

Paper: Read-Copy Update

With multicore machines now the norm, writing code that scales and performs well in multithreaded scenarios is becoming extremely important. I was thus delighted to discover the paper on Read-Copy Update – a technique to implement data structures with no locking on reads. Modifications (writes or deletes) still use some sort of locking mechanism, but the idea is that the ratio of reads to modifications is so large that by eliminating the overheads of synchronization on the read path we improve the performance of our code significantly.

This paper is extremely well written with lots of code samples. I specifically liked section 7 that compared read-copy update with other locking algorithms. Section 3 is also great because it allows one to answer the question “Can I use read-copy update for my data structure and its expected access pattern?”

Read-copy update is a simple (at least in terms of the general idea; the actual code implementation is tricky) and elegant solution for building high performance concurrent data structures (for certain usage patterns). It is definitely a topic I will be exploring further in the future.

The beginning

I was reading an essay by Paul Graham the other day and one line in it stood out to me –

Few people know so early or so certainly what they want to work on.

This got me thinking about what got me interested in programming and computer science when I was growing up.

As I kid I was always interested in computers and I remember reading Digit cover to cover each month. My first foray into programming was when my mother enrolled me in a class to learn basic C and C++. I rebelled initially, before even attending a single session of the class – “I don’t want to learn programming! That’s not cool at all.” I decided to attend at least one session though as I didn’t want to hurt her feelings. And I’m so glad I did that.

I think the phrase “love at first sight” accurately describes what I felt upon looking at my first program (a simple “Hello, World!” in C if my memory serves me right). The feeling was incredible and I knew almost right away that I had found my calling in life.

Thanks mom.