(The worst part about jet lag is jet lag. The best part about jet lag is that it makes me very productive for some reason. Last year I read a book and a few research papers. This year I finished reading the “Red Book” while unable to sleep on the schedule of the time zone I’m in.)
As I’ve mentioned before, databases hold a special place in my heart. I think they’re incredibly interesting pieces of software. The state-of-the-art database systems that exist today are the result of decades of research and systems engineering. The “Red Book” does a superb job of explaining how we got here, and where we might be going next.
The book is organized into chapters that deal with different components and related areas of database systems. For each chapter, the authors pick a few significant research papers, explain their content, offer their own commentary, and discuss other related systems, papers, techniques, and algorithms. The authors (Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker) have a lot of technical expertise in database systems, which makes this book an absolute delight to read. I particularly enjoyed the personal anecdotes and commentaries that are sprinkled throughout the book. My favorite chapters were the ones on weak isolation and distribution, and on query optimization.
While reading, I made a note of all the referenced research papers that I would like to read next. I will be working through that list over the duration of my vacation.
For October’s InDay I had the opportunity to teach high school students how to code. This was the first time I was mentoring/teaching someone who had no prior programming experience.
One method that I found useful while teaching was to introduce the programming concepts with the help of the mathematical concepts on which they are built. Variables and functions in programming languages are (more or less) based on the same constructs in mathematics and it was easy to draw parallels between the two.
To explain how to generate a random number within a range I used the Google Chrome Console to show how function composition in programming works.
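The console session itself isn't reproduced here. As a rough sketch of the idea, written in Python rather than the JavaScript that the Chrome console runs, a random integer in a range can be built by composing three small steps: draw a float in [0, 1), scale it to the width of the range, then floor it and shift it. The function name `random_int_in_range` and the choice of inclusive bounds are assumptions made for illustration.

```python
import math
import random


def random_int_in_range(low, high):
    """Return a random integer between low and high, both inclusive."""
    width = high - low + 1            # how many integers the range contains
    scaled = random.random() * width  # a float in [0, width)
    return math.floor(scaled) + low   # floor to an integer, then shift up by low


# The same steps written as one composed expression: a die roll from 1 to 6.
roll = math.floor(random.random() * 6) + 1
```

Reading the composed expression from the inside out mirrors how nested functions are evaluated in mathematics, which is the parallel that made it a useful teaching example.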
Overall this was a challenging and rewarding experience. I need to find more volunteering opportunities that are coding related.
A few weeks ago I hit my two-year mark of working at LinkedIn. Just as I wrote about my first year at LinkedIn, I thought it would be interesting to capture my feelings and thoughts at the end of two years and briefly chronicle what has happened since that post.
The biggest change that has happened in the past year is that I’ve moved to a different team. I started working at LinkedIn on the Service Infrastructure team where I worked primarily on Rest.li. Working on Rest.li was fun, and I learned a lot. I got to contribute to an open source project early on in my career which was incredible. I even got the opportunity to work on a Rest.li protocol upgrade for all our services, which was a non-trivial problem to solve. TL;DR — working on Rest.li was great. However, I wanted to learn what it was like to work on a lower layer of our technology stack. I’d heard very good things about LinkedIn’s distributed graph team and I knew they were working on solving interesting problems. I joined the distributed graph team in 2015 and I’m extremely happy with the decision that I made.
I had the opportunity to speak at a tech conference! Steven Ihde and I spoke at QCon 2014 on the evolution of LinkedIn’s service architecture. Never in my wildest dreams did I imagine that I would be speaking at a conference so early on in my career. It was an honor and a fantastic experience, and I can’t wait to do it again.
I mentored two interns (one in 2014 and the other in 2015) as part of LinkedIn’s summer internship program. Mentoring is a rewarding and challenging experience. I would recommend everyone try it at least once in their careers.
While this might be hard to quantify exactly, I feel that I’ve become a better software engineer. I’m more confident about the code that I write and the technical decisions that I make. Part of this confidence can definitely be attributed to the great people that I work with who have helped me grow and learn.
Oh and I got promoted. That was very awesome. 😀
I can’t wait to see what my future at LinkedIn holds.
(ouvert means open in French)
Last week I contributed a small feature to clize. As before, I discovered this project on the GitHub page for trending Python repositories. The author had a list of open issues for the repository, which made it easy to see what needed to be worked on, and I picked one that caught my fancy.
Once I knew what needed to be done I had to figure out how to implement it. The first thing I did was see how the existing code handled unknown command line arguments. “Oh look, it printed ‘Unknown option’! That seems like a good place to start.” I ran an ack for the phrase “Unknown option” and found the relevant source code files. The next step was to figure out where the parsed arguments lived inside the program. A well placed print statement that I added quickly solved that mystery.
With this knowledge in hand I began writing some code. The basic algorithm was pretty simple: if the user enters a command line argument that is not one of the parsed arguments, compute the Levenshtein distance between what the user entered and each of the known arguments, and suggest the one with the lowest distance. This was more or less the initial pull request that I submitted. The author provided excellent feedback on my code, and after a couple of iterations my commit was merged into the master branch.
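To make the idea concrete, here is a minimal sketch of that kind of suggestion logic. It is not the code that was merged into clize. The function names `levenshtein` and `suggest` and the sample option list are hypothetical, chosen only for illustration.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def suggest(unknown, known_options):
    """Return the known option closest to `unknown`, or None if there are none."""
    if not known_options:
        return None
    return min(known_options, key=lambda option: levenshtein(unknown, option))


# Hypothetical option names, not taken from any real program.
print(suggest("--verbos", ["--verbose", "--version", "--help"]))  # prints --verbose
```

A real command line parser would probably also want a distance cutoff, so that input resembling none of the known options produces no suggestion at all. This sketch always returns the closest match.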
Things I learnt along the way –
(this is a review of the chapter on The NoSQL Ecosystem in the Architecture of Open Source Applications)
Unlike the other chapters in the book (and as stated in the introduction to the chapter), this portion of the book doesn’t dive deep into the internals of one particular project. Rather, it gives readers an overview of the various algorithms and concepts that serve as the building blocks for NoSQL systems like Voldemort, Cassandra, HBase, etc. I think it does a great job at explaining what is out there once one moves away from the traditional relational model and SQL world. It also references several seminal papers, like the Google BigTable and Amazon Dynamo papers, which I urge people to read if they are interested in understanding more about the topics covered in this chapter.
Speaking as someone who has read numerous papers on distributed and NoSQL systems (as well as studied them in several courses at UIUC) I feel like I didn’t gain a whole lot from reading this chapter. It was still a very enjoyable read and I really liked the sections that talked about fsync, read repair, hinted handoff, and anti-entropy. The section on the differences between range-based and hash-based partitioning was excellent as well. One thing I particularly liked was the author’s use of examples to explain concepts like the relational model, range-based partitioning, hash rings, etc.
If you have zero or very little background in NoSQL systems I would highly recommend reading this chapter.
(this is a review of the chapter on Eclipse in the Architecture of Open Source Applications)
Eclipse was the second IDE (the first was Turbo C++) I was ever introduced to. We used it during our first undergraduate computer science course to code in Java. I remember being blown away by how powerful it was, and how easy it made learning a new language. Even though I’ve switched to using IntelliJ IDEA for Java/Scala now, Eclipse still holds a special place in my heart.
The chapter on Eclipse is very well written and offers readers a glimpse into why Eclipse is the way it is, and why certain design decisions (like writing their own Java compiler) were made. The component and plugin based architecture used by Eclipse seems to be extremely flexible, and it makes adding new features easy. The compatibility layer provided for each new major release of Eclipse (so that plugins written against older versions still work in the new version) is a great move: it lets the team iterate on the internals (and the externals, in the case of public APIs) while preserving the ecosystem of plugins that already exists. I particularly enjoyed the section on “Rich Client Platform (RCP)” that talks about how people used portions of Eclipse to build other, non-IDE, applications. Incremental builds are one of my favorite features of Eclipse, and learning how they work was quite satisfying as well.
Things I learned about –
- Native and emulated widget toolkits
Key takeaways –
- Having a good API is paramount to getting others to adopt or contribute to your software
- Components are very powerful
The video for the talk that Steven Ihde and I gave at QCon SF 2014 is now online. My section begins around the 18 minute mark.
I wrote a post for the LinkedIn engineering blog that talks about Rest.li 2.0, and how we upgraded ~100 services to the new protocol.