Crash

Fred’s post on the Zen of Erlang is delightful. Fred (the author of ‘Learn You Some Erlang for Great Good!’) does a fantastic job of explaining how Erlang embraces failure and crashes, and how it provides abstractions to deal with them so that the programmer can focus on core application logic. Even if you don’t use Erlang, the post is full of good software architecture patterns and principles that can be applied to any programming language and software project.

This post is making me question my decision to focus solely on learning Rust this year.

 

Bisect

Have you ever been in a situation in which something has “gone wrong” (intentionally vague) between two git commits, say c1 and c2, and you’re trying to figure out which commit caused the issue? In other words, your code works fine at c1, but not at c2. Thus, a commit in the range (c1, c2] resulted in your code being in a “bad” (for some definition of “bad”) state.

One approach is to look at all the commits in (c1, c2] and see if any commit stands out as something that might have caused the issue. But there are times when looking at the changes is not enough, or it’s not clear why any of the changes would have broken anything, and you need to do some other work (run integration tests, run performance suites, run UI tests, etc.) in order to pinpoint the breaking commit.

“Why, this seems like a perfect opportunity to use binary search to figure out which commit caused the problem! All I need to do is a binary search in the range (c1, c2]. For a particular commit in this range (starting in the middle) I simply need to git checkout the code at that point, do whatever work I need to (explained above), and then decide whether I need to search in the ‘upper half’ or the ‘lower half’.”

Enter git bisect. It allows you to focus on what went wrong, without having to manage the git + binary search state yourself. In our scenario we’d simply mark c1 as a good commit and c2 as a bad one, and then let git bisect work its magic in helping us discover which commit in (c1, c2] went wrong.
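To make the search concrete, here is a minimal Python sketch of the binary search that git bisect manages for you. It is an illustration only, not how git implements it: it assumes a linear history, a list of commit ids for (c1, c2] in chronological order, and a hypothetical is_bad callback standing in for "check out this commit and run the tests". With git itself, the equivalent session is started with git bisect start, then git bisect good c1 and git bisect bad c2.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit in `commits` for which is_bad() is true.

    Assumptions (not checked here): `commits` is the range (c1, c2] in
    chronological order, the last entry (c2) is bad, and once a commit is
    bad every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is known to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid       # the culprit is mid or something earlier
        else:
            lo = mid + 1   # the culprit comes after mid
    return commits[lo]


# Hypothetical history: everything from "c3" onwards is broken.
history = ["a1", "b2", "c3", "d4", "e5"]
broken = {"c3", "d4", "e5"}
culprit = first_bad_commit(history, lambda commit: commit in broken)
print(culprit)  # c3
```

Each call to is_bad halves the remaining range, so a range of n commits needs roughly log2(n) checks, which matters when each check is a slow integration or performance run.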

I love git.

Research Paper: “AsterixDB: A Scalable, Open Source BDMS”

(AsterixDB was one of the systems mentioned in the “Red Book” that piqued my interest)

AsterixDB: A Scalable, Open Source BDMS gives the reader an overview of the AsterixDB system. AsterixDB is an impressive “big data management system” (BDMS) with several interesting features including a flexible data model, a powerful query language, data ingestion capabilities and distributed query execution. Two features that stood out to me were the ability to describe custom index types (B+-tree, R-tree, etc.) on your data, and the ability to query data that “lives” outside the system.

A majority of the paper is on the data definition and manipulation layer. The authors use an example of a social networking website to illustrate the power of AsterixDB’s data model and query language. Most of this section consists of code snippets (to define, load, and query the data) followed by an explanation of what exactly that snippet of code does, and what happens under the hood when that snippet is run. These code snippets make this section of the paper very easy to read and understand.

The data storage, indexing, and query execution components are described in the System Architecture section of the paper. These subsystems have separate papers ([1] and [2]) devoted to them; in this paper we are just given a brief overview of how they function and what their key features are. One thing that stood out to me in this section was the software layer that grants any index data structure LSM (log-structured merge) update semantics. I thought this was a novel way to speed up data ingestion and index building while still allowing diverse index data structures, chosen according to the type of data being stored and indexed. The secondary index design is also interesting.

I really enjoyed reading this paper. I’ve added [1] and [2] to my “research papers to read next” list, and hope to get to them very soon.

[1] S. Alsubaiee, A. Behm, V. Borkar, Z. Heilbron, Y.-S. Kim, M. Carey, M. Dressler, and C. Li. Storage Management in AsterixDB. Proc. VLDB Endow., 7(10), June 2014.

[2] V. Borkar, M. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A Flexible and Extensible Foundation for Data-intensive Computing. ICDE, pages 1151–1162, 2011.

Red

(The worst part about jet lag is jet lag. The best part about jet lag is that it makes me very productive for some reason. Last year I read a book and a few research papers. This year I finished reading the “Red Book” while not being able to sleep according to the time zone I’m in.)

As I’ve mentioned before, databases hold a special place in my heart. I think they’re incredibly interesting pieces of software. The state-of-the-art database systems that exist today are the result of decades of research and systems engineering. The “Red Book” does a superb job of explaining how we got here, and where we might be going next.

The book is organized into chapters that deal with different components and related areas of database systems. For each chapter, the authors pick a few significant research papers, explain their content, offer their own commentary, and discuss other related systems, papers, techniques, and algorithms. The authors (Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker) have a lot of technical expertise in database systems, which makes this book an absolute delight to read. I particularly enjoyed the personal anecdotes and commentaries that are sprinkled throughout the book. My favorite chapters were the ones on weak isolation and distribution, and on query optimization.

While reading this book I made a note of all the research papers it references that I would like to read next. I will be working through that list over the duration of my vacation.

Teach

For October’s InDay I had the opportunity to teach high school students how to code. This was the first time I had mentored or taught someone with no prior programming experience.

The goal of the lesson was to help the student build a simple guessing game (pick a random number and have the user try to guess it; let the user know if their guess is greater than or less than the random number) in JavaScript. On the surface this seems like an easy task, but if you think about it, it involves quite a few programming constructs: variables, loops, conditionals, functions, random number generators, etc.
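As a rough sketch of how many constructs such a game touches, here is one possible version. It is written in Python purely for illustration (the lesson itself used JavaScript), and the function names check_guess and play are made up for this example. The guesses are passed in as a list so that the logic can be run without a user at the keyboard.

```python
from typing import Iterable, List


def check_guess(guess: int, secret: int) -> str:
    """Conditionals: compare one guess against the secret number."""
    if guess > secret:
        return "too high"
    if guess < secret:
        return "too low"
    return "correct"


def play(secret: int, guesses: Iterable[int]) -> List[str]:
    """Loop over the guesses, collecting feedback until one is correct."""
    feedback = []  # a variable that accumulates state across iterations
    for guess in guesses:
        result = check_guess(guess, secret)
        feedback.append(result)
        if result == "correct":
            break  # stop as soon as the player gets it right
    return feedback


print(play(7, [3, 9, 7]))  # ['too low', 'too high', 'correct']
```

In a real game the secret would come from a random number generator and the guesses from user input; both are left out here to keep the sketch deterministic.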

One method that I found useful while teaching was to introduce the programming concepts with the help of the mathematical concepts on which they are built. Variables and functions in programming languages are (more or less) based on the same constructs in mathematics and it was easy to draw parallels between the two.

To explain how to generate a random number within a range I used the Google Chrome Console to show how function composition in programming works.
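The exact expression typed into the console isn’t recorded here; a common JavaScript idiom for this is Math.floor(Math.random() * (max - min + 1)) + min. The sketch below is a Python analogue of that idiom (random_int is a made-up name), split into steps so that the composition is visible: the output of each step is the input to the next.

```python
import math
import random


def random_int(low: int, high: int) -> int:
    """Return a random integer in [low, high], built by composing small steps."""
    fraction = random.random()            # float in [0.0, 1.0)
    scaled = fraction * (high - low + 1)  # float in [0.0, high - low + 1)
    return math.floor(scaled) + low       # integer in [low, high]


print(random_int(1, 10))
```

Written mathematically, this is f(g(h())) where h produces the fraction, g stretches it to the width of the range, and f rounds down and shifts it to start at low. In everyday Python code, random.randint(low, high) does the same job in one call.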

Also, analogies help. “Why do we need HTML, CSS, and JavaScript?” “Well, HTML elements are the basic building blocks of a webpage. CSS allows you to add color to the blocks and position them; it makes the blocks look pretty. JavaScript makes the blocks interactive.” Yes, I’m well aware that this analogy is not perfect, but it was the best I could come up with at the time and I think it captures the essence of why we need this trio to build websites today.

Overall this was a challenging and rewarding experience. I need to find more coding-related volunteering opportunities.

Deux

A few weeks ago I hit my two-year mark of working at LinkedIn. Just as I wrote about my first year at LinkedIn, I thought it would be interesting to capture my feelings and thoughts at the end of two years and briefly chronicle what has happened since that post.


The biggest change that has happened in the past year is that I’ve moved to a different team. I started working at LinkedIn on the Service Infrastructure team, where I worked primarily on Rest.li. Working on Rest.li was fun, and I learned a lot. I got to contribute to an open source project early on in my career, which was incredible. I even got the opportunity to work on a Rest.li protocol upgrade for all our services, which was a non-trivial problem to solve. TL;DR: working on Rest.li was great. However, I wanted to learn what it was like to work on a lower layer of our technology stack. I’d heard very good things about LinkedIn’s distributed graph team and I knew they were working on solving interesting problems. I joined the distributed graph team in 2015 and I’m extremely happy with the decision that I made.

I had the opportunity to speak at a tech conference! Steven Ihde and I spoke at QCon 2014 on the evolution of LinkedIn’s service architecture. Never in my wildest dreams did I imagine that I would be speaking at a conference so early on in my career. It was an honor and a fantastic experience, and I can’t wait to do it again.


I mentored two interns (one in 2014 and the other in 2015) as part of LinkedIn’s summer internship program. Mentoring is a rewarding and challenging experience. I would recommend everyone try it at least once in their careers.

While this might be hard to quantify exactly, I feel that I’ve become a better software engineer. I’m more confident about the code that I write and the technical decisions that I make. Part of this confidence can definitely be attributed to the great people that I work with who have helped me grow and learn.

Oh and I got promoted. That was very awesome. 😀

I can’t wait to see what my future at LinkedIn holds.