Papir

(This post is a summary of two papers I recently read. Papir is the Norwegian word for paper.)

Real-Time Twitter Recommendation: Online Motif Detection in Large Dynamic Graphs is a paper that was presented at VLDB 2016. It combines two of my favorite topics, distributed systems and graph theory, into a short (2 pages!) paper. It presents a simplified version of the algorithm that Twitter uses to detect motifs in a user's social graph in real time, which is then used to generate recommendations for the user. One thing I liked about this paper is that it walks through naive solutions to the problem before diving into the elegant solution that Twitter actually uses, and then explains how that solution works at Twitter scale through graph partitioning, pruning, and offline data structure generation.
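
To make the problem concrete, here's a toy version of online motif detection in Rust. This is my own sketch, not the paper's algorithm: it just maintains the follow graph as adjacency sets and, as each new edge arrives, checks whether it completes a pattern of interest (here, a directed triangle).

```rust
use std::collections::{HashMap, HashSet};

/// Toy online motif detector: on each new edge a -> b, report the
/// directed triangles a -> b -> c -> a that the edge completes.
#[derive(Default)]
struct FollowGraph {
    out: HashMap<u32, HashSet<u32>>, // who each user follows
    inc: HashMap<u32, HashSet<u32>>, // who follows each user
}

impl FollowGraph {
    fn add_edge(&mut self, a: u32, b: u32) -> Vec<(u32, u32, u32)> {
        let mut triangles = Vec::new();
        // A triangle is closed by any c that b follows and that follows a.
        if let (Some(b_out), Some(a_in)) = (self.out.get(&b), self.inc.get(&a)) {
            for &c in b_out.intersection(a_in) {
                triangles.push((a, b, c));
            }
        }
        self.out.entry(a).or_default().insert(b);
        self.inc.entry(b).or_default().insert(a);
        triangles
    }
}

fn main() {
    let mut g = FollowGraph::default();
    g.add_edge(2, 3); // 2 follows 3
    g.add_edge(3, 1); // 3 follows 1
    let found = g.add_edge(1, 2); // closes the triangle 1 -> 2 -> 3 -> 1
    println!("triangles closed: {:?}", found); // [(1, 2, 3)]
}
```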

Design patterns for container-based distributed systems is a paper by Google that talks about software design patterns emerging from systems built around containers. Software like Docker and CoreOS has made working with containers easier, and more and more companies are moving towards a container-based ecosystem. Google was one of the first companies to use containers, and this paper collects design and architecture patterns they have observed in their container-based systems. The design patterns presented are grouped under three main categories, of which I enjoyed reading about "Multi-node application patterns" the most. This section deals with design patterns in distributed systems where each node holds multiple related containers (called "pods" in the paper). It was interesting to read about how distributed systems problems like leader election, scatter-gather, etc. can be handled by language-agnostic containers rather than by language-specific libraries (a toy sketch of scatter-gather follows the quote below). I loved this line from the end of the paper, which made me think of containers in an entirely new light:

In all cases, containers provide many of the same benefits as objects in object-oriented systems, such as making it easy to divide implementation among multiple teams and to reuse components in new contexts. In addition, they provide some benefits unique to distributed systems, such as enabling components to be upgraded independently, to be written in a mixture of languages, and for the system as a whole to degrade gracefully.
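
On the scatter-gather point, here's a toy sketch of the pattern itself, with in-process threads standing in for containers (my own simplification; in the pod version each shard would be a container reached over the network):

```rust
use std::thread;

/// Toy scatter-gather: fan a query out to every shard in parallel,
/// then gather and merge the partial results.
fn scatter_gather(query: &str, shards: &[Vec<&str>]) -> Vec<String> {
    let mut results: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = shards
            .iter()
            .map(|shard| {
                // Scatter: each worker searches its own slice of the data.
                s.spawn(move || {
                    shard
                        .iter()
                        .filter(|doc| doc.contains(query))
                        .map(|doc| doc.to_string())
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Gather: merge the partial results from every worker.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    results.sort();
    results
}

fn main() {
    let shards = vec![vec!["rust book", "go tour"], vec!["rust paper"]];
    println!("{:?}", scatter_gather("rust", &shards)); // ["rust book", "rust paper"]
}
```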

 

Oxide

I just finished sections 24 to 36 of chapter 4 of the Rust book. Here's what I thought:

    • Associated types seem like an improvement over generics. They seem like an important concept for writing effective Rust code, and I wish this chapter had gone into more detail and included a larger example (I've sketched a small one after this list).
    • Rust supports macros. As the chapter mentions, I probably wouldn't write one unless I absolutely had to. If Rust supported a variable number of arguments to functions, one could probably implement vec! using that plus generics; as it turns out, macros themselves accept a variable number of arguments, which is how vec! works (see the second sketch after this list).
    • unsafe seems like a very powerful and tricky Rust feature. I wish the chapter had an actual example demonstrating how to use unsafe correctly, and also an example of when not to use it — for example, when you're writing bad Rust code and using unsafe to mask a bad design. (I've sketched a small example of correct usage after this list.)
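
Since the chapter didn't include a larger example of associated types, here's my own small sketch (the names are mine, not the book's): with a generic trait Stack<T>, every caller has to name the item type, whereas an associated type lets each implementation fix it once.

```rust
/// A container trait with an associated type: each implementor picks
/// its item type once, rather than every caller supplying a type
/// parameter as they would with `trait Stack<T>`.
trait Stack {
    type Item;
    fn push(&mut self, item: Self::Item);
    fn pop(&mut self) -> Option<Self::Item>;
}

struct VecStack {
    items: Vec<i32>,
}

impl Stack for VecStack {
    type Item = i32; // fixed by the implementation, not the caller
    fn push(&mut self, item: i32) {
        self.items.push(item);
    }
    fn pop(&mut self) -> Option<i32> {
        self.items.pop()
    }
}

/// Generic code can still refer to the item type as `S::Item`.
fn drain<S: Stack>(stack: &mut S) -> Vec<S::Item> {
    let mut out = Vec::new();
    while let Some(item) = stack.pop() {
        out.push(item);
    }
    out
}

fn main() {
    let mut s = VecStack { items: vec![1, 2, 3] };
    s.push(4);
    println!("{:?}", drain(&mut s)); // [4, 3, 2, 1]
}
```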
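
On the vec! point: macro_rules! repetition patterns already accept a variable number of arguments, which is how the real vec! is built. A stripped-down sketch (the real vec! is smarter about allocation):

```rust
/// A stripped-down vec!: the $( ... ),* pattern matches any number
/// of comma-separated expressions.
macro_rules! my_vec {
    ( $( $x:expr ),* ) => {{
        let mut v = Vec::new();
        $( v.push($x); )*
        v
    }};
}

fn main() {
    let v: Vec<i32> = my_vec![1, 2, 3];
    println!("{:?}", v); // [1, 2, 3]
}
```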
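
And for unsafe, here's the kind of "correct use" example I was hoping for: handing out mutable references to two disjoint halves of a slice. This is sound but beyond what the borrow checker can prove on its own (the standard library's split_at_mut does essentially this). Reaching for unsafe merely to silence the borrow checker is, by contrast, exactly the design smell to avoid.

```rust
/// Split a slice into two disjoint mutable halves.
fn split_halves(data: &mut [i32]) -> (&mut [i32], &mut [i32]) {
    let len = data.len();
    let mid = len / 2;
    let ptr = data.as_mut_ptr();
    // SAFETY: the ranges [0, mid) and [mid, len) never overlap, so
    // handing out a mutable reference to each range is sound.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    let (a, b) = split_halves(&mut data);
    a[0] = 10;
    b[0] = 30;
    println!("{:?}", data); // [10, 2, 30, 4]
}
```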

(You can find my thoughts on the previous chapters / sections here)

Ferric

I spent the past few days working through the first 3 chapters and the first 23 sections of chapter 4 of the Rust book. Here are my initial thoughts on Rust:

    • Cargo is pretty sweet. It's an easy-to-understand build and dependency management system. Even though I'm only using it for simple things so far, I'm really happy with it and have not run into any issues. I've also gotten very used to running cargo new <project_name> [--bin] to start new Rust projects.
    • Compared to Go, Rust is a much larger language, with many more concepts to learn; Go is simple to pick up, and Rust has a noticeably steeper learning curve.
    • Memory safety is one of Rust's strongest selling points. It is one of the trickier concepts to understand and is unlike anything I've experienced in C, C++, Java, Go, Python, etc. I'd say the concepts that come closest to resembling it are unique_ptr and shared_ptr in C++11. Consequently, I spent the most time on the three sections dedicated to references and borrowing, lifetimes, and mutability. Most of the bugs I ran into while writing Rust code were also related to these concepts (the first sketch after this list shows the kind of borrow error I kept hitting).
    • Rust has generics. This was something I missed in Go.
    • I haven’t gotten used to writing functions without an explicit return yet.
    • The Rust book is very well written, but there are a few areas for improvement. My major gripe is that it introduces new language constructs without explaining what they do. For instance, the chapter on trait objects introduces the format! macro for the first time without explaining what it does, and the chapter on closures uses a Box to return a closure from a function without going into what exactly a Box is (I've sketched that pattern after this list).
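
To make the borrowing point concrete, here's the shape of the bug I kept writing (a toy example of my own, not one from the book), along with the fix:

```rust
fn main() {
    let mut names = vec![String::from("ferris")];

    // The bug: holding an immutable borrow (`first`) across a
    // mutation of the same Vec. The borrow checker rejects this:
    //
    //     let first = &names[0];
    //     names.push(String::from("gopher")); // error: cannot borrow
    //     println!("{}", first);              // `names` as mutable
    //
    // The fix: end the immutable borrow before mutating, for example
    // by cloning the value out.
    let first = names[0].clone();
    names.push(String::from("gopher"));
    println!("{} and {}", first, names[1]); // ferris and gopher
}
```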
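
And on the last two points, a minimal sketch (mine, not the book's) of returning a closure in a Box, which also shows the implicit-return style: the final expression of each function is its value, with no return keyword.

```rust
/// Returns a closure. Closures have anonymous, unsized types, so
/// boxing gives the caller a concrete, sized type to hold on to.
/// Note the lack of `return`: the last expression is the value.
fn make_adder(n: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |x| x + n)
}

fn main() {
    let add_two = make_adder(2);
    println!("{}", add_two(40)); // 42
}
```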

 

Intelligence

Inspired by a tutorial on TensorFlow that was on HN recently, I decided to go and read the TensorFlow paper. It had been sitting in my "To Read" folder for quite some time, but for various reasons I never got around to it. This is also the first AI/ML paper I've read in 2016, so I was excited to dive right in.

At 19 pages, this is one of the longest papers I've read, but it is extremely well written, with lots of diagrams, charts, and code samples interspersed throughout the text that make it fun to read.

The basic idea of TensorFlow, to have one system that can work across heterogeneous computing platforms to solve AI/ML problems, is incredibly powerful. I fell in love with the directed graph API TensorFlow uses to describe the computations that will run on it (this may or may not be related to the fact that I also love graph theory). The multi-device (and distributed) execution algorithm explained in the paper is quite intuitive and easy to understand. A major component of multi-device / distributed execution of the TensorFlow graph is deciding which device to place a node on. While the paper does explain the algorithm used in section 3.2.1, I wish they had gone into more detail and talked about which graph placement algorithms didn't work, the specifics of the greedy heuristic used, etc.
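
For a feel of what placement has to trade off, here's my own simplified greedy sketch, not the paper's section 3.2.1 heuristic: visit nodes in topological order and put each one on the device that would finish it earliest, charging a penalty for every input that lives on a different device.

```rust
struct Node {
    cost: f64,          // estimated compute time
    inputs: Vec<usize>, // indices of upstream nodes
}

/// Greedily place each node on the device with the earliest
/// estimated finish time, including a cross-device transfer penalty.
fn place(nodes: &[Node], num_devices: usize, comm_penalty: f64) -> Vec<usize> {
    let mut busy = vec![0.0_f64; num_devices];
    let mut placement: Vec<usize> = Vec::with_capacity(nodes.len());
    for node in nodes {
        let (best, finish) = (0..num_devices)
            .map(|d| {
                // Each input living on another device costs one transfer.
                let comm = node.inputs.iter().filter(|&&i| placement[i] != d).count()
                    as f64
                    * comm_penalty;
                (d, busy[d] + comm + node.cost)
            })
            .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
            .unwrap();
        busy[best] = finish;
        placement.push(best);
    }
    placement
}

fn main() {
    // A diamond-shaped graph: node 0 feeds nodes 1 and 2, which feed 3.
    let nodes = [
        Node { cost: 1.0, inputs: vec![] },
        Node { cost: 5.0, inputs: vec![0] },
        Node { cost: 5.0, inputs: vec![0] },
        Node { cost: 1.0, inputs: vec![1, 2] },
    ];
    println!("{:?}", place(&nodes, 2, 0.5)); // [0, 1, 0, 1]
}
```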

Sections 5, 6, and 7 were my favorite portions of the paper. Section 5 dives into some of the performance optimizations used in TensorFlow. It would have been awesome if the authors had given more details about the scheduling algorithm used to minimize memory and network bandwidth consumption. I would have also liked to know what other scheduling optimizations were used in TensorFlow as I find scheduling algorithms very interesting.

Section 6 talks about the experience of porting the Inception model over to TensorFlow. While the strategies mentioned in this section are specific to machine learning systems, I feel that some of them can be tweaked a little to be generally applicable to all software systems. For instance,

“Start small and scale up” (strategy #2)

is directly applicable to any software system. Similarly,

“Make a single machine implementation match before debugging a distributed implementation” (strategy #4)

can be rephrased as

“Make a single machine implementation work before debugging a distributed implementation”

and be generally applicable to building distributed systems.

Section 7 explains how TensorFlow can be used to speed up stochastic gradient descent (SGD). Again, while the idioms presented in this section are used to speed up SGD, I feel they are general-purpose enough to be applied to other algorithms and systems as well. The diagrams in this section are amazing and do a great job of illustrating the differences between the various parallelism and concurrency idioms.
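
As a toy illustration of the synchronous data-parallel idiom (my own sketch, not TensorFlow code): workers compute gradients over their data shards in parallel, and the averaged gradient drives a single weight update.

```rust
use std::thread;

/// One synchronous data-parallel SGD step for the toy loss
/// mean((w - x)^2): each worker computes the gradient over its
/// shard, and the partial gradients are averaged.
fn parallel_sgd_step(w: f64, shards: &[Vec<f64>]) -> f64 {
    let grads: Vec<f64> = thread::scope(|s| {
        let handles: Vec<_> = shards
            .iter()
            .map(|shard| {
                s.spawn(move || {
                    // d/dw of (w - x)^2 is 2 * (w - x), averaged per shard.
                    shard.iter().map(|x| 2.0 * (w - x)).sum::<f64>() / shard.len() as f64
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    let avg_grad = grads.iter().sum::<f64>() / grads.len() as f64;
    w - 0.1 * avg_grad // gradient step with learning rate 0.1
}

fn main() {
    let shards = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let mut w = 0.0;
    for _ in 0..50 {
        w = parallel_sgd_step(w, &shards);
    }
    println!("w converged to ~{:.2} (the data mean is 2.5)", w);
}
```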

EEG, the internal performance tool mentioned in the paper, sounds very interesting. While it is probably not in the scope of a paper that focuses on TensorFlow, I'd love to learn more about EEG. It seems like a very powerful tool and could probably be extended to work with other systems as well.

The paper ends with a survey of related systems. This section proved to be a valuable source for finding new AI/ML and systems papers to read.

I loved this paper.

 

Frequent

(inspired by this post)

Here is a (definitely incomplete) list of programming and technology related websites (in no particular order) that I frequently read:

  • High Scalability: Excellent articles on real world software architecture and design.
  • Julia Evans' Blog: Well-written and fun-to-read posts on a wide range of topics, from systems programming to machine learning.
  • Preshing on Programming: A great resource for data structures and C++.
  • The Morning Paper: Research papers. Research papers. Research papers.
  • Hacker News: I feel this is the best resource to stay up-to-date on what’s happening in the fields of technology, software engineering, and computer science.
  • Dan Luu’s blog: I hope to one day be able to write technical posts as well as they do.
  • LWN.net: Incredible posts about system engineering concepts.

Think

(working through more Hacker News backlog)

I just finished Think OS: A Brief Introduction to Operating Systems. It had been sitting in my browser for quite some time, and I decided to read through the book this afternoon.

The book stays true to its title and gives a short and sweet introduction to (some) OS and systems programming fundamentals. While it doesn't dive deep into any topic (for example, TLBs are not mentioned in the chapter on virtual memory), it does a good job of laying a foundation for people who have never taken a systems programming / OS course before and are trying to gain an understanding of the fundamentals. That being said, there are links to resources (like an explanation of how dlmalloc is implemented) scattered throughout the book that allow the reader to gain a deeper understanding of the topic at hand.

If you’ve never done any systems programming before and are looking to write some low level code this book is a good place to begin to understand how the OS and kernel work. If you’re looking for something more in-depth I’d highly recommend reading Operating Systems: Three Easy Pieces.

TOAST

The other day I was working through my (quite massive) backlog of saved Hacker News stories and read this gem — Introduction to PostgreSQL physical storage. As someone who has loved databases forever, and who has spent over a year building LinkedIn's next-generation distributed graph database, I found this post absolutely fascinating.

As the title suggests, this article talks about how data in PostgreSQL tables and databases is actually stored on disk, and how free space (used to figure out where to store incoming data) is managed. The diagram of the page structure is very helpful in understanding how data is laid out within a page. I also really liked the PostgreSQL queries used throughout the article to explain the topic at hand by examining a real PostgreSQL instance. The author does a great job of explaining concepts at just the right level of detail, with several links provided for those interested in learning more.

(Relevant side bar: the PostgreSQL source code documentation is amazing)