After 24+ hours of traveling I’m back in San Francisco! The long journey gave me a lot of time to think. And read. And sleep (I think my superpower is falling asleep on airplanes).

Here are some of the things I read:

Research Paper: “f4: Facebook’s Warm BLOB Storage System”

(second paper in my quest to distract myself)

f4: Facebook’s Warm BLOB Storage System introduces the reader to f4, a storage system designed and used at Facebook to store “warm” binary large objects (aka BLOBs). The term “warm” is used to denote the fact that these pieces of data are not as frequently accessed as “hot” BLOBs (which are stored in Haystack). The main motivation behind the design of f4 was the desire to lower the replication factor for warm BLOBs, while still maintaining the same fault tolerance guarantees (node, rack, and datacenter failure) that hot BLOBs have.

The first half of the paper dives into warm BLOBs and their characteristics (section 3) and also gives an overview of how Haystack works (section 4).

Section 5 dives into the details of f4. It explains the overall architecture of the system, how it leverages Reed-Solomon coding to reduce storage overhead (compared to raw replication), how the replication factor for BLOBs is brought down to 2.1 (compared to 3.6 in Haystack), how fault tolerance works, etc. The architecture section is very well written and does a good job of explaining the different types of nodes that comprise an f4 cell. My favorite section in the paper is the one on fault tolerance (section 5.5); the “Quadruple Failure Example” there is extremely interesting and does a good job of showing how the system deals with failures at various levels. Another part of the paper that I really liked was the discussion of “Software/Hardware Co-Design” in section 5.6.
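The arithmetic behind those replication factors is worth making concrete. Here's a back-of-the-envelope sketch of my own, using the RS(10, 4) parameters and the XOR geo-replication trick reported in the paper (this is just arithmetic, not anything from Facebook's code):

```python
# Back-of-the-envelope check of f4's storage overhead, using the
# numbers reported in the paper.

def rs_overhead(data_blocks, parity_blocks):
    """Storage multiplier for a Reed-Solomon (n, k) stripe."""
    return (data_blocks + parity_blocks) / data_blocks

# f4 uses RS(10, 4): every block costs 1.4x its logical size.
single_cell = rs_overhead(10, 4)       # 1.4

# Naive geo-replication: a full RS-encoded copy in two datacenters.
two_full_copies = 2 * single_cell      # 2.8

# f4's XOR trick: store volumes A and B in two datacenters and
# (A XOR B) in a third, so three datacenters hold 3 * 1.4 = 4.2
# units of encoded data for 2 units of logical data.
xor_scheme = 3 * single_cell / 2       # 2.1

print(single_cell, two_full_copies, xor_scheme)
```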

Overall this paper was fun to read and very interesting. It had been on my “To Read” list for quite some time now and I’m glad I finally got to it.

Research Paper: “Hekaton: SQL Server’s Memory-Optimized OLTP Engine”

(I’ve noticed that when I’m sad I tend to throw myself at whatever activity catches my fancy at the moment. I do this to distract myself, and in general this seems to work pretty well. To deal with my sadness this time around I will be reading research papers. And blogging. Here’s the first paper I read.)

(Hekaton was one of the systems mentioned in the “Red Book” that piqued my interest)

Hekaton: SQL Server’s Memory-Optimized OLTP Engine gives the reader an overview of Hekaton, a database engine that is part of Microsoft SQL Server. It has been designed to work with data that fits entirely in main memory. The main motivations driving the design and implementation of Hekaton are the dropping cost of memory and the ever-growing popularity of multi-core CPUs. In order to achieve the best performance and take full advantage of the multiple cores, Hekaton embraces a lock/latch-free design: all the index structures (a hash table and the Bw-tree) are lock/latch-free (details on their design are in [1] and [2]) and transactions use MVCC.

While the implementation details of the index data structures are in other papers, this paper does go into the MVCC design and the garbage collection mechanism used to delete old records. Sections 6, 7, and 8 cover transactions, logging, and garbage collection. These sections are incredibly well written and do a great job of explaining these complex, core components of the system. The logging and checkpointing system is quite unique, and I thought the non-usage (I’m sure there is a better term) of WAL is interesting. Section 8, which details the garbage collection mechanism used in Hekaton, is definitely my favorite section in the paper. I think the GC algorithm is, simply put, beautiful.
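To give a flavor of the MVCC design, here's a toy sketch of the timestamp-range visibility rule at its core; the class and field names are mine, not SQL Server's:

```python
# Toy sketch of timestamp-range visibility in an MVCC scheme like
# Hekaton's (my own illustration, not the SQL Server code base).

INF = float("inf")

class Version:
    def __init__(self, payload, begin_ts, end_ts=INF):
        self.payload = payload
        self.begin_ts = begin_ts   # commit time of the creating transaction
        self.end_ts = end_ts       # commit time of the deleting/updating one

def visible(version, read_ts):
    """A version is visible to a reader iff its valid-time
    interval [begin_ts, end_ts) contains the read timestamp."""
    return version.begin_ts <= read_ts < version.end_ts

# An update never overwrites in place: it ends the old version and
# installs a new one, so concurrent readers keep seeing the old payload.
old = Version("Alice", begin_ts=10)
old.end_ts = 25
new = Version("Alicia", begin_ts=25)

assert visible(old, 20) and not visible(new, 20)
assert visible(new, 30) and not visible(old, 30)
```

Because readers never block writers (and vice versa), this is exactly the kind of design that lets every core make progress without latches.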

Another unique aspect of the system: T-SQL queries and procedures are compiled down into native code to achieve high performance. Section 5 goes into the details of how this is done. What is interesting about this conversion process is that the generated code is one big function with labels and goto statements.

This was a great paper to begin 2016 with.


[1] Maged M. Michael. High Performance Dynamic Lock-Free Hash Tables and List-Based Sets. In Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’02), 2002, pp. 73–82.

[2] Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. The Bw-Tree: A B-tree for New Hardware Platforms. In Proceedings of the 2013 IEEE 29th International Conference on Data Engineering (ICDE), April 2013, pp. 302–313.

Research Paper: “AsterixDB: A Scalable, Open Source BDMS”

(AsterixDB was one of the systems mentioned in the “Red Book” that piqued my interest)

AsterixDB: A Scalable, Open Source BDMS gives the reader an overview of the AsterixDB system. AsterixDB is an impressive “big data management system” (BDMS) with several interesting features, including a flexible data model, a powerful query language, data ingestion capabilities, and distributed query execution. Two features that stood out to me were the ability to declare indexes of different types (B+-tree, R-tree, etc.) on your data, and the ability to query data that “lives” outside the system.

A majority of the paper is on the data definition and manipulation layer. The authors use an example of a social networking website to illustrate the power of AsterixDB’s data model and query language. Most of this section consists of code snippets (to define, load, and query the data) followed by an explanation of what exactly that snippet of code does, and what happens under the hood when that snippet is run. These code snippets make this section of the paper very easy to read and understand.

The data storage, indexing, and query execution components are described in the System Architecture section of the paper. These subsystems have separate papers ([1] and [2]) devoted to them; here we are just given a brief overview of how they function and what their key features are. One thing that stood out to me in this section was the software layer that gives any index data structure LSM-style update semantics. I thought this was a very novel way to speed up data ingestion and index building while still supporting diverse index data structures suited to the type of data being stored and indexed. The secondary index design is also interesting.
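To make the "LSM-ify any index" idea concrete, here's a hedged sketch of what such a wrapper might look like; the class and method names are my own invention, not AsterixDB's:

```python
# Sketch of a layer that gives an arbitrary index type LSM-style update
# semantics: writes go to a small mutable component, which is periodically
# sealed into an immutable one (on disk this would be a flush).

class LSMified:
    def __init__(self, index_factory, flush_threshold=2):
        self.index_factory = index_factory   # builds an empty index (dict here)
        self.flush_threshold = flush_threshold
        self.mutable = index_factory()
        self.immutable = []                  # sealed components, newest first

    def put(self, key, value):
        self.mutable[key] = value
        if len(self.mutable) >= self.flush_threshold:
            self.immutable.insert(0, self.mutable)   # seal the component
            self.mutable = self.index_factory()

    def get(self, key):
        # Search newest component first so recent writes win.
        for component in [self.mutable] + self.immutable:
            if key in component:
                return component[key]
        return None

idx = LSMified(dict)
idx.put("a", 1)
idx.put("b", 2)   # second write triggers a flush
idx.put("a", 3)
assert idx.get("a") == 3 and idx.get("b") == 2
```

The appeal is that the wrapped index only ever sees sequential build-and-seal workloads, which is what makes ingestion fast regardless of the underlying structure.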

I really enjoyed reading this paper. I’ve added [1] and [2] to my “research papers to read next” list, and hope to get to them very soon.

[1] S. Alsubaiee, A. Behm, V. Borkar, Z. Heilbron, Y.-S. Kim, M. Carey, M. Dressler, and C. Li. Storage Management in AsterixDB. Proc. VLDB Endow., 7(10), June 2014.

[2] V. Borkar, M. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A Flexible and Extensible Foundation for Data-intensive Computing. ICDE, 0:1151–1162, 2011.


(The worst part about jet lag is jet lag. The best part about jet lag is that it makes me very productive for some reason. Last year I read a book and a few research papers. This year I finished reading the “Red Book” while being unable to sleep according to the time zone I’m in.)

As I’ve mentioned before, databases hold a special place in my heart. I think they’re incredibly interesting pieces of software. The state of the art database systems that exist today are the result of decades of research and systems engineering. The “Red Book” does a superb job of explaining how we got here, and where we might be going next.

The book is organized into chapters that deal with different components and related areas of database systems. The authors pick a few research papers that are significant to the chapter under discussion, offer their commentary on them, explain the content of each paper, and discuss other related systems, papers, techniques, and algorithms. The authors (Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker) have a lot of technical expertise in database systems, which makes this book an absolute delight to read. I particularly enjoyed the personal anecdotes and commentaries that are sprinkled throughout the book. My favorite chapters were the ones on weak isolation and distribution, and on query optimization.

While reading this book I made note of all the research papers referenced in it that I would like to read next. I will be working through that list over the course of my vacation.


Over the weekend I read an interesting paper: How to Memorize a Random 60-Bit String. This was probably the first security paper I’d read in quite some time. I discovered this paper via The Morning Paper, which in my opinion is one of the best resources for people interested in computer science research papers. The title of the paper caught my eye and I decided to read the entire paper (this is typically my strategy with The Morning Paper; I tend to use it for “paper discovery”).

This paper builds on the password generation mechanism introduced by XKCD. It adapts XKCD’s approach to work for 60-bit passwords and also proposes new schemes for generating English passwords from random 60-bit strings. We use a lot of online services in our lives, and creating a secure + easy to remember password is essential in order to safeguard ourselves (especially if you use the same password for all the services. Please don’t do this.).
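The XKCD-style starting point is easy to sketch: chop the 60 random bits into four 15-bit chunks and use each as an index into a 2^15-word dictionary. The word list below is a placeholder; the real scheme needs a curated list of 32,768 memorable words:

```python
import secrets

# Stand-in dictionary; a real deployment would use 2**15 curated words.
WORDS = [f"word{i}" for i in range(2 ** 15)]

def password_from_bits(bits60):
    """Map a random 60-bit integer to four dictionary words
    (15 bits of entropy per word, 60 bits total)."""
    assert 0 <= bits60 < 2 ** 60
    chunks = [(bits60 >> (15 * i)) & 0x7FFF for i in range(4)]
    return " ".join(WORDS[c] for c in chunks)

print(password_from_bits(secrets.randbits(60)))
```

The schemes in the paper keep this same 60-bit budget but spend it more cleverly, which is what makes the outputs read like English rather than four unrelated words.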

While XKCD’s scheme is based on a simple dictionary lookup to generate a four-word phrase, the methods proposed in the paper are more involved. They all use n-gram language models to generate English word passwords that, while random, still have the structure of a grammatically correct English sentence fragment or phrase.

I think the most impressive scheme proposed is the one in which a 60-bit string is converted into a poem. Yes, you read that right, a poem. This is done using word information from the CMU Pronouncing Dictionary, combined with an FST and an FSA that encode their accepted poem structure. I thought this was the most interesting part of the paper. It was also the part that took me the longest to read.

The paper ends with a section on experiments showcasing user preference and recall for the different generation schemes proposed. What is interesting is that even though the poetry scheme has the highest recall percentage, it has the second lowest user preference percentage (the XKCD method has the lowest preference).

This paper was a fun, short read. I really enjoyed it!

Paper: Read-Copy Update

With multicore machines now the norm, writing code that scales and performs well in multithreaded scenarios is becoming extremely important. I was thus delighted to discover the paper on Read-Copy Update, a technique for implementing data structures with no locking on reads. Modifications (writes or deletes) still use some sort of locking mechanism, but the idea is that the ratio of reads to modifications is so large that by eliminating the overheads of synchronization on the read path we improve the performance of our code significantly.
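The core copy-then-publish idea can be sketched in a few lines. This is my own illustration, not kernel RCU; CPython's atomic attribute store stands in for the aligned pointer write that real RCU implementations rely on:

```python
import copy
import threading

# Minimal read-copy-update sketch: readers dereference a shared reference
# with no lock; a writer copies the structure, modifies the copy, and
# atomically publishes the new version. Writers still serialize.

class RCURef:
    def __init__(self, value):
        self._value = value
        self._write_lock = threading.Lock()

    def read(self):
        return self._value                    # lock-free read path

    def update(self, mutate):
        with self._write_lock:
            new = copy.deepcopy(self._value)  # copy ...
            mutate(new)                       # ... update ...
            self._value = new                 # ... publish atomically

ref = RCURef({"hits": 0})
snapshot = ref.read()                         # readers see a stable snapshot
ref.update(lambda d: d.__setitem__("hits", 1))
assert snapshot["hits"] == 0 and ref.read()["hits"] == 1
```

Real RCU implementations additionally need a grace-period mechanism to know when the old copy can safely be reclaimed; in Python the garbage collector conveniently handles that part, which is exactly the tricky code the paper spends its time on.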

This paper is extremely well written with lots of code samples. I specifically liked section 7 that compared read-copy update with other locking algorithms. Section 3 is also great because it allows one to answer the question “Can I use read-copy update for my data structure and its expected access pattern?”

Read-copy update is a simple (at least in terms of the general idea; the actual code implementation is tricky) and elegant solution for building high performance concurrent data structures (for certain usage patterns). It is definitely a topic I will be exploring further in the future.