Research Paper: “f4: Facebook’s Warm BLOB Storage System”

(second paper in my quest to distract myself)

f4: Facebook’s Warm BLOB Storage System introduces the reader to f4, a storage system designed and used at Facebook to store “warm” binary large objects (aka BLOBs). The term “warm” denotes that this data is accessed less frequently than “hot” BLOBs, which are stored in Haystack. The main motivation behind f4’s design was the desire to lower the replication factor for warm BLOBs while still maintaining the same fault tolerance guarantees (tolerating node, rack, and datacenter failures) that hot BLOBs have.

The first half of the paper dives into warm BLOBs and their characteristics (section 3) and also gives an overview of how Haystack works (section 4).

Section 5 covers the details of f4. It explains the overall architecture of the system, how it leverages Reed-Solomon coding to reduce storage overhead (compared to raw replication), how the effective replication factor for BLOBs drops to 2.1 (compared to 3.6 in Haystack), how fault tolerance works, etc. The architecture section is very well written and does a good job of explaining the different types of nodes that comprise an f4 cell. My favorite section in the paper is the one on fault tolerance (section 5.5); the “Quadruple Failure Example” there is extremely interesting and shows how the system deals with failures at various levels. Another part of the paper that I really liked was the discussion of “Software/Hardware Co-Design” in section 5.6.
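To make the overhead numbers concrete, here’s the back-of-the-envelope arithmetic as I understand it from the paper (my own sketch, not code from f4 itself):

```js
// Back-of-the-envelope replication arithmetic, per my reading of the paper.
// Reed-Solomon(10, 4): each stripe holds 10 data blocks plus 4 parity blocks.
const withinCell = (10 + 4) / 10;            // 1.4x inside a single f4 cell

// Across datacenters, f4 pairs two volumes and stores an XOR of the pair in a
// third location: 3 parity-coded copies for every 2 logical copies (1.5x).
const effective = ((10 + 4) * 3) / (10 * 2); // 1.4 * 1.5, computed exactly

console.log(withinCell); // 1.4
console.log(effective);  // 2.1, versus 3.6 for Haystack
```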

Overall this paper was fun to read and very interesting. It had been on my “To Read” list for quite some time now and I’m glad I finally got to it.

Research Paper: “Hekaton: SQL Server’s Memory-Optimized OLTP Engine”

(I’ve noticed that when I’m sad I tend to throw myself at whatever activity catches my fancy at the moment. I do this to distract myself, and in general this seems to work pretty well. To deal with my sadness this time around I will be reading research papers. And blogging. Here’s the first paper I read.)

(Hekaton was one of the systems mentioned in the “Red Book” that piqued my interest)

Hekaton: SQL Server’s Memory-Optimized OLTP Engine gives the reader an overview of Hekaton, a database engine that is part of Microsoft SQL Server. It has been designed to work with data that fits entirely in main memory. The main motivations driving the design and implementation of Hekaton are the dropping cost of memory and the ever-growing popularity of multi-core CPUs. To achieve the best performance and take full advantage of multiple cores, Hekaton embraces a lock/latch-free design: all the index structures (a hash table and the Bw-tree) are lock/latch-free (details on their design are in [1] and [2]) and transactions use MVCC.

While the implementation details of the index data structures are in other papers, this paper does go into detail on the MVCC design and the garbage collection mechanism used to delete old record versions. Sections 6, 7, and 8 cover transactions, logging, and garbage collection. These sections are incredibly well written and do a great job of explaining these complex, core components of the system. The logging and checkpointing system is quite unique, and I thought its departure from traditional WAL-style recovery was interesting. Section 8, which details the garbage collection mechanism used in Hekaton, is definitely my favorite section in the paper. I think the GC algorithm is, simply put, beautiful.
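To make the MVCC idea concrete, here’s a minimal sketch of how I understand version visibility and GC eligibility (my simplification in Javascript; Hekaton’s actual record layout and timestamp handling are far more involved):

```js
// Minimal MVCC sketch (my simplification of Hekaton's scheme):
// each version has [begin, end) validity; Infinity means "still current".
const versions = [
  { key: 'acct-1', value: 100, begin: 10, end: 25 },
  { key: 'acct-1', value: 80,  begin: 25, end: Infinity },
];

// A transaction reading at logical time rt sees the version valid at rt.
function read(versions, key, rt) {
  return versions.find(v => v.key === key && v.begin <= rt && rt < v.end);
}

// GC can discard any version invisible to every active transaction.
function collectGarbage(versions, oldestActiveReadTime) {
  return versions.filter(v => v.end > oldestActiveReadTime);
}

console.log(read(versions, 'acct-1', 12).value);  // 100 (old version)
console.log(read(versions, 'acct-1', 30).value);  // 80 (current version)
console.log(collectGarbage(versions, 26).length); // 1: the old version is reclaimable
```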

Another unique aspect of the system: T-SQL queries and stored procedures are compiled down into native code to achieve high performance. Section 5 goes into the details of how this is done. What’s interesting about this conversion process is that the generated code is one big function with labels and goto statements.
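Javascript has no goto, but a labeled loop wrapped around a switch gives a feel for that shape (my loose analogy, not Hekaton’s actual generated code):

```js
// A tiny scan-and-filter "procedure" written goto-style (my analogy, not
// Hekaton's generated code): one big function with explicit jump targets.
function scanProc(rows, minAge) {
  const out = [];
  let i = 0, state = 'FETCH';
  dispatch: while (true) {
    switch (state) {
      case 'FETCH':
        if (i >= rows.length) { state = 'DONE'; continue dispatch; }
        state = rows[i].age >= minAge ? 'EMIT' : 'SKIP';
        continue dispatch;
      case 'EMIT':
        out.push(rows[i].name);
        state = 'SKIP';          // explicit jump instead of structured flow
        continue dispatch;
      case 'SKIP':
        i += 1;
        state = 'FETCH';
        continue dispatch;
      case 'DONE':
        return out;
    }
  }
}

console.log(scanProc([{ name: 'ada', age: 36 }, { name: 'bob', age: 17 }], 18)); // ['ada']
```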

This was a great paper to begin 2016 with.

References

[1] Maged M. Michael. High Performance Dynamic Lock-Free Hash Tables and List-Based Sets. In Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’02), pp. 73-82, 2002.

[2] J. J. Levandoski, D. B. Lomet, and S. Sengupta. The Bw-Tree: A B-tree for New Hardware Platforms. In Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE 2013), pp. 302-313, April 2013.

Research Paper: “AsterixDB: A Scalable, Open Source BDMS”

(AsterixDB was one of the systems mentioned in the “Red Book” that piqued my interest)

AsterixDB: A Scalable, Open Source BDMS gives the reader an overview of the AsterixDB system. AsterixDB is an impressive “big data management system” (BDMS) with several interesting features, including a flexible data model, a powerful query language, data ingestion capabilities, and distributed query execution. Two features that stood out to me were the ability to declare secondary indexes of different types (B+-tree, R-tree, etc.) on your data, and the ability to query data that “lives” outside the system.

The majority of the paper is devoted to the data definition and manipulation layer. The authors use the example of a social networking website to illustrate the power of AsterixDB’s data model and query language. Most of this section consists of code snippets (to define, load, and query the data), each followed by an explanation of what exactly that snippet does and what happens under the hood when it runs. These code snippets make this section of the paper very easy to read and understand.

The data storage, indexing, and query execution components are described in the System Architecture section of the paper. These subsystems have separate papers ([1] and [2]) devoted to them; here we are just given a brief overview of how they function and what their key features are. One thing that stood out to me in this section was the software layer that gives any index data structure LSM-style update semantics. I thought this was a very novel way to speed up data ingestion and index building while still retaining diverse index data structures suited to the type of data being stored and indexed. The secondary index design is also interesting.
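The idea stuck with me enough that I tried sketching its core myself. This toy (hypothetical names, my simplification; the real design in [1] handles component merges, deletes, and much more) keeps a small mutable in-memory component in front of a list of immutable “flushed” components and answers lookups newest-first:

```js
// Toy sketch of "LSM-ify any index" (hypothetical names, my simplification):
// the wrapper only needs the underlying index to support insert/search/size.
class LsmIndex {
  constructor(makeComponent, flushThreshold = 4) {
    this.makeComponent = makeComponent; // factory for the wrapped index type
    this.memory = makeComponent();      // mutable in-memory component
    this.disk = [];                     // immutable components, newest first
    this.flushThreshold = flushThreshold;
  }
  insert(key, value) {
    this.memory.insert(key, value);
    if (this.memory.size() >= this.flushThreshold) {
      this.disk.unshift(this.memory);   // "flush": freeze and demote
      this.memory = this.makeComponent();
    }
  }
  search(key) {                         // newest component wins
    for (const c of [this.memory, ...this.disk]) {
      const hit = c.search(key);
      if (hit !== undefined) return hit;
    }
    return undefined;
  }
}

// Any index exposing insert/search/size can be wrapped; here, a plain Map.
const mapComponent = () => {
  const m = new Map();
  return { insert: (k, v) => m.set(k, v), search: k => m.get(k), size: () => m.size };
};

const idx = new LsmIndex(mapComponent);
for (let i = 0; i < 10; i++) idx.insert(`k${i}`, i);
console.log(idx.search('k2')); // 2, found in an older, flushed component
```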

I really enjoyed reading this paper. I’ve added [1] and [2] to my “research papers to read next” list, and hope to get to them very soon.

[1] S. Alsubaiee, A. Behm, V. Borkar, Z. Heilbron, Y.-S. Kim, M. Carey, M. Dressler, and C. Li. Storage Management in AsterixDB. Proc. VLDB Endow., 7(10), June 2014.

[2] V. Borkar, M. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A Flexible and Extensible Foundation for Data-Intensive Computing. In Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE 2011), pp. 1151-1162, 2011.

Red

(The worst part about jet lag is jet lag. The best part about jet lag is that it makes me very productive for some reason. Last year I read a book and a few research papers. This year I finished reading the “Red Book” while being unable to sleep according to the time zone I’m in.)

As I’ve mentioned before, databases hold a special place in my heart. I think they’re incredibly interesting pieces of software. The state-of-the-art database systems that exist today are the result of decades of research and systems engineering. The “Red Book” does a superb job of explaining how we got here, and where we might be going next.

The book is organized into chapters that deal with different components and related areas of database systems. For each chapter, the authors pick a few research papers significant to the topic under discussion, offer their commentary on them, explain the papers’ content, and talk about other related systems/papers/techniques/algorithms. The authors (Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker) have a lot of technical expertise in database systems, which makes this book an absolute delight to read. I particularly enjoyed the personal anecdotes and commentaries sprinkled throughout the book. My favorite chapters were the ones on weak isolation and distribution, and on query optimization.

While reading this book I made a note of all the research papers referenced in it that I’d like to read next. I will be working through that list over the duration of my vacation.

DB

I recently started using Pocket (why had I not used this awesome service before?) and discovered a fantastic post on databases under its “Recommended” tab.

Databases are fascinating and complex pieces of software. Ben’s post does an excellent job of explaining database architectures, starting from the pieces that compose the core storage and indexing mechanisms, all the way to horizontally scaling these systems out. He explains complex systems engineering concepts in a simple fashion, and the beautiful diagrams help a lot.

Highly recommended read for anyone interested in understanding how their favorite database system works!

passwd

Over the weekend I read an interesting paper: How to Memorize a Random 60-Bit String. This was probably the first security paper I’d read in quite some time. I discovered this paper via The Morning Paper, which in my opinion is one of the best resources for people interested in computer science research papers. The title of the paper caught my eye and I decided to read the entire paper (this is typically my strategy with The Morning Paper; I tend to use it for “paper discovery”).

This paper builds on the password generation mechanism introduced by XKCD. It modifies XKCD’s approach to work for passwords derived from random 60-bit strings and also proposes new schemes for generating English passwords from those strings. We use a lot of online services in our lives, and creating secure yet easy-to-remember passwords is essential to safeguarding ourselves. (This goes double if you use the same password for all your services. Please don’t do this.)
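To make the 60-bit XKCD variant concrete, here’s my reconstruction of one natural way to do it (my sketch, not code from the paper): four words at 15 bits each covers exactly 60 bits, which requires a dictionary of 2^15 = 32,768 words.

```js
// Sketch of a 60-bit XKCD-style scheme (my reconstruction, not the paper's
// code): 4 words x 15 bits/word = 60 bits, so the dictionary needs 2^15 entries.
const crypto = require('crypto');

function random60BitString() {
  // 60 random bits as a BigInt (8 random bytes, top 4 bits dropped).
  return crypto.randomBytes(8).readBigUInt64BE() >> 4n;
}

function toPassphrase(bits, dictionary) {
  const words = [];
  for (let i = 0; i < 4; i++) {
    words.push(dictionary[Number(bits & 0x7fffn)]); // low 15 bits pick a word
    bits >>= 15n;
  }
  return words.join(' ');
}

// A real dictionary would have 32768 curated English words; this placeholder
// just shows the shape.
const dictionary = Array.from({ length: 32768 }, (_, i) => `word${i}`);
console.log(toPassphrase(random60BitString(), dictionary));
```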

While XKCD’s scheme is based on a simple dictionary lookup to generate a four-word phrase, the methods proposed in the paper are more involved. They all use n-gram language models to generate English word passwords that, while random, still have the structure of a grammatically correct English sentence fragment or phrase.

I think the most impressive scheme proposed is the one in which a 60-bit string is converted into a poem. Yes, you read that right, a poem. This is done using word information from the CMU pronouncing dictionary, plus an FST/FSA that encodes the accepted poem structure. I thought this was the most interesting part of the paper. It was also the part that took me the longest to read.

The paper ends with a section on experiments showcasing user preference and recall for the different generation schemes proposed. What’s interesting is that even though the poetry scheme has the highest recall percentage, it has the second-lowest user preference percentage (the XKCD method has the lowest preference).

This paper was a fun, short read. I really enjoyed it!

Teach

For October’s InDay I had the opportunity to teach high school students how to code. This was the first time I was mentoring/teaching someone with no prior programming experience.

The goal of the lesson was to help the student build a simple guessing game (pick a random number and have the user try to guess it; let the user know if their guess is greater than or less than the random number) in Javascript. On the surface this seems like an easy task, but if you think about it, it involves quite a few programming constructs: variables, loops, conditionals, functions, random number generators, etc.
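For reference, the finished game looked something like this (reconstructed from memory; it runs in the browser since we used prompt and alert):

```js
// The guessing game we built, more or less (reconstructed from memory).
function playGuessingGame(min, max) {
  // Pick a random integer between min and max, inclusive.
  const secret = Math.floor(Math.random() * (max - min + 1)) + min;
  while (true) {
    const guess = Number(prompt(`Guess a number between ${min} and ${max}:`));
    if (guess === secret) {
      alert('You got it!');
      return;
    }
    alert(guess > secret ? 'Too high!' : 'Too low!');
  }
}

playGuessingGame(1, 100);
```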

One method that I found useful while teaching was to introduce the programming concepts with the help of the mathematical concepts on which they are built. Variables and functions in programming languages are (more or less) based on the same constructs in mathematics and it was easy to draw parallels between the two.

To explain how to generate a random number within a range I used the Google Chrome Console to show how function composition in programming works.
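Roughly what I typed into the console, building the expression up one function at a time (reconstructed from memory):

```js
// Building up a ranged random integer step by step in the Chrome Console.
Math.random();                  // a float in [0, 1)
Math.random() * 10;             // a float in [0, 10)
Math.floor(Math.random() * 10); // an integer in 0..9
Math.floor(Math.random() * 10) + 1; // an integer in 1..10
```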

Also, analogies help. “Why do we need HTML, CSS, and Javascript?” “Well, HTML elements are the basic building blocks of a webpage. CSS allows you to add color to and position the blocks; it makes the blocks look pretty. Javascript makes the blocks interactive.” Yes, I’m well aware that this analogy is not perfect, but it was the best I could come up with at the time, and I think it captures the essence of why we need this trio to build websites today.

Overall this was a challenging and rewarding experience. I need to find more volunteering opportunities that are coding related.

Meta

At the beginning of September I accepted a challenge to write one blog post a day for the rest of the month. With the month coming to an end I thought it would be interesting to see how I did, and also to do a quick analysis of my posts.

September has 30 days. I wrote 28 posts. Try as much as you want, 28 does not equal 30. The math simply does not work out. So I did fail in my attempt at doing a post a day. But on the bright side I did get pretty close. And this is also the most I’ve ever posted in a month. So yay.

I took my small sample set (you’ve heard of big data? This is small, artisanal, and organic data) of the blog posts I wrote and ran it through a simple script to get some statistics. The script takes a single command line argument: a folder containing all the posts, where the name of each file is the title of the post and the content of the file is the content of the post.

Yes, I know the regular expression used to split a post into words is very primitive and doesn’t handle every case, but that doesn’t really matter for such a simple analysis. The script is probably not the most elegant or efficient either. But I’m pretty tired and should really be sleeping instead of writing this post right now.
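For the curious, the script was along these lines (a reconstruction from memory in Javascript; the real one differed in its details):

```js
// Reconstruction of the stats script (from memory; details may differ).
// Usage: node stats.js <folder-of-posts>
const fs = require('fs');
const path = require('path');

const folder = process.argv[2];
const wordCounts = new Map(); // word -> occurrences across all posts
const postLengths = [];       // { title, words }

for (const file of fs.readdirSync(folder)) {
  const text = fs.readFileSync(path.join(folder, file), 'utf8');
  // The primitive word-splitting regex I mentioned above.
  const words = text.toLowerCase().split(/[^a-z']+/).filter(Boolean);
  postLengths.push({ title: file, words: words.length });
  for (const w of words) wordCounts.set(w, (wordCounts.get(w) || 0) + 1);
}

postLengths.sort((a, b) => a.words - b.words);
const total = postLengths.reduce((sum, p) => sum + p.words, 0);

console.log('shortest:', postLengths.slice(0, 3));
console.log('longest:', postLengths.slice(-3).reverse());
console.log('average length:', (total / postLengths.length).toFixed(2));
console.log('total words:', total, 'unique words:', wordCounts.size);
console.log('top 100:',
  [...wordCounts.entries()].sort((a, b) => b[1] - a[1]).slice(0, 100));
```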

Onto the statistics.

My shortest posts (in ascending order) were:

  1. M2 with 11 words.
  2. What I’m Currently Listening To: Scale the Summit with 26 words.
  3. What I’m Currently Listening To: Foals with 29 words.

I’m not surprised by #2 and #3; these posts are typically very short. #1 is short because I was exhausted from MHacks.

My longest posts (in descending order) were:

  1. Library with 355 words.
  2. Dog with 320 words.
  3. Feel with 292 words.

I’m surprised that my longest posts aren’t that long.

Which brings me to my average post length, which is 138.96 words. A friend of mine said that I should write longer posts. I think I agree.

The total number of words I wrote is 3891. The number of unique words is 1107. In other words (pun intended), 28.4% of the words I wrote were unique. I wonder what the average value for this ratio is.

The list of the 100 words I used the most is dominated by commonly used words in the English language. This makes sense because I did not do any pre-processing on the data to remove/ignore them. Other words that I used a lot are:

  1. time. 24 occurrences.
  2. think. 15 occurrences.
  3. Portland. 9 occurrences (I was in Portland for the long weekend in September).
  4. climbing. 7 occurrences (I discovered climbing very recently).
  5. MHacks. 6 occurrences (I was at MHacks in September).

Overall, this blogging challenge turned out to be harder than I expected. But it was also super fun! I will use the momentum built over the course of the month in order to (hopefully) maintain a more frequent posting schedule.

Library

On Tuesday I misread the Papers We Love meetup announcement and thought that the meetup scheduled for 11/19 was actually happening today. I think I messed up because I only looked at the day instead of the date. Silly me.

This mix-up gave me an excuse (like I needed one) to read a research paper on the bus ride back from work. I chose to read “Threads Cannot Be Implemented As a Library” by Hans-J. Boehm.

The main argument laid forth in this paper is that for concurrency to be defined and used correctly in a language, it must be part of the language specification; it cannot be bolted on as a library. The motivating example used throughout the paper is that of C and the Pthreads library. Pthreads was probably my first introduction to concurrency (or it might have been Java threads; I don’t remember), which is one of the reasons I thought this paper was really interesting. It’s also something I’ve never really thought about: I know that Pthreads is just a library, but for some reason I always felt it was a part of C.

To illustrate his point, the author gives three categories of errors that arise when threads are implemented as a library. In essence, all three stem from compiler optimizations. Because the compiler is not aware of threading/concurrency semantics (they are not part of the language specification), it makes transformations that are perfectly valid for single-threaded programs but can break multi-threaded ones. The paper ends with a section on the performance cost of concurrency primitives, and on what can be done in C++ to come up with a standard for threading. You can find details about the C++11 memory model here.
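To show the shape of the problem, here’s the classic spin-loop example sketched in Javascript for readability (the paper’s examples are in C, and real JS engines have a proper memory model, so treat this purely as an illustration):

```js
// The shape of the problem, sketched in JS (the paper's examples are in C).
// Original code: spin until another thread sets flag.done.
function waitOriginal(flag) {
  while (!flag.done) { /* spin */ }
}

// What a thread-unaware compiler may legally produce: the load of flag.done
// is loop-invariant in a single-threaded world, so it gets hoisted...
function waitAfterOptimization(flag) {
  const done = flag.done;      // read once, before the loop
  while (!done) { /* spin */ } // ...and now this spins forever
}
```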

This paper is short (10 pages!) and very easy to read. Even though it is about 10 years old, the ideas and thoughts presented in it are still valid, especially if you’re coding in C, or if you plan on inventing a new programming language. I highly recommend reading it.

Paper Review: Paxos Explained from Scratch

“Tutorial Summary: Paxos Explained from Scratch” is a unique and extremely interesting paper. As is evident from the title, the paper attempts to explain the Paxos algorithm to the reader. What makes this paper great is that it builds up the Paxos algorithm step by step.

The Paxos algorithm is explained in the context of building a replicated state machine. The authors begin with a simple algorithm for consensus, then inject failures into it; the fixes needed to handle each failure eventually yield the Paxos algorithm in a very natural fashion.
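To cement the final algorithm in my head, I sketched a single-decree acceptor after reading (a minimal sketch of classic Paxos, not the tutorial’s exact pseudocode):

```js
// Minimal single-decree Paxos acceptor (a sketch of classic Paxos, not the
// tutorial's exact pseudocode).
class Acceptor {
  constructor() {
    this.promised = -1;  // highest proposal number promised
    this.acceptedN = -1; // proposal number of the accepted value, if any
    this.acceptedV = null;
  }
  // Phase 1: promise to ignore proposals numbered n or below,
  // and report any value already accepted.
  prepare(n) {
    if (n <= this.promised) return { ok: false };
    this.promised = n;
    return { ok: true, acceptedN: this.acceptedN, acceptedV: this.acceptedV };
  }
  // Phase 2: accept (n, v) unless a higher-numbered promise was made.
  accept(n, v) {
    if (n < this.promised) return { ok: false };
    this.promised = n;
    this.acceptedN = n;
    this.acceptedV = v;
    return { ok: true };
  }
}

const a = new Acceptor();
console.log(a.prepare(1));     // { ok: true, acceptedN: -1, acceptedV: null }
console.log(a.accept(1, 'x')); // { ok: true }
console.log(a.prepare(1));     // { ok: false }: 1 is not above the promise
console.log(a.prepare(2));     // { ok: true, acceptedN: 1, acceptedV: 'x' }
```

The key safety rule lives in prepare: a proposer that collects promises from a majority learns of any already-accepted value and must adopt the highest-numbered one it sees.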

This is the first time I’ve read a bottom-up explanation of Paxos, and I found it quite easy to understand. Each algorithm they present (building up to the full Paxos algorithm) is also accompanied by a pictorial explanation, which made the concepts even clearer.

Overall, I loved this paper. If you’re looking to refresh your knowledge on the Paxos algorithm I would recommend reading this paper, followed by Paxos Made Simple.