(second paper in my quest to distract myself)
f4: Facebook’s Warm BLOB Storage System introduces the reader to f4, a storage system designed and used at Facebook to store “warm” binary large objects (aka BLOBs). The term “warm” is used to denote the fact that these pieces of data are not as frequently accessed as “hot” BLOBs (which are stored in Haystack). The main motivation behind the design of f4 was the desire to lower the replication factor for warm BLOBs, while still maintaining the same fault tolerance guarantees (node, rack, and datacenter failure) that hot BLOBs have.
The first half of the paper dives into warm BLOBs and their characteristics (section 3) and also gives an overview on how Haystack works (section 4).
Section 5 dives into the details of f4. It explains the overall architecture of the system, how it leverages Reed-Solomon coding to reduce storage overhead (compared to raw replication), how the replication factor for BLOBs is 2.1 (compared to 3.6 in Haystack), how fault tolerance works, etc. The architecture section is very well written and does a good job of explaining the different types of nodes that comprise a f4 cell. My favorite section in the paper is the one that talks about fault tolerance (section 5.5); the “Quadruple Failure Example” in this section is extremely interesting and does a good job of showing how the system deals with failures at various levels. Another part of the paper that I really liked was the section on “Software/Hardware Co-Design” in section 5.6.
Overall this paper was fun to read and very interesting. It had been on my “To Read” list for quite some time now and I’m glad I finally got to it.