See

(my thoughts and summary on the second paper I read this month as part of my endeavor to read ten research papers in June)

Paper: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing

This research paper introduces readers to MemC3, which stands for Memcached with CLOCK and Concurrent Cuckoo Hashing. MemC3 is Memcached with modifications that make it faster and more space-efficient. The researchers achieve the speed boost by allowing multiple reads and a single write to happen concurrently, using their own concurrent implementation of Cuckoo Hashing. The space savings come from using the CLOCK algorithm, which approximates LRU cache eviction without the per-entry bookkeeping a strict LRU list needs.

The authors begin by examining the existing use cases and design choices of Memcached. They notice that most Memcached use cases are read-heavy, and that Memcached either (a) uses a single thread, or (b) (in newer versions) uses a single global lock. These two observations highlight areas where Memcached’s performance can be improved.

To improve read and write performance and to enable concurrent operations, the researchers replace Memcached’s hash table + linked list based cache with a 4-way set-associative hash table that uses Cuckoo Hashing. I hadn’t heard of Cuckoo Hashing before reading this paper, and it is quite an interesting concept. To allow multiple readers and one writer, the authors develop their own concurrent Cuckoo Hashing algorithm that combines lock striping and optimistic locking in a simple, easy-to-understand algorithm. This section (section 3.2) was the part of the paper that I enjoyed the most.
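To make the idea concrete, here is a minimal single-threaded sketch of a 4-way set-associative cuckoo hash table. It is not the paper's implementation: the class name `CuckooTable`, the `MAX_STEPS` limit, the use of Python's built-in `hash`, and the random-walk path search are all assumptions made for illustration, and the lock striping and optimistic locking are left out entirely. The sketch first searches for a chain of displacements that ends in an empty slot and only then moves items, last hop first, so a failed insert changes nothing.

```python
import random


class CuckooTable:
    """Toy 4-way set-associative cuckoo hash table (no concurrency)."""

    SLOTS = 4        # entries per bucket (the "4-way" part)
    MAX_STEPS = 200  # give up on an insert after this many displacements

    def __init__(self, num_buckets=16, seed=0):
        self.buckets = [[None] * self.SLOTS for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _candidates(self, key):
        # Every key may live in exactly two buckets, one per hash function.
        n = len(self.buckets)
        return hash((1, key)) % n, hash((2, key)) % n

    def _find(self, key):
        for b in self._candidates(key):
            for i, entry in enumerate(self.buckets[b]):
                if entry is not None and entry[0] == key:
                    return b, i
        return None

    def _free_slot(self, b):
        for i, entry in enumerate(self.buckets[b]):
            if entry is None:
                return b, i
        return None

    def get(self, key):
        # A lookup inspects at most 2 buckets * 4 slots = 8 entries.
        pos = self._find(key)
        return self.buckets[pos[0]][pos[1]][1] if pos else None

    def _search_path(self, key):
        """Return slots [s0, ..., sk]: sk is empty, and the item in each
        earlier slot may legally move into the slot after it."""
        c1, c2 = self._candidates(key)
        free = self._free_slot(c1) or self._free_slot(c2)
        if free:
            return [free]
        path, seen = [], set()
        b = self.rng.choice((c1, c2))
        for _ in range(self.MAX_STEPS):
            # Never reuse a slot, so the path cannot loop back on itself.
            choices = [i for i in range(self.SLOTS) if (b, i) not in seen]
            if not choices:
                return None
            slot = (b, self.rng.choice(choices))
            seen.add(slot)
            path.append(slot)
            victim_key = self.buckets[b][slot[1]][0]
            v1, v2 = self._candidates(victim_key)
            b = v2 if b == v1 else v1  # the victim's other bucket
            free = self._free_slot(b)
            if free:
                return path + [free]
        return None

    def put(self, key, value):
        pos = self._find(key)
        if pos:  # key already present: overwrite in place
            self.buckets[pos[0]][pos[1]] = (key, value)
            return True
        path = self._search_path(key)
        if path is None:
            return False  # too full; nothing has been moved
        # Shift items toward the empty slot, starting from the far end.
        for (db, di), (sb, si) in zip(reversed(path), reversed(path[:-1])):
            self.buckets[db][di] = self.buckets[sb][si]
        b, i = path[0]
        self.buckets[b][i] = (key, value)
        return True
```

Because each move copies an item into its other bucket before its old slot is overwritten, every stored item stays findable in at least one of its two buckets throughout an insert. That ordering is a design choice of this sketch; a real concurrent version would still need the locking and versioning described in the paper.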

To save space while still approximating an LRU cache eviction policy, the authors use the CLOCK algorithm. The paper also describes how their cache eviction algorithm works safely with their concurrent cuckoo hashing algorithm.
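As a rough illustration of CLOCK eviction, here is a minimal sketch. The class name `ClockCache` and its layout are assumptions for this example, not the paper's data structures. Each slot carries a single recency bit; a "hand" sweeps the slots in a circle, clearing set bits and evicting the first entry whose bit is already clear. This variant sets the bit on insert as well as on access, which is one common choice among several.

```python
class ClockCache:
    """Toy fixed-size cache with CLOCK (second-chance) eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [None] * capacity  # circular buffer of cached keys
        self.ref = [0] * capacity      # one recency bit per slot
        self.index = {}                # key -> (slot, value)
        self.hand = 0                  # next slot the clock hand examines

    def get(self, key):
        hit = self.index.get(key)
        if hit is None:
            return None
        slot, value = hit
        self.ref[slot] = 1  # mark as recently used
        return value

    def put(self, key, value):
        if key in self.index:
            slot, _ = self.index[key]
            self.index[key] = (slot, value)
            self.ref[slot] = 1
            return
        # Sweep: entries with the bit set get a second chance (bit cleared);
        # stop at the first empty slot or the first entry with a clear bit.
        while self.keys[self.hand] is not None and self.ref[self.hand]:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % self.capacity
        slot = self.hand
        old = self.keys[slot]
        if old is not None:
            del self.index[old]  # evict the victim
        self.keys[slot] = key
        self.index[key] = (slot, value)
        self.ref[slot] = 1
        self.hand = (slot + 1) % self.capacity


cache = ClockCache(3)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    cache.put(k, v)   # inserting "d" sweeps all bits clear and evicts "a"
cache.get("b")        # sets b's recency bit again
cache.put("e", 5)     # hand skips "b" (second chance) and evicts "c"
```

The space argument is that a recency bit per entry replaces the pointers a linked-list LRU keeps per entry, at the cost of only approximating true LRU order.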

The paper ends with an evaluation section that compares MemC3 and Memcached in a variety of configurations and workloads and shows that MemC3 almost always outperforms Memcached.

Injury

Lifting/workout injuries this year so far (listed in descending order of severity):

  1. Pulled muscle in my lower left back.
  2. Micro tear in my tricep in my right arm.
  3. Pain in the region of my right knee.
  4. Sprained finger.
  5. Bruise on nose. This was a result of doing what I call a “jump clap push up”: it’s a clap push up in which you also lift your entire body up while you’re clapping, so your feet are lifted off the ground when you clap. I can typically do these without any problem, but because of injury #2 in this list I fell and my nose hit the ground. Hard. This happened before I knew that I had torn my tricep; I was trying to see if my arm had healed. Turns out it hadn’t.

Breathe

The Contortionist released the first single “Reimagined” from their new album Clairvoyant a few days ago. The song is beautiful and I haven’t been able to stop listening to it since it came out. This single carries over the style they developed on 2014’s Language, which is one of my favorite albums. I can’t wait for the auditory bliss Clairvoyant will (hopefully) bring.

Detective

(my thoughts and summary on the first paper I read this month as part of my endeavor to read ten research papers in June)

Paper: Early Detection of Configuration Errors to Reduce Failure Damage

This paper introduces readers to PCHECK, a tool the researchers designed to check configuration values used by programs. The goal of PCHECK is to check configuration values at system initialization time, and fail fast if any errors in these configuration values are detected. PCHECK accomplishes this by emulating application code that uses configuration values, and also ensures that this emulation does not cause any internal or external side effects.

The authors motivate the design of PCHECK by observing that most software systems do not check configuration values at system initialization time. Configuration values are typically checked right before they are used. This is a problem because an incorrect configuration value might cause the entire software system to fail long after the system has started. This is especially problematic when a configuration error causes a fault tolerance or system resiliency component to fail; such components are typically instantiated only under exceptional circumstances, so the configuration values they use might not be checked for correctness at overall system startup time.

PCHECK works by generating checkers that are run at system initialization time. These checkers emulate configuration value usage in the actual system code. PCHECK captures all the context (global variables, local dependent variables, etc.) required to emulate these usages. PCHECK checkers are side-effect free and work on copies of the system variables so as to not modify any state. PCHECK handles system calls, libc function calls, core Java library function calls, etc. by providing what the authors term a “check utility”, which is essentially a mock function that doesn’t mutate external state but validates the function call arguments. Other external dependencies that might arise while using a configuration value, such as reading/writing to an external file, sending/receiving data from the network, etc. are dealt with in a similar fashion; external files are checked for metadata (while the paper does not explicitly say what these metadata checks are, I’m assuming they are something along the lines of “Does this file exist?”, “Do I have the permissions to read/write to this file?”, etc.) and network addresses are checked for reachability. The PCHECK checkers detect configuration issues by looking for Exceptions, errno values, and other signs from the program (for example, the program terminating with an exit(1)) that something went wrong while running the emulated configuration value usage code.
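To give a feel for the approach, here is a hand-written sketch of what such checkers might look like. PCHECK generates its checkers automatically from the program's code; everything below (`check_readable_file`, `check_port`, `run_checkers`, and the config keys) is hypothetical and written by hand purely for illustration. The file checker plays the role of a "check utility": it stands in for `open()` and only inspects metadata, following my assumption above about what those checks are.

```python
import os


def check_readable_file(path):
    """Stand-in for open(path): validates the argument, opens nothing."""
    if not os.path.isfile(path):
        return [f"{path!r} does not exist or is not a regular file"]
    if not os.access(path, os.R_OK):
        return [f"{path!r} is not readable by this process"]
    return []


def check_port(value):
    """Emulates code that would later do int(value) and bind to the port."""
    port = int(value)  # raises ValueError on a malformed value
    if not 1 <= port <= 65535:
        return [f"port {port} is outside 1..65535"]
    return []


def run_checkers(config, checkers):
    """Run every checker at startup; collect problems instead of crashing."""
    errors = {}
    for name, checker in checkers.items():
        try:
            problems = checker(config[name])
        except Exception as exc:  # an exception counts as a detected error
            problems = [f"{type(exc).__name__}: {exc}"]
        if problems:
            errors[name] = problems
    return errors


# Hypothetical configuration: both values are only used on a failure path.
config = {
    "failover_port": "80a",
    "recovery_log": os.path.join(os.sep, "no_such_dir_pcheck_demo", "r.log"),
}
errors = run_checkers(config, {
    "failover_port": check_port,
    "recovery_log": check_readable_file,
})
```

A startup routine could refuse to continue when `errors` is non-empty, which is the fail-fast behavior the paper is after.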

Ten

I realized that I’ve been (for various reasons) doing a terrible job at reading research papers for the past two to three months. In order to fix that and make up for lost time, I’m setting an ambitious goal for June 2017: I’ll read (and maybe write about) ten research papers over the course of the month. Here are the papers I will be reading (I found a majority of the papers by going through my backlog of unread The Morning Paper posts and picking papers that intrigued me):

  1. WiscKey: Separating Keys from Values in SSD-conscious Storage
  2. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking
  3. Efficient Memory Disaggregation with Infiniswap
  4. CORFU: A distributed shared log
  5. vCorfu: A Cloud-Scale Object Store on a Shared Log
  6. Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions
  7. Early Detection of Configuration Errors to Reduce Failure Damage
  8. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
  9. Replex: A Scalable, Highly Available Multi-Index Data Store
  10. Hints for Computer System Design