(my thoughts and summary on the first paper I read this month as part of my endeavor to read ten research papers in June)
Paper: Early Detection of Configuration Errors to Reduce Failure Damage
This paper introduces readers to PCHECK, a tool the researchers designed to check configuration values used by programs. The goal of PCHECK is to check configuration values at system initialization time, and fail fast if any errors in these configuration values are detected. PCHECK accomplishes this by emulating application code that uses configuration values, and also ensures that this emulation does not cause any internal or external side effects.
The authors motivate the design of PCHECK by observing that most software systems do not check configuration value at system initialization time. Configuration values are typically checked right before they are used. This is a problem because an incorrect configuration value might cause the entire software system to fail, long after the system has started. This is especially problematic when a configuration error causes a fault tolerance or system resiliency component to fail; such components are typically instantiated only under exceptional circumstances, hence the configuration values used by these components might not be checked for correctness at overall system start up time.
PCHECK works by generating checkers that are run at system initialization time. These checkers emulate configuration value usage in the actual system code. PCHECK captures all the context (global variables, local dependent variables, etc.) required to emulate these usages. PCHECK checkers are side effect free and work by creating copies of all the system variables so as to not modify any state. PCHECK handles system calls, libc
function calls, core Java library function calls, etc. by providing what they term a “check utility”, which is essentially a mock function that doesn’t mutate external state but validates the function call arguments. Other external dependencies that might arise while using a configuration value, such as reading/writing to an external file, sending/receiving data from the network, etc. are dealt with in a similar fashion; external files are checked for metadata (while the paper does not explicitly say what these metadata checks are, I’m assuming they are something along the lines of “Does this file exist?”, “Do I have the permissions to read/write to this file?”, etc.) and network addresses are checked for reachability. Configuration issues are detected by the PCHECK checkers by checking for the presence of Exception
s, errno
values, and signs by the program (for example the program terminating with an exit(1)
) that something went wrong while running the emulated configuration value usage code.