Research Paper: “AsterixDB: A Scalable, Open Source BDMS”

(AsterixDB was one of the systems mentioned in the “Red Book” that piqued my interest)

AsterixDB: A Scalable, Open Source BDMS gives the reader an overview of the AsterixDB system. AsterixDB is an impressive “big data management system” (BDMS) with several interesting features including a flexible data model, a powerful query language, data ingestion capabilities and distributed query execution. Two features that stood out to me were the ability to describe custom index types (B+-tree, R-tree, etc.) on your data, and the ability to query data that “lives” outside the system.

A majority of the paper is on the data definition and manipulation layer. The authors use an example of a social networking website to illustrate the power of AsterixDB’s data model and query language. Most of this section consists of code snippets (to define, load, and query the data) followed by an explanation of what exactly that snippet of code does, and what happens under the hood when that snippet is run. These code snippets make this section of the paper very easy to read and understand.

The data storage, indexing, and query execution components are described in the System Architecture section of the paper. These subsystems have separate papers ([1] and [2]) devoted to them; in this paper we are just given a brief overview of how they function and what their key features are. One piece of information that stood out to me in this section was the software layer described that grants any index data structure LSM update semantics. I thought this was a very novel idea to help speed up data ingestion and index building, while at the same time having the benefit of diverse index data structures based on the type of data being stored and indexed. The secondary index design is also interesting.

I really enjoyed reading this paper. I’ve added [1] and [2] to my “research papers to read next” list, and hope to get to it very soon.

[1] S. Alsubaiee, A. Behm, V. Borkar, Z. Heilbron, Y.-S. Kim, M. Carey, M. Dressler, and C. Li. Storage Management in AsterixDB. Proc. VLDB Endow., 7(10), June 2014.

[2] V. Borkar, M. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A Flexible and Extensible Foundation for Data-intensive Computing. ICDE, 0:1151–1162, 2011.

2 thoughts on “Research Paper: “AsterixDB: A Scalable, Open Source BDMS”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s