Running Fabric tasks in parallel based on roles

We’ve been using Fabric to set up and build Gelato on AWS. Each time I use it, I’m left in awe of how good it is. Going from manually SSHing into each machine for every little thing to having Fabric build your code on 15 machines in parallel is indescribable.

One thing we had trouble with was getting Fabric to run a task on specific host roles in parallel. To run tasks in parallel you use the @parallel decorator, and to run tasks on hosts by role you use the @roles decorator. If you want to run tasks in parallel on specific roles, you have to be careful about the order in which you apply these decorators. Here is what worked for us:


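from fabric.api import env, parallel, roles, task

# Map each role name to the hosts that belong to it.
env.roledefs = {
    "service_A": ["hostA1", "hostA2", …],
    "service_B": ["hostB1", "hostB2", …],
    "service_C": ["hostC1", "hostC2", …],
}

# This decorator order is the one that worked for us.
@task
@parallel
@roles("service_A", "service_B", "service_C")
def build():
    pass  # build commands go here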


P.S. Make sure you set an appropriate bubble size (the pool_size argument to @parallel, or the -z command-line flag) if you have a large number of hosts!
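To make the bubble idea a bit more concrete, here is a rough standard-library sketch of what "run on every host in these roles, but only a few at a time" looks like. This is not Fabric's code: hosts_for_roles, run_on_hosts, fake_build and the host names are all made up for illustration, and it uses threads for simplicity, whereas Fabric forks separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical role definitions, shaped like env.roledefs.
roledefs = {
    "service_A": ["hostA1", "hostA2"],
    "service_B": ["hostB1", "hostB2"],
    "service_C": ["hostC1", "hostA1"],  # hostA1 appears in two roles
}


def hosts_for_roles(roledefs, *roles):
    """Merge the host lists of the given roles, dropping duplicates but keeping order."""
    seen = set()
    hosts = []
    for role in roles:
        for host in roledefs[role]:
            if host not in seen:
                seen.add(host)
                hosts.append(host)
    return hosts


def run_on_hosts(task, hosts, pool_size):
    """Call task(host) for every host, with at most pool_size calls running at once."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(task, hosts))
    return dict(zip(hosts, results))


def fake_build(host):
    # Stand-in for real work, such as running a build command over SSH.
    return "built on " + host


hosts = hosts_for_roles(roledefs, "service_A", "service_B", "service_C")
results = run_on_hosts(fake_build, hosts, pool_size=2)
```

With pool_size=2, only two hosts are ever being worked on at the same moment, which is the point of a bubble: your local machine does not have to juggle one worker per host when the host list gets long.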

Gelato Tech Stack

For our Advanced Distributed Systems (CS 525) final project, Onur and I are working on a system we’ve named Gelato. I will have more details about it in a month, when we (hopefully) open source our code. Our tech stack for Gelato looks something like this:

We were choosing between Cassandra and HBase and went with HBase because it has a native Java API and is pretty easy to use on AWS thanks to AWS EMR.

The languages we are using are:

  • Java for the core Gelato system
  • Python to gather performance metrics

Gelato is pretty much the most complex system I’ve built during college, and I’m really excited to see how it finally turns out.