How to Estimate the Number of Heroku web Dynos

Last week I took a class on Introduction to Operations Management at Coursera.org ( https://class.coursera.org/course/operations ). As an engineer with an operations research background, it seemed silly to hope that anything there would make me better at devops, but when I least expected it, there it was, loud and clear: waiting-time models.

I vaguely remembered Markov chains and M/M/1 queues, which I had once used to justify an SLA in a quote.

The problem with the graphs in NewRelic is that it's not clear how to calculate and justify the number of dynos for a Heroku app from those figures.

It turns out you can calculate the number of dynos simply by setting a baseline for the maximum time you can afford to make your requests wait!

First and foremost, the number of workers should be the minimum that satisfies this simple rule:

u = FLOW RATE / CAPACITY < 100%

where FLOW RATE can be read from NewRelic as requests per minute
and CAPACITY is the number of workers ( w ) divided by the average time to serve a request ( p, also available in NewRelic ), expressed in the same time unit as the flow rate

u = target RPM * avg response time in minutes / w

( Pure common sense, but many times I got it wrong, using only NewRelic's Apdex and response times as the parameters for setting the number of workers. )
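The utilization rule can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are made up for the example, the response time is assumed to be in seconds and the flow rate in requests per minute.

```python
import math


def utilization(rpm, avg_response_s, workers):
    """u = flow rate / capacity, both expressed in requests per minute."""
    # Each worker can serve 60 / p requests per minute (p in seconds).
    capacity_rpm = workers * 60.0 / avg_response_s
    return rpm / capacity_rpm


def min_workers(rpm, avg_response_s):
    """Smallest number of workers that keeps utilization strictly below 100%."""
    # Average number of workers kept busy by the incoming traffic.
    offered_load = rpm * avg_response_s / 60.0
    return math.floor(offered_load) + 1


if __name__ == "__main__":
    rpm, p = 1200, 0.25  # hypothetical figures, not taken from a real app
    w = min_workers(rpm, p)
    print(f"workers={w} utilization={utilization(rpm, p, w):.0%}")
```

This only gives the floor below which the queue grows without bound. How many workers to add on top of it depends on how long you are willing to let requests wait.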

With that said, let's minimize the time in queue, Tq, with the formula below:

Tq = (p/m) * (u / (1 - u)) * ((CVa^2 + CVp^2) / 2)

Let's look at the first factor ( p/m ). It's just the process time divided by the number of workers ( m, the same quantity as w above ), which share the load.

The second factor tends to infinity as utilization approaches 100%, which is intuitive.

The third factor captures variability: CVa and CVp are the coefficients of variation ( standard deviation divided by the mean ) of the interval between arrivals ( 1/throughput ) and of the process time. For the sake of simplicity, let's assume this term is equal to 1, but you can get an approximate figure from NewRelic or even calculate it like I did by ETL'ing the webserver access log.
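If you would rather estimate the variability term than assume it is 1, something like the sketch below could work. It assumes you have already extracted two lists from the access log, request arrival timestamps and per-request service times, both in seconds. It reads CVa and CVp as coefficients of variation (standard deviation divided by the mean). The function names are invented for this example and the log parsing itself is left out.

```python
import statistics


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean of the samples."""
    return statistics.pstdev(samples) / statistics.mean(samples)


def interarrival_times(timestamps):
    """Gaps between consecutive request timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def variability_factor(timestamps, service_times):
    """(CVa^2 + CVp^2) / 2, the last term of the Tq formula."""
    cv_a = coefficient_of_variation(interarrival_times(timestamps))
    cv_p = coefficient_of_variation(service_times)
    return (cv_a ** 2 + cv_p ** 2) / 2


if __name__ == "__main__":
    # Toy data, only to show the call; real input would come from the log.
    arrivals = [0.0, 1.0, 2.0, 3.0]
    service = [1.0, 3.0]
    print(variability_factor(arrivals, service))
```

A value near 1 means the simplification above is reasonable. A much larger value means bursty traffic or erratic response times, so the queue will be longer than the simplified formula suggests.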

Put this formula in Excel and you get the time in queue as a function of the number of workers you plug in.
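The same spreadsheet exercise can be sketched in Python. The code uses the formula exactly as written above, with the variability term defaulting to 1. The function names, the sample traffic figures and the 50 ms target are all hypothetical.

```python
def time_in_queue(p, m, u, variability=1.0):
    """Tq = (p/m) * (u / (1 - u)) * variability, with p and Tq in seconds."""
    if not 0 <= u < 1:
        raise ValueError("utilization must be in [0, 1)")
    return (p / m) * (u / (1 - u)) * variability


def queue_table(rpm, p, max_workers, variability=1.0):
    """Rows of (workers, utilization, Tq) for every stable worker count."""
    rows = []
    for m in range(1, max_workers + 1):
        u = rpm * p / (60.0 * m)  # rpm converted to requests per second
        if u < 1:  # at or above 100% the queue never drains
            rows.append((m, u, time_in_queue(p, m, u, variability)))
    return rows


def workers_for_target(rpm, p, max_wait, max_workers=100, variability=1.0):
    """Fewest workers whose Tq is within max_wait, or None if none qualify."""
    for m, _u, tq in queue_table(rpm, p, max_workers, variability):
        if tq <= max_wait:
            return m
    return None


if __name__ == "__main__":
    for m, u, tq in queue_table(rpm=1200, p=0.25, max_workers=10):
        print(f"{m:2d} workers  u={u:.0%}  Tq={tq * 1000:.0f} ms")
    print("workers for a 50 ms wait:", workers_for_target(1200, 0.25, 0.05))
```

Picking the worker count from a maximum acceptable wait, rather than from Apdex alone, is what makes the number easy to justify.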

I found this very enlightening, and a very rational way to justify adding more dynos to my application without being blamed for being too conservative.

Maybe this can turn into an autoscaler gem sometime soon.