Cloud computing has empowered developers to create very complex inter-connected systems with unique network topologies. Software engineers cope by automating configuration, standarizing operating procedures, validating system changes, soft-deployments ("canaries"), easy roll-back to known software versions, and fire drills (read more).
Absent from the discussion appears to be statistical techniques for understanding and improving system reliability. This software library seeks to create a bridge between statistical and engineering techniques.
Consider the simplest setup, a web server and a database,
[ Web server @ 0.90 reliability ] ------> [ Database @ 0.90 reliability ]
the reliability of this system can be deconstructed into three terms,
= P(web server failure) + P(database failure) - [ P(web server failure) * P(database failure) ]
The likelihood of system failure for a two component series is the addition of each component's likelihood of failure e.g. P(web server failure)
minus the likelihood that both components fail. The intuition is that it only takes one component failure to cause system failure, so we must subtract the unnecessary probability that both fail.
Using library(reliability)
, here's the calculation step-by-step:
library(magrittr) library(reliability) # just one node node(name = "web server", reliability = 0.90) # now showing probability of failure node(name = "database", reliability = 0.90) %>% prob_of_failure()
Our two-component system,
node(name = "web server", reliability = 0.90) %>% node(name = "database", 0.90)
Putting it together,
node(name = "web server", reliability = 0.90) %>% node(name = "database", 0.90) %>% prob_of_failure()
Similar to Legos, reliability
lets you build complex systems out of simple components (node
). For example, we can add an arbitrary number of components to a series like,
node(name = "load balancer", reliability = 0.999) %>% node(name = "web server", 0.99) %>% node(name = "caching layer", 0.95) %>% node(name = "database", 0.9999) %>% prob_of_failure() %>% scales::percent()
Meeker, William Q., and Luis A. Escobar. Statistical Methods for Reliability Data. New York: Wiley, 1998. Print.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.