In Zanidean/sRa: Functions for improving the life of a research analyst.

Introduction

XMR control charts are useful when determining if there are significant trends in data. XMR charts have two key assumptions: one is that the measurements of value happen over time, and the other is that each measurement of time has exactly one measurement of value.

Take careful thought about what you are trying to measure with XMR. Proportions work best, headcount is okay, costs over time aren't great.

Arguments

The arguments for xmR() are:

df: The dataframe containing the time-series data.
measure: The column containing the measure. This must be in a numeric format.
interval: The interval you'd like to use to calculate the averages. Defaults to 5.
recalc: Logical if you'd like it to recalculate bounds. Defaults to False for safety.
reuse: Logical: Should points be re-used in calculations? Defaults to False.
longrun: Vector of 2 to determine rules for long run. First point is the 'n' of points used to recalculate with, and the second is to determine how many consecutive points are needed to define a longrun. Default is c(5,8) which uses the first 5 points of a run of 8 to recalculate the bounds.
shortrun: Vector of 2 to determine rules for a short run. The first point is the minimum number of points within the set to qualify a shortrun, and the second is the length of a possible set. Default is c(3,4) which states that 3 of 4 consecutive points need to pass the test to be used in a calculation.

The data required for XMR charts take a specific format, with at least two columns of data - one for the time variable and another for the measurement. Like so:

library(sRa)
library(tidyverse)
set.seed(1)
Measure <- round(runif(10, min = 0.50, max = 0.75)*100, 0)
Measure <- c(Measure, round(runif(8, min = 0.75, max = 1)*100, 0))
Time <- c(2000:2017) 
example_data <- data.frame(Time, Measure)
knitr::kable(example_data, format = "markdown", align = 'c')

The function use on this set of data would be written like this if we wanted the boundaries to recalculate.

xmr_data <- xmR(df = example_data, measure = "Measure", recalc = T)

The measure column must be written within quotes.

Output data looks like this:

xmr_data <- xmR(example_data, "Measure", 
                recalc = T) %>% 
  select(-Order)
knitr::kable(xmr_data, format = "markdown", align = 'c')

The only mandatory arguments are 'df', because the function needs to operate on a dataframe, and 'measure' because the function needs to be told which column contains the measurements. Everything else has been set to what I believe is a safe and sensible default, with no single default overriding any other.

In our shop, we typically run the following:

xmr_data <- xmR(df = example_data, measure = "Measure", recalc = T)

Which uses the recalculation function, along with the default shortrun and longrun definitions. Feel free to play around with your own definitions of what a shortrun or longrun is. The differences between rules is slight, but each user will have different needs. It is important to use consistent definitions of what a long/short run are however, it wouldn't be entirely appropriate in one report to use one set of definitions for one dataset, and another set for a different dataset.

Charts

That data calculation is handy, the output can be saved and used in other applications. But what about visualization within R?

There is a function called xmR_chart() which takes that output, and generates a ggplot graphic so the user can identify trends in the data. This works well for reporting, but it also works great for quick diagnostics of your data.

The arguments for xmR_chart() are as follows.

dataframe: Output from xmR()
time: The column containing the time variable for the x-axis.
measure: The column containing the measure for the y-axis.
facetvar: Split the chart by a facet variable. Explained in the next section.

xmR_chart(dataframe = xmr_data, time = "Time", measure = "Measure")

Tidyverse

Simple datasets like those illustrated above are common, but more common are large datasets with multiple factors. Consider the following data. How would the xmR() function benefit the user in this case?

library(sRa)
library(tidyverse)
`Year` <- seq(2004, 2017, 1)
Variable <- "A"
FDA <- data.frame(`Year`, Variable, check.names = F)
Variable <- "B"
FDB <- data.frame(`Year`, Variable, check.names = F)
FD <- rbind(FDA, FDB)

FD$Measure <- runif(nrow(FD))
knitr::kable(FD, format = "markdown", align = 'c')

The answer is by leveraging other R packages, namely the tidyverse.

You can install and load the tidyverse by the following.

install.packages("tidyverse")
library(tidyverse)

What this enables is a number of verb-type functions for tidying and wrangling data. This is how to use them alongside the xmR() function.

Grouping and Faceting

Take our fake data FD and here is how to apply the xmR() function to certain groups within that data.

FD <- FD %>% group_by(Variable) %>% do(xmR(., "Measure", recalc = T))

To obtain the following:

knitr::kable(FD, format = "markdown", align = 'c')

And as you may be able to see, the XMR data calculated on Measure AND by Variable, in one function instead of having to manually split the data. Now, if we tried to plot this data we would do the following.

xmR_chart(dataframe = FD, 
          time = "Year", 
          measure = "Measure", 
          facetvar = "Variable") + 
  scale_x_discrete(breaks = scales::pretty_breaks(5))

Alternatively, you could use the built-in ggplot2 function for more complex faceting.

xmR_chart(dataframe = FD, 
          time = "Year", 
          measure = "Measure") + 
  facet_wrap(~Variable) + 
  scale_x_discrete(breaks = scales::pretty_breaks(5))