vikjam/bcstatsR: Comparing "back checks" in R (a clone of Stata's bcstats)

Let's apply bcstats in a few examples.

Toy example

First, consider the following minimal working example. bcstatsR comes with two example data sets. Load the library to get started.

library(bcstatsR)

And then load the two datasets that come bundled with the library.

data(survey)
data(bc)

Let's take a look at the survey data.

print(survey)

knitr::kable(survey)

Now, take a look at the back check data (i.e., the follow-up in which highly trained surveyors re-interview the same households).

print(bc)

knitr::kable(bc)

In this example, gender, gameresult and itemssold are the variables collected in both the survey and the back check. Note that id identifies the respondent in both data sets. In the survey, enum and enumteam tell us the surveyor and the surveyor's team. We'll want to know whether these surveyors and teams collected the survey data correctly. Similarly, in the back check, we'll want to summarize the data by back checker (bcer) to see if we notice unusual patterns.
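To make the role of id concrete, here is a minimal base R sketch of matching a survey to a back check and flagging disagreements. It does not use bcstatsR, and the data frames survey_toy and bc_toy are made-up stand-ins that only borrow the column names from the example, so treat it as an illustration of the idea rather than what bcstats does internally.

```r
# Hypothetical toy data: column names mirror the example, values are made up.
survey_toy <- data.frame(
  id     = c(1, 2, 3),
  enum   = c("A", "A", "B"),
  gender = c("female", "male", "female"),
  stringsAsFactors = FALSE
)
bc_toy <- data.frame(
  id     = c(1, 2, 3),
  bcer   = c("X", "Y", "X"),
  gender = c("female", "female", "female"),
  stringsAsFactors = FALSE
)

# Join the two visits on the respondent id. Columns present in both
# data frames (here, gender) get a suffix naming their source.
matched <- merge(survey_toy, bc_toy, by = "id",
                 suffixes = c(".survey", ".bc"))

# Flag respondents whose answers disagree between the two visits.
matched$gender.differs <- matched$gender.survey != matched$gender.bc
print(matched)
```

Only respondent 2 is flagged here, because the survey recorded "male" and the back check recorded "female".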

Now, let's run the back check!

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer")

And automagically, you've created a set of results stored in result. Let's take a look at the back check comparison, which is stored in result$backcheck.

print(result$backcheck)

knitr::kable(result$backcheck)

Each row contains a difference between the survey and the back check for one household and variable. Cases where nothing changed are not included in this data.frame. Now let's take a look at the error rates for Type 1 variables by each surveyor (enumerator).

print(result[["enum1"]]$summary)

knitr::kable(result[["enum1"]]$summary)

We can also take a look at the error rate for each Type 1 variable by enumerator.

print(result[["enum1"]]$each)

knitr::kable(result[["enum1"]]$each)

And we can do the same thing for Type 2 variables.

print(result[["enum2"]]$summary)
print(result[["enum2"]]$each)

knitr::kable(result[["enum2"]]$summary)
knitr::kable(result[["enum2"]]$each)
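If you want to see what an error rate by enumerator boils down to, here is a minimal base R sketch. It assumes the error rate is simply the share of compared values that differ between the survey and the back check; the comparisons data frame and its values are hypothetical, and bcstats may compute its summaries differently.

```r
# Hypothetical comparisons: one row per respondent and variable, with a
# flag for whether the survey and back check values disagree.
comparisons <- data.frame(
  enum    = c("A", "A", "A", "B", "B"),
  differs = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# The mean of a logical vector is the share of TRUE values, so this
# gives the share of differing comparisons for each enumerator.
error_rate <- aggregate(differs ~ enum, data = comparisons, FUN = mean)
names(error_rate)[2] <- "error.rate"
print(error_rate)
```

In this toy data, enumerator A differs on one of three comparisons and enumerator B on both of two.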

Now let's redo the back check, this time running a t-test on the differences between the survey data and the back check.

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer",
                  ttest       = "itemssold")

You can find the results for the t-test as an element of the results list.

print(result[["ttest"]]$itemssold)
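For intuition, here is a rough base R sketch of such a comparison. It assumes a paired test on matched respondents, which is how Stata's bcstats describes its ttest option; whether bcstatsR does exactly this is an assumption, and the vectors below are made-up values rather than the bundled data.

```r
# Hypothetical matched values of itemssold for five respondents.
items_survey <- c(10, 12, 7, 15, 9)
items_bc     <- c(11, 12, 9, 14, 12)

# Paired t-test: is the mean within-respondent difference
# (survey minus back check) distinguishable from zero?
tt <- t.test(items_survey, items_bc, paired = TRUE)

print(tt$estimate)  # mean of the within-respondent differences
print(tt$p.value)
```

A small p-value would suggest the survey and the back check disagree systematically rather than by chance.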

We could have chosen not to code some changes as errors, as follows,

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer",
                  nodiff      = list(itemssold = c(0)))

or specify an acceptable range,

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer",
                  okrange     = list(itemssold = c(0, 5)))

or exclude them altogether.

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer",
                  exclude     = list(itemssold = c(0)))
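To show how these three options can change what counts as an error, here is a base R sketch under stated assumptions. The rules below follow one reading of the option names (borrowed from the descriptions of the same-named options in Stata's bcstats) and are not taken from the bcstatsR source: nodiff treats a back check value of 0 as "not a difference", okrange treats a back check minus survey gap between 0 and 5 as acceptable, and exclude drops comparisons where the back check value is 0. All values are made up.

```r
# Hypothetical itemssold values for five matched respondents.
survey_items <- c(10, 12, 7, 15, 0)
bc_items     <- c(10, 15, 0, 30, 4)

# Baseline: any change counts as a difference.
differs <- survey_items != bc_items

# Assumed nodiff-style rule: a back check value of 0 never counts
# as a difference.
differs_nodiff <- differs & bc_items != 0

# Assumed okrange-style rule: a gap (back check minus survey) inside
# [0, 5] is acceptable and does not count as a difference.
ok_range <- c(0, 5)
gap <- bc_items - survey_items
differs_okrange <- differs & !(gap >= ok_range[1] & gap <= ok_range[2])

# Assumed exclude-style rule: comparisons where the back check value
# is 0 are dropped before computing the rate.
keep <- bc_items != 0
differs_exclude <- differs[keep]

# Error rates under each rule.
rates <- c(baseline = mean(differs),
           nodiff   = mean(differs_nodiff),
           okrange  = mean(differs_okrange),
           exclude  = mean(differs_exclude))
print(rates)
```

Note that the nodiff-style and exclude-style rules differ in the denominator: the first keeps the comparison and counts it as a match, while the second removes it entirely.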

Multiple variables

Of course, you'll want to check multiple variables within any given type. You can pass the variable names as a character vector. For example, if you want to run the back check with both gender and gameresult as Type 1 variables, you could do the following:

result.mv <- bcstats(surveydata  = survey,
                     bcdata      = bc,
                     id          = "id",
                     t1vars      = c("gender", "gameresult"),
                     t3vars      = "itemssold",
                     enumerator  = "enum",
                     enumteam    = "enumteam",
                     backchecker = "bcer",
                     exclude     = list(itemssold = c(0)))