In mabafaba/hypegrammaR: A grammar of hypothesis test driven analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  eval = F,
  comment = "#>"
)

Pretext

The hypegrammaR package implements the IMPACT quantitative data analysis guidelines. While this guide works on its own, all of this will make a lot more sense if you read those first.

What you need:

get all your files in csv format first:

your data as a csv file
your sampling frame(s) as csv file(s) if the analysis needs to be weighted
your kobo questionnaire questions and choices sheets as two csv files; Necessary if you have select_multiple questions, skip logic or want to use the labels.

IMPORTANT NOTE: your data and questionnaire must adhere to the standard xml style output from kobo otherwise this will either not work or produce wrong results.

column headers unchanged

questionnaire names unchanged

xml values (NOT labeled)

select_multiple questions have one column with all responses concatenated (separated by blank space " "), and one column for each response named [question name].[choice name]

Your sampling frame must be in the correct format:

must have one row per stratum; one column for population numbers; one column for strata names

The values in the column with the strata names must appear exactly identical in a column in the data

Preparation

Install the hypegrammaR package

This line you only have to run once when using hypegrammaR for the first time, or to update to a new version.

devtools::install_github("ellieallien/hypegrammaR",build_opts = c(), ref = "master")

(the build_opts = c() makes sure the package includes all extra help pages & documentation) For this step only, you need a more or less stable internet connection.

Load the hypegrammaR package

library(hypegrammaR)
library(koboquest)

Load your files

The data

load_data takes only one argument file, the path to the csv file.

assessment_data<-load_data(file = "../data/testdata.csv")

Conditions:

it must adhere to standard kobo xml format
it must not contain labeled values
it must have a single row for column headers (unchanged as they come out of kobo)
it may contain additional columns that were not in the original questionnaire. It is good practice to additional new variables as additional rows to the questionnaire, specifying variable type, choices etc.

Load the sampling frame

This is only necessary if the analysis needs to be weighted.

The sampling frame should be a csv file with one column with strata names, one column with population numbers. load_samplingframe takes only one argument file, the path to the csv file.

sampling_frame<-load_samplingframe(file = "../data/test_samplingframe.csv")

Now turn your sampling frame into a weighting function with map_to_weighting. We see that the relevant columns in the sampling frame are called "strata names" and "population". The column in the data correspoinding to the strata names is called "stratification" in this data set. We need to supply these column names as arguments to map_to_weighing:

myweighter<-map_to_weighting(sampling.frame = sampling_frame,
                             data.stratum.column = "stratification",
                             sampling.frame.population.column = "population",
                             sampling.frame.stratum.column = "strata.names", 
                             data = assessment_data)

The Questionnaire

Finally the questionnaire, which depends on the question and the choices sheet as a csv. hypegrammaR can live without this, but a questionnaire is necessary for correct analysis of select_multiple questions.

The parameters are:

data: the object that stores the data loaded above
questions.file: the path to the questions sheet as a csv file
choices.file the path to the choices sheet as a csv file
choices.label.column.to.use: the exact name of the column containing the labels that should be used. You can add an extra column with custom labels to your choices sheet if you don't want to use the choice labels from the original questionnaire

questionnaire<- koboquest::load_questionnaire(data = assessment_data,
                                  questions = "../data/test_questionnaire_questions.csv",
                                  choices = "../data/test_questionnaire_choices.csv",
                                  choices.label.column.to.use = "label::English"
                                             )

Conditions:

Both the choices and questions csv files should be exact copies of the respective sheets in the kobo xml form (except for the additions mentioned above.)

Analysis

Identify your analysis parameters

what are the column headers of your dependent and independent variables, and what are their data types?
- this Example: "sourceexpenses" (categorical) and "informalsettlement" (categorical)
what type of hypothesis do you have?
- this Example: the difference between groups

map to the analysis case

For this example, our hypothesis is that idp or refugee households received different assistance compared host community hh.

my_case<-map_to_case(hypothesis.type = "group_difference",
                  dependent.var.type = "numerical",
                  independent.var.type =  "categorical")

You can find out what exactly you can/should enter for these parameters by running ?map_to_case, which will open the help page for the function (this works for any function)

Run the analysis

Now you use the inputs loaded above to an analysis result:

expenditure_urban_rural<-map_to_result(data = assessment_data,
                      dependent.var = "expendituretotal",
                      independent.var = 'urbanrural',
                      case = my_case,
                      weighting = myweighter,
                      questionnaire = questionnaire)

Show Results

First, we add labels to the result (assuming that you've loaded a questionnaire):

expenditure_urban_rural<-map_to_labeled(result = expenditure_urban_rural,
                                                questionnaire = questionnaire)

Finally, we can get visualisations, tables etc.:

chart<- map_to_visualisation(expenditure_urban_rural)
table<-map_to_table(expenditure_urban_rural)
chart
table

Save/export the results

To save/export any results, you can use the generic map_to_file function. For example:

# map_to_file(results$summary.statistics,"summary_stat.csv")
# map_to_file(myvisualisation,"barchart.jpg",width="5",height="5")

You will find the files in your current working directory (which you can find out with getwd())

Stratified Samples

For the weighting, we'll use IMPACT's surveyweights package. It was installed when you installed hypegrammaR. Load it:

# library(surveyweights)

For stratified samples, we need to provide a sampling frame.

# mysamplingframe<-read.csv("../tests/testthat/test_samplingframe.csv")

The sampling frame must have exactly one row per stratum; one column for the strata names, one column for the population numbers. Our example data frame looks like this:

# head(mysamplingframe)

The names must match exactly the values in a column of the data; in this case it is mydata$stratification.

# head(mydata$stratification)

Now we can create a "weighter" object:

# myweighter<-weighting_fun_from_samplingframe(sampling.frame = mysamplingframe,
#                                              data.stratum.column = "stratification",
#                                              sampling.frame.population.column = "population",
#                                              sampling.frame.stratum.column = "strata.names")

Now we can use analyse_indicator just like before, but pass it the weighter we just created, so the weighting will be applied (pay attention to the last argument):

# result<-analyse_indicator(
#                   data = mydata,
#                   dependent.var = "nutrition_need",
#                   independent.var = "region",
#                   hypothesis.type = "group_difference",
#                   independent.var.type = "categorical",
#                   dependent.var.type="numerical",
#                   weighting = myweighter)


# result$summary.statistic

Cluster Samples

without extra weighting

If the clusters don't need to be weighted (in addition to the strata), all you need is to tell analyse_indicator which data variable identifies the cluster:

# result<-analyse_indicator(
#                   data = mydata,
#                   dependent.var = "nutrition_need",
#                   independent.var = "region",
#                   hypothesis.type = "group_difference",
#                   independent.var.type = "categorical",
#                   dependent.var.type="numerical",
#                   weighting = myweighter,
#                   cluster.variable.name = "village")

With extra weighting

If the clustering comes with it's own weights (if records in different clusters within the same stratum had different probabilities to be selected probabilities), you need to load the sampling frame in the same way you did for the stratification, then combine the two weighting functions:

# samplingframe2<-read.csv("../tests/testthat/test_samplingframe2.csv")
# 
# myweighting_cluster<-weighting_fun_from_samplingframe(samplingframe2,
#                                                     data.stratum.column = "village",
#                                                     sampling.frame.population.column = "populations",
#                                                     sampling.frame.stratum.column = "village.name")
# 

# combined_weighting<-combine_weighting_functions(myweighing,myweighting_cluster)

# result<-analyse_indicator(
#                   data = mydata,
#                   dependent.var = "nutrition_need",
#                   independent.var = "region",
#                   hypothesis.type = "group_difference",
#                   independent.var.type = "categorical",
#                   dependent.var.type="numerical",
#                   weighting = combined_weighting,
#                   cluster.variable.name = "village")

Using the Questionnaire

If we further load the questionnaire, we can do some extra cool stuff:

Better analyse select_multiple questions
Automatically identify data types
Put proper labels on plots

Loading the questionnaire

require("koboquest")
# questionnaire<- load_questionnaire(mydata,
#                               questions.file = "../tests/testthat/kobo questions.csv",
#                               choices.file = "../tests/testthat/kobo choices.csv")



# result<-analyse_indicator(mydata,
#                   dependent.var = "accesstomarket",
#                   independent.var = "region",
#                   dependent.var.type = "categorical",
#                   independent.var.type = "categorical",
#                   hypothesis.type = "group_difference",
#                   weighting=myweighter)


# vis<-map_to_visualisation(result)

mabafaba/hypegrammaR documentation built on Oct. 2, 2019, 11:33 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mabafaba/hypegrammaR
A grammar of hypothesis test driven analysis

In mabafaba/hypegrammaR: A grammar of hypothesis test driven analysis

Pretext

What you need:

Preparation

Install the hypegrammaR package

Load the hypegrammaR package

Load your files

The data

Load the sampling frame

The Questionnaire

Analysis

Identify your analysis parameters

map to the analysis case

Run the analysis

Show Results

Save/export the results

Stratified Samples

Cluster Samples

without extra weighting

With extra weighting

Using the Questionnaire

Loading the questionnaire

R Package Documentation

Browse R Packages

We want your feedback!

mabafaba/hypegrammaR A grammar of hypothesis test driven analysis

In mabafaba/hypegrammaR: A grammar of hypothesis test driven analysis

Pretext

What you need:

Preparation

Install the hypegrammaR package

Load the hypegrammaR package

Load your files

The data

Load the sampling frame

The Questionnaire

Analysis

Identify your analysis parameters

map to the analysis case

Run the analysis

Show Results

Save/export the results

Stratified Samples

Cluster Samples

without extra weighting

With extra weighting

Using the Questionnaire

Loading the questionnaire

R Package Documentation

Browse R Packages

We want your feedback!

mabafaba/hypegrammaR
A grammar of hypothesis test driven analysis