knitr::opts_chunk$set(
  collapse = TRUE,
  eval = F,
  comment = "#>"
)

Pretext

The hypegrammaR package implements the IMPACT quantitative data analysis guidelines. While this guide works on its own, all of this will make a lot more sense if you read those first.

What you need:

get all your files in csv format first:

IMPORTANT NOTE: your data and questionnaire must adhere to the standard xml style output from kobo otherwise this will either not work or produce wrong results.

  • column headers unchanged
  • questionnaire names unchanged
  • xml values (NOT labeled)
  • select_multiple questions have one column with all responses concatenated (separated by blank space " "), and one column for each response named [question name].[choice name]

Your sampling frame must be in the correct format:

  • must have one row per stratum; one column for population numbers; one column for strata names
  • The values in the column with the strata names must appear exactly identical in a column in the data

Preparation

Install the hypegrammaR package

This line you only have to run once when using hypegrammaR for the first time, or to update to a new version.

devtools::install_github("ellieallien/hypegrammaR",build_opts = c(), ref = "master")

(the build_opts = c() makes sure the package includes all extra help pages & documentation) For this step only, you need a more or less stable internet connection.

Load the hypegrammaR package

library(hypegrammaR)
library(koboquest)

Load your files

The data

load_data takes only one argument file, the path to the csv file.

assessment_data<-load_data(file = "../data/testdata.csv")

Conditions:

Load the sampling frame

This is only necessary if the analysis needs to be weighted.

The sampling frame should be a csv file with one column with strata names, one column with population numbers. load_samplingframe takes only one argument file, the path to the csv file.

sampling_frame<-load_samplingframe(file = "../data/test_samplingframe.csv")

Now turn your sampling frame into a weighting function with map_to_weighting. We see that the relevant columns in the sampling frame are called "strata names" and "population". The column in the data correspoinding to the strata names is called "stratification" in this data set. We need to supply these column names as arguments to map_to_weighing:

myweighter<-map_to_weighting(sampling.frame = sampling_frame,
                             data.stratum.column = "stratification",
                             sampling.frame.population.column = "population",
                             sampling.frame.stratum.column = "strata.names", 
                             data = assessment_data)
The Questionnaire

Finally the questionnaire, which depends on the question and the choices sheet as a csv. hypegrammaR can live without this, but a questionnaire is necessary for correct analysis of select_multiple questions.

The parameters are:

questionnaire<- koboquest::load_questionnaire(data = assessment_data,
                                  questions = "../data/test_questionnaire_questions.csv",
                                  choices = "../data/test_questionnaire_choices.csv",
                                  choices.label.column.to.use = "label::English"
                                             )

Conditions:

Analysis

Identify your analysis parameters

map to the analysis case

For this example, our hypothesis is that idp or refugee households received different assistance compared host community hh.

my_case<-map_to_case(hypothesis.type = "group_difference",
                  dependent.var.type = "numerical",
                  independent.var.type =  "categorical")

You can find out what exactly you can/should enter for these parameters by running ?map_to_case, which will open the help page for the function (this works for any function)

Run the analysis

Now you use the inputs loaded above to an analysis result:

expenditure_urban_rural<-map_to_result(data = assessment_data,
                      dependent.var = "expendituretotal",
                      independent.var = 'urbanrural',
                      case = my_case,
                      weighting = myweighter,
                      questionnaire = questionnaire)

Show Results

First, we add labels to the result (assuming that you've loaded a questionnaire):

expenditure_urban_rural<-map_to_labeled(result = expenditure_urban_rural,
                                                questionnaire = questionnaire)

Finally, we can get visualisations, tables etc.:

chart<- map_to_visualisation(expenditure_urban_rural)
table<-map_to_table(expenditure_urban_rural)
chart
table

Save/export the results

To save/export any results, you can use the generic map_to_file function. For example:

# map_to_file(results$summary.statistics,"summary_stat.csv")
# map_to_file(myvisualisation,"barchart.jpg",width="5",height="5")

You will find the files in your current working directory (which you can find out with getwd())

Stratified Samples

For the weighting, we'll use IMPACT's surveyweights package. It was installed when you installed hypegrammaR. Load it:

# library(surveyweights)

For stratified samples, we need to provide a sampling frame.

# mysamplingframe<-read.csv("../tests/testthat/test_samplingframe.csv")

The sampling frame must have exactly one row per stratum; one column for the strata names, one column for the population numbers. Our example data frame looks like this:

# head(mysamplingframe)

The names must match exactly the values in a column of the data; in this case it is mydata$stratification.

# head(mydata$stratification)

Now we can create a "weighter" object:

# myweighter<-weighting_fun_from_samplingframe(sampling.frame = mysamplingframe,
#                                              data.stratum.column = "stratification",
#                                              sampling.frame.population.column = "population",
#                                              sampling.frame.stratum.column = "strata.names")

Now we can use analyse_indicator just like before, but pass it the weighter we just created, so the weighting will be applied (pay attention to the last argument):

# result<-analyse_indicator(
#                   data = mydata,
#                   dependent.var = "nutrition_need",
#                   independent.var = "region",
#                   hypothesis.type = "group_difference",
#                   independent.var.type = "categorical",
#                   dependent.var.type="numerical",
#                   weighting = myweighter)


# result$summary.statistic

Cluster Samples

without extra weighting

If the clusters don't need to be weighted (in addition to the strata), all you need is to tell analyse_indicator which data variable identifies the cluster:

# result<-analyse_indicator(
#                   data = mydata,
#                   dependent.var = "nutrition_need",
#                   independent.var = "region",
#                   hypothesis.type = "group_difference",
#                   independent.var.type = "categorical",
#                   dependent.var.type="numerical",
#                   weighting = myweighter,
#                   cluster.variable.name = "village")

With extra weighting

If the clustering comes with it's own weights (if records in different clusters within the same stratum had different probabilities to be selected probabilities), you need to load the sampling frame in the same way you did for the stratification, then combine the two weighting functions:

# samplingframe2<-read.csv("../tests/testthat/test_samplingframe2.csv")
# 
# myweighting_cluster<-weighting_fun_from_samplingframe(samplingframe2,
#                                                     data.stratum.column = "village",
#                                                     sampling.frame.population.column = "populations",
#                                                     sampling.frame.stratum.column = "village.name")
# 

# combined_weighting<-combine_weighting_functions(myweighing,myweighting_cluster)

# result<-analyse_indicator(
#                   data = mydata,
#                   dependent.var = "nutrition_need",
#                   independent.var = "region",
#                   hypothesis.type = "group_difference",
#                   independent.var.type = "categorical",
#                   dependent.var.type="numerical",
#                   weighting = combined_weighting,
#                   cluster.variable.name = "village")

Using the Questionnaire

If we further load the questionnaire, we can do some extra cool stuff:

Loading the questionnaire

require("koboquest")
# questionnaire<- load_questionnaire(mydata,
#                               questions.file = "../tests/testthat/kobo questions.csv",
#                               choices.file = "../tests/testthat/kobo choices.csv")



# result<-analyse_indicator(mydata,
#                   dependent.var = "accesstomarket",
#                   independent.var = "region",
#                   dependent.var.type = "categorical",
#                   independent.var.type = "categorical",
#                   hypothesis.type = "group_difference",
#                   weighting=myweighter)


# vis<-map_to_visualisation(result)


mabafaba/hypegrammaR documentation built on Oct. 2, 2019, 11:33 a.m.