knitr::opts_chunk$set(collapse = TRUE, eval = FALSE, comment = "#>")
The hypegrammaR package implements the IMPACT quantitative data analysis guidelines. While this guide works on its own, all of this will make a lot more sense if you read those first.
Get all your files in csv format first:
- your sampling frame(s) as csv file(s), if the analysis needs to be weighted
- your kobo questionnaire questions and choices sheets as two csv files; necessary if you have select_multiple questions, skip logic, or want to use the labels
IMPORTANT NOTE: your data and questionnaire must adhere to the standard xml style output from kobo; otherwise this will either not work or produce wrong results.
- column headers unchanged
- questionnaire names unchanged
- xml values (NOT labeled)
- select_multiple questions have one column with all responses concatenated (separated by blank space " "), and one column for each response named [question name].[choice name]
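To make the expected select_multiple layout concrete, here is a minimal base R sketch. The question name "water_source" and its choices are made up for illustration, and the per-choice columns are built as TRUE/FALSE here as an assumption; your kobo export may encode them differently.

```r
# Hypothetical data: one select_multiple question called "water_source".
# The concatenated column holds all selected choices separated by a blank space.
example_data <- data.frame(
  water_source = c("well river", "tap", "well"),
  stringsAsFactors = FALSE
)

# Hypothetical choice names for this question
choices <- c("well", "river", "tap")

# Build one column per choice, named [question name].[choice name]
for (choice in choices) {
  selected <- vapply(
    strsplit(example_data$water_source, " ", fixed = TRUE),
    function(answers) choice %in% answers,
    logical(1)
  )
  example_data[[paste0("water_source.", choice)]] <- selected
}

example_data
```

The result has the concatenated column plus the columns water_source.well, water_source.river and water_source.tap, which is the shape described in the list above.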
Your sampling frame must be in the correct format:
- must have one row per stratum; one column for population numbers; one column for strata names
- The values in the column with the strata names must appear exactly identical in a column in the data
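Both sampling frame conditions can be checked before any analysis. The following is a sketch in base R only; the data frames and their column names are hypothetical stand-ins for your own files, and this is not a hypegrammaR function.

```r
# Hypothetical sampling frame: one row per stratum
example_samplingframe <- data.frame(
  strata.names = c("district_a", "district_b"),
  population   = c(12000, 8000),
  stringsAsFactors = FALSE
)

# Hypothetical data with a column holding each record's stratum
example_data <- data.frame(
  stratification = c("district_a", "district_a", "district_b", "District_B"),
  stringsAsFactors = FALSE
)

# Condition 1: one row per stratum, so no stratum name may appear twice
no_duplicates <- !any(duplicated(example_samplingframe$strata.names))

# Condition 2: every stratum value in the data must match the frame exactly.
# setdiff() lists the data values that have no exact match.
unmatched <- setdiff(
  example_data$stratification,
  example_samplingframe$strata.names
)

no_duplicates
unmatched
```

Here "District_B" is reported as unmatched because the comparison is case sensitive, which is the kind of mismatch worth fixing before weighting.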
You only have to run this line once, when using hypegrammaR for the first time, or again to update to a new version.
devtools::install_github("ellieallien/hypegrammaR",build_opts = c(), ref = "master")
(the build_opts = c() argument makes sure the package includes all extra help pages & documentation)
For this step only, you need a more or less stable internet connection.
library(hypegrammaR)
library(koboquest)
load_data takes only one argument, file, the path to the csv file.
assessment_data<-load_data(file = "../data/testdata.csv")
Conditions:
This is only necessary if the analysis needs to be weighted.
The sampling frame should be a csv file with one column with strata names, one column with population numbers.
load_samplingframe takes only one argument, file, the path to the csv file.
sampling_frame<-load_samplingframe(file = "../data/test_samplingframe.csv")
Now turn your sampling frame into a weighting function with map_to_weighting. We see that the relevant columns in the sampling frame are called "strata.names" and "population". The column in the data corresponding to the strata names is called "stratification" in this data set. We need to supply these column names as arguments to map_to_weighting:
myweighter<-map_to_weighting(sampling.frame = sampling_frame, data.stratum.column = "stratification", sampling.frame.population.column = "population", sampling.frame.stratum.column = "strata.names", data = assessment_data)
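To give a feel for what a weighting function derived from a sampling frame does, here is a base R sketch of one common approach: each stratum's weight is its population share divided by its sample share. This is an assumption for illustration and not necessarily how map_to_weighting computes weights internally; all object names and numbers are made up.

```r
# Hypothetical sampling frame and data (not the package's test files)
example_samplingframe <- data.frame(
  strata.names = c("a", "b"),
  population   = c(6000, 4000),
  stringsAsFactors = FALSE
)
example_data <- data.frame(
  stratification = c("a", "a", "b", "a"),
  stringsAsFactors = FALSE
)

# Number of records per stratum in the data
sample_counts <- table(example_data$stratification)

# Share of each stratum in the population and in the sample
population_share <- example_samplingframe$population /
  sum(example_samplingframe$population)
sample_share <- as.numeric(sample_counts[example_samplingframe$strata.names]) /
  nrow(example_data)

# Weight per stratum: over-sampled strata get weights below 1
stratum_weights <- population_share / sample_share
names(stratum_weights) <- example_samplingframe$strata.names

# One weight per record, looked up by the record's stratum
record_weights <- unname(stratum_weights[example_data$stratification])

stratum_weights
record_weights
```

With this definition the record weights sum to the number of records, so the weighted sample keeps its original size while the strata are rebalanced to their population proportions.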
Finally the questionnaire, which depends on the question and the choices sheet as a csv. hypegrammaR can live without this, but a questionnaire is necessary for correct analysis of select_multiple questions.
The parameters are:
- data: the object that stores the data loaded above
- questions.file: the path to the questions sheet as a csv file
- choices.file: the path to the choices sheet as a csv file
- choices.label.column.to.use: the exact name of the column containing the labels that should be used. You can add an extra column with custom labels to your choices sheet if you don't want to use the choice labels from the original questionnaire
questionnaire<- koboquest::load_questionnaire(data = assessment_data, questions = "../data/test_questionnaire_questions.csv", choices = "../data/test_questionnaire_choices.csv", choices.label.column.to.use = "label::English" )
Conditions:
For this example, our hypothesis is that IDP or refugee households received different assistance compared to host community households.
my_case<-map_to_case(hypothesis.type = "group_difference", dependent.var.type = "numerical", independent.var.type = "categorical")
You can find out what exactly you can/should enter for these parameters by running ?map_to_case, which will open the help page for the function (this works for any function).
Now you use the inputs loaded above to produce an analysis result:
expenditure_urban_rural<-map_to_result(data = assessment_data, dependent.var = "expendituretotal", independent.var = 'urbanrural', case = my_case, weighting = myweighter, questionnaire = questionnaire)
First, we add labels to the result (assuming that you've loaded a questionnaire):
expenditure_urban_rural<-map_to_labeled(result = expenditure_urban_rural, questionnaire = questionnaire)
Finally, we can get visualisations, tables etc.:
chart<- map_to_visualisation(expenditure_urban_rural)
table<-map_to_table(expenditure_urban_rural)
chart
table
To save/export any results, you can use the generic map_to_file function. For example:
# map_to_file(results$summary.statistics,"summary_stat.csv")
# map_to_file(myvisualisation,"barchart.jpg",width="5",height="5")
You will find the files in your current working directory (which you can find out with getwd()).
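If you are unsure where exported files end up, a quick base R check can help. The sketch below writes a small made-up table with write.csv instead of map_to_file, purely to show how relative file names resolve against the working directory; the table and file name are hypothetical.

```r
# Hypothetical summary table standing in for a real result
example_table <- data.frame(
  group            = c("urban", "rural"),
  mean_expenditure = c(120, 95)
)

# A relative file name is resolved against the current working directory
output_path <- file.path(getwd(), "example_summary.csv")
write.csv(example_table, output_path, row.names = FALSE)

# Confirm the file exists where we expect it
file.exists(output_path)
```

Use setwd() or a full path if you would rather collect outputs in a dedicated folder.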
For the weighting, we'll use IMPACT's surveyweights package. It was installed when you installed hypegrammaR. Load it:
# library(surveyweights)
For stratified samples, we need to provide a sampling frame.
# mysamplingframe<-read.csv("../tests/testthat/test_samplingframe.csv")
The sampling frame must have exactly one row per stratum; one column for the strata names, one column for the population numbers. Our example data frame looks like this:
# head(mysamplingframe)
The names must match exactly the values in a column of the data; in this case it is mydata$stratification.
# head(mydata$stratification)
Now we can create a "weighter" object:
# myweighter<-weighting_fun_from_samplingframe(sampling.frame = mysamplingframe,
#                                              data.stratum.column = "stratification",
#                                              sampling.frame.population.column = "population",
#                                              sampling.frame.stratum.column = "strata.names")
Now we can use analyse_indicator just like before, but pass it the weighter we just created, so the weighting will be applied (pay attention to the last argument):
# result<-analyse_indicator(
#   data = mydata,
#   dependent.var = "nutrition_need",
#   independent.var = "region",
#   hypothesis.type = "group_difference",
#   independent.var.type = "categorical",
#   dependent.var.type="numerical",
#   weighting = myweighter)
# result$summary.statistic
If the clusters don't need to be weighted (in addition to the strata), all you need is to tell analyse_indicator which data variable identifies the cluster:
# result<-analyse_indicator(
#   data = mydata,
#   dependent.var = "nutrition_need",
#   independent.var = "region",
#   hypothesis.type = "group_difference",
#   independent.var.type = "categorical",
#   dependent.var.type="numerical",
#   weighting = myweighter,
#   cluster.variable.name = "village")
If the clustering comes with its own weights (that is, if records in different clusters within the same stratum had different probabilities of being selected), you need to load the sampling frame in the same way you did for the stratification, then combine the two weighting functions:
# samplingframe2<-read.csv("../tests/testthat/test_samplingframe2.csv")
#
# myweighting_cluster<-weighting_fun_from_samplingframe(samplingframe2,
#                                                       data.stratum.column = "village",
#                                                       sampling.frame.population.column = "populations",
#                                                       sampling.frame.stratum.column = "village.name")
#
# # combine the stratum weighter created earlier with the cluster weighter
# combined_weighting<-combine_weighting_functions(myweighter,myweighting_cluster)
# result<-analyse_indicator(
#   data = mydata,
#   dependent.var = "nutrition_need",
#   independent.var = "region",
#   hypothesis.type = "group_difference",
#   independent.var.type = "categorical",
#   dependent.var.type="numerical",
#   weighting = combined_weighting,
#   cluster.variable.name = "village")
If we further load the questionnaire, we can do some extra cool stuff:
require("koboquest")
# questionnaire<- load_questionnaire(mydata,
#                                    questions.file = "../tests/testthat/kobo questions.csv",
#                                    choices.file = "../tests/testthat/kobo choices.csv")
# result<-analyse_indicator(mydata,
#                           dependent.var = "accesstomarket",
#                           independent.var = "region",
#                           dependent.var.type = "categorical",
#                           independent.var.type = "categorical",
#                           hypothesis.type = "group_difference",
#                           weighting=myweighter)
# vis<-map_to_visualisation(result)