library(ifnBase)
platform_define_survey("intake", survey_id = 9, table = "intake", template = "eu:intake", mapping = list())
platform_define_survey("weekly", survey_id = 8, table = "weekly", template = "eu:weekly", mapping = list())
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
ifnBase provides two categories of functions to work with surveys.
To work, these functions need the surveys to be registered with the package through a survey definition.
A survey definition is provided by calling the platform_define_survey() function, either in the platform file or in a script before using the survey.
Doing it in the platform file guarantees that the survey definition is available everywhere once the package is loaded, and that it is defined only once.
Two examples:
# A simple survey about vaccination
platform_define_survey(
  name = "mysurvey",
  table = "pollster_results_vaccination",
  survey_id = 12,
  mapping = list(
    'age' = 'Q1', # Q1 = name in the DB -> 'age' name in the R data
    'vaccination' = 'Q2',
    'vacc.at_risk' = 'Q3_1',
    'vacc.health_prof' = 'Q3_2',
    'vacc.work' = 'Q3_3'
  ),
  recodes = list(
    'vaccination' = c('yes' = '1', 'no' = '2', 'unknown' = '3') # Describe recoding to meaningful levels
  ),
  labels = list(
    'vacc.reason' = 'vacc.*' # Define a pattern to identify all variables about vaccination reason
  )
)

# Define the "intake" survey to be exactly the same as the European one.
platform_define_survey(name = "intake", template = "eu:intake")
# That's all.
The platform_define_survey function returns a value, but you don't need to care about it. The package manages a registry of all surveys, and a survey is automatically registered when you call this function.
Each element described here is a parameter to pass to the platform_define_survey function.
The name parameter gives each survey a unique internal name. This is the name to use when calling the package's functions. It can be different from the survey name registered in the Influenzanet platform database (but it's a good idea to have the same name ;)).
The survey_id parameter is the id of the survey in the Influenzanet platform database. If it is not provided, some functions, such as survey_load_question() and survey_load_all(), will not be usable.
This parameter is only usable if you work with a full-featured Influenzanet platform database (with all tables of the pollster app). You can omit it if you work offline or with a restricted database.
The table parameter has to be set to the name of the table containing the survey data. If several tables are available, it must be the common table ("pollster_results_intake", for example).
This package handles 2 data models:
- single table model: all data of all seasons are stored in the same table (e.g. all in "pollster_results_intake"). To use this model, set the single.table parameter to TRUE.
- multiple table model: one table for each season. Each season should then be described individually (see the section about season handling), and single.table should be FALSE (the default).
The mapping parameter is a list of key-value pairs: the key is the variable name, the value is the name of the database column for this variable.
mapping = list(
  'age' = 'Q1', # Q1 = name in the DB -> 'age' name in the R data
  'vaccination' = 'Q2',
  'vacc.at_risk' = 'Q3_1',
  'vacc.health_prof' = 'Q3_2',
  'vacc.work' = 'Q3_3'
)
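To make the effect of a mapping concrete, here is a minimal base R sketch of renaming database columns to their aliases. It is only an illustration under assumptions: the data frame `raw` and its values are made up, and this is not necessarily how the package applies the mapping internally.

```r
# Hypothetical mapping, in the same form as above: alias = database column
mapping <- list(age = "Q1", vaccination = "Q2")

# Made-up raw data as it could come out of a results table
raw <- data.frame(Q1 = c(34, 51), Q2 = c(1, 2), timestamp = c("2020-01-01", "2020-01-08"))

# Look up the alias of each database column; unmapped columns give NA
aliases <- names(mapping)[match(names(raw), unlist(mapping))]

# Keep the original name when no alias is defined
names(raw) <- ifelse(is.na(aliases), names(raw), aliases)

print(names(raw))
```

Columns without an entry in the mapping keep their original name in this sketch.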
The recodes parameter should be a list of lists. The first key is the variable name; its value is a set of key-value pairs, with the recoded label as key and the database value as value.
recodes = list(
  smoker = c(
    'smoker.no' = "0",
    'smoker.occas' = "1",
    'smoker.dailyfew' = "2",
    'smoker.daily' = "3",
    'smoker.dkn' = "4",
    'smoker.stopped' = "5",
    'smoker.juststop' = "6"
  ),
  pregnant = c(
    'Yes' = '0',
    'No' = '1',
    'DNK' = '2'
  ),
  another_variable = recode_alias("pregnant")
)
It is recommended to use prefixed labels ('smoker.no' instead of 'no') because they really identify the modality of this question (by making the label unique across the surveys) and enable a specific translation for this label.
It is possible to reuse an already defined recoding by using the function recode_alias() with the name of the variable to copy the recoding from.
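As an illustration of what such a recoding means, the following base R sketch turns database codes into a labelled factor using a label = code vector of the same shape as the ones above. It is a sketch of the idea only, not the package's implementation; the input vector `x` is made up.

```r
# Recoding in the same form as in the survey definition: label = database code
recoding <- c("Yes" = "0", "No" = "1", "DNK" = "2")

# Made-up database values for a Yes/No/Don't know question
x <- c(0, 1, 0, 2, 1)

# The codes become the factor levels, the names become the displayed labels
recoded <- factor(as.character(x), levels = recoding, labels = names(recoding))

print(recoded)
```

A value that has no entry in the recoding would become NA in this sketch.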
Labels are a way to define groups of variable names (see survey_labels()). They can be defined either with the full list of names or with a pattern:
labels = list(
  # Define a pattern to identify all variables about vaccination reason.
  # Will select all variables starting with 'vacc.'
  'vacc.reason' = 'vacc.*'
)
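The selection described in the comment above ("all variables starting with 'vacc.'") can be sketched in base R as follows. How the package actually interprets the pattern is not shown here, so this uses a plain prefix test as an assumption; the vector `vars` is made up from the mapping example.

```r
# Variable names (aliases) taken from the mapping example
vars <- c("age", "vaccination", "vacc.at_risk", "vacc.health_prof", "vacc.work")

# Assumption: the pattern 'vacc.*' means "names starting with 'vacc.'"
vacc.reason <- vars[startsWith(vars, "vacc.")]

print(vacc.reason)
```

Note that "vaccination" is not selected, because the dot is part of the prefix.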
It's possible to define a survey by inheriting an existing template (with mapping, recodes & labels defined).
2 templates are built into the package:
- 'eu:weekly': weekly Influenzanet survey
- 'eu:intake': intake Influenzanet survey
Several functions are available to load information about a survey defined in an Influenzanet platform database.
These functions are only needed if you want to work with the definitions stored in the database (how data are collected using the web forms). They are not necessary to work with survey responses, so it's possible to work only with the survey data, without the full database. The survey definition provided by this package allows you to work offline or with a data-only database (with only "_results" tables).
survey_load_results() loads survey data; all options are described in the R documentation (?survey_load_results). Basically you can:
- load data of a season with the season parameter
- load data for a country (in the European database)
- load data for a list of participants (survey.users parameter)
- get geographical levels matching the participant location (for the intake survey) with the geo parameter
- get a subset of columns (using the cols parameter); the names to use are either the database column names or the internal aliases
# Load weekly data for the season 2017-2018, with all columns
weekly = survey_load_results("weekly", cols="*", season=2017)

# Load intake for the 2019-2020 season, and match with the corresponding nuts2 level for each fetched survey
intake = survey_load_results("intake", cols=c('timestamp','date-birth','code_zip'), season=2019, geo="nuts2")
Results will have several mandatory variables (regardless of the cols parameter):
- id: unique number identifying the survey response
- person_id: number of the participant (see note below)
In the resulting data.frame, variable names will be the internal aliases defined in the survey definition.
Note about participants identification: The Influenzanet platforms identify participants by a string usually called "global_id". This package doesn't use it directly, for memory saving reasons. It uses the integer id of the survey_surveyuser table instead; in this package it is named "person_id" in the data and in the parameters of the functions that use it.

To analyse the data we don't work with the names of the database columns (Qxxx), because they are not meaningful and are error prone, but with internal variable names (called aliases).
The mapping between the database column names and variable names is defined in the survey definition.
For "intake" and "weekly", the set of variables is available from a template for each survey. Using the templates makes it possible to use the same names in all analyses, whatever the platform. It's possible to override a template to add platform-specific questions and coding.
In principle you should not have to use database column names (the goal of the package is to do the job for you).
Survey data are encoded using integer codes to represent each variable modality. These codes are defined for each question (options in the survey data model of the Influenzanet platform).
Unfortunately those codes are not meaningful (e.g. '0' can represent either the response 'Yes', or 'No') and error prone.
To reduce errors when manipulating data, we propose a recoding of each categorical variable with a unique and meaningful set of labels (they are not perfect, but better than a number).
Predefined labels are provided for Yes/No/I don't know questions, with labels 'Yes', 'No', and 'DNK'.
This package provides 3 constants, one for each corresponding level: YES, NO, DONTKNOW. They are useful to test a recoded value against this label (more error proof than using the label itself).
library(dplyr) # provides %>% and filter()

# Load intake data with vaccination and pregnancy responses
intake = survey_load_results("intake", c('vacc.curseason', 'pregnant'))
intake = recode_intake(intake) # Recode to labels

# The comparison is vectorised, so reduce it to a single value before using it in if()
if (any(intake$vacc.curseason == YES, na.rm = TRUE)) {
  # Do something
}

# Also usable to select a subset
not_pregnant = intake %>% filter(pregnant == NO)
For weekly and intake, the package provides 2 dedicated functions. They recode categorical variables to factors using the label mappings of the corresponding survey definition. They also recode some other variables (like dates) and check for some inconsistencies.
They are more specialized than survey_recode_all().
intake = survey_load_results("intake", "*") # Load all intake data for the current season
intake = recode_intake(intake) # Recode all variables

weekly = survey_load_results("weekly", "*") # Load all weekly data for the current season
weekly = recode_weekly(weekly, all.variables = FALSE) # Recode only some specific variables
weekly = recode_weekly(weekly, all.variables = TRUE) # Recode all known variables available in weekly data
These two functions are usable with data loaded from the database or stored in an Rdata file. Variables that are already recoded (as factors) will not be recoded again. Of course, before recoding, the variables should still be encoded with the database codes (changing these values before recoding will lead to unpredictable results).
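The "already recoded variables are left alone" behaviour can be pictured with a small base R sketch. The helper `recode_if_needed()` is hypothetical and written only for this illustration; it is not a function of the package and does not claim to reproduce what recode_intake() or recode_weekly() do internally.

```r
# Hypothetical helper: recode a vector unless it is already a factor
recode_if_needed <- function(x, recoding) {
  if (is.factor(x)) {
    return(x) # already recoded, leave untouched
  }
  factor(as.character(x), levels = recoding, labels = names(recoding))
}

recoding <- c("Yes" = "0", "No" = "1", "DNK" = "2")

once <- recode_if_needed(c(0, 1, 2), recoding)
twice <- recode_if_needed(once, recoding) # second call changes nothing

print(identical(once, twice))
```

Without the is.factor() check, the second call would try to match the labels against the database codes and produce only NA values, which is one way the "unpredictable results" mentioned above can arise.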
survey_recode() recodes a vector of values using the mapping of a known variable.
x = c(0, 1, 0, 0, 1)
# Recode using the mapping of the "pregnant" variable of the intake survey
x = survey_recode(x, variable="pregnant", survey="intake")
print(x)
survey_recode_all() recodes all known variables in data using the recodings of the survey whose name is given in survey.
Caution, they are
weekly = survey_recode_all(weekly, survey="weekly") # Recode all variables in weekly data

# If another survey is registered, for example a survey you have defined and called "my-study"
data = survey_recode_all(data, survey="my-study")
survey_variable_recoding()
Get the mapping to recode a given variable of a survey
survey_variable_recoding("intake", "pregnant")
survey_variable_recoding("intake", "smoker")
The mention '(inherited)' means the recoding comes from the template the survey definition inherits.
survey_recodings() returns the recoding mappings of all known variables of a survey.
survey_recodings("intake")
Several survey variables can store the responses to a single survey question, for example for multiple choice questions (the response for each modality is stored in its own boolean column).
A survey definition can contain "labels". A label is a named list of values, usable to define "groups". Labels can be used to get the list of recoding labels, but also to get the list of variables corresponding to a multiple choice question.
The function survey_labels() returns the group registered under a given label name.
condition.vars = survey_labels("intake", "condition") # List of variables for the "condition" question of intake
This mechanism allows you to write programs without knowing the exact list of modalities of the variable (if it changes, only the definition has to be updated).
For example:
condition.vars = survey_labels("intake", "condition") # List of variables for the "condition" question of intake
allergy.vars = survey_labels("intake", "allergy") # List of variables for the "allergy" question of intake

# Get the responses to the 2 questions
intake = survey_load_results("intake", cols=c(condition.vars, allergy.vars), season=2016)

# Frequency of conditions
condition.freq = multiple_freq(intake[, condition.vars])
gg_barplot_percent(condition.freq) # plot it

# Frequency of allergies
allergy.freq = multiple_freq(intake[, allergy.vars])
gg_barplot_percent(allergy.freq) # plot it
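For readers without database access, the idea of computing frequencies over a group of boolean columns can be sketched in base R. The data and the column names below are made up, and the computation is only an illustration of the principle; it is not the implementation of multiple_freq() and its output format may differ.

```r
# Made-up responses to a multiple choice question, one boolean column per modality
intake <- data.frame(
  condition.asthma   = c(TRUE, FALSE, TRUE, FALSE),
  condition.diabetes = c(FALSE, FALSE, TRUE, FALSE),
  condition.none     = c(FALSE, TRUE, FALSE, TRUE)
)

# Stand-in for survey_labels(): select the group of columns by prefix
condition.vars <- grep("^condition\\.", names(intake), value = TRUE)

# Count and percentage of participants who ticked each modality
condition.freq <- data.frame(
  variable = condition.vars,
  count = colSums(intake[, condition.vars]),
  percent = 100 * colMeans(intake[, condition.vars]),
  row.names = NULL
)

print(condition.freq)
```

Because each participant can tick several modalities, the percentages do not have to sum to 100.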
You can get all the configuration of a survey by calling survey_definition(). It will show all variable aliases, label recoding, and other parameters of a survey.
survey_definition("intake") # Show intake survey definition
The function survey_aliases() is responsible for converting between database column names and internal variable names (called aliases).
The survey definition should be passed directly:
survey_aliases('code_zip', survey_definition("intake")) # 'Q3'