knitr::opts_chunk$set(
  collapse = TRUE,
  comment = ">",
  results = "asis",
  prompt = FALSE,
  cache = FALSE,
  message = FALSE,
  warning = FALSE,
  echo = TRUE
)
library(googlesheets4)
sheets_auth(
  email = gargle::gargle_oauth_email(),
  path = "../../../GLAD-Questionnaires/sheet_scripts/glad-dict.json",
  scopes = "https://www.googleapis.com/auth/spreadsheets",
  cache = gargle::gargle_oauth_cache(),
  use_oob = gargle::gargle_oob_default(),
  token = NULL
)
# googlesheets4 option, see https://gargle.r-lib.org/articles/non-interactive-auth.html
options(gargle_oauth_email = TRUE)
GLAD_read: Read in all the Qualtrics exported raw csv files in the specified path.
GLAD_sheet: Read in the googlesheet dictionary sheet for a specified
questionnaire. Note that the googlesheet must be read through this
function and not by the googlesheets4 package directly.
GLAD_clean: Cleans all questionnaires or one specified questionnaire
in 'dat_list' and creates exports.
GLAD_select: Exports selected variables from a data set by specifying their names in a text file.
GLAD_derive: Generates derived variables with names and formulae specified in the
GLAD dictionary.
GLAD_getdescr: Gets the description (title) for selected variables by specifying their names.
GLAD_plot: Generates plots for the specified variable in the GLAD data.
GLAD_vartest: Runs several variance tests (Levene's test, Fligner's test and Bartlett's
test) for comparing variance across gender.
GLAD_missing: Produces various summary statistics and plots for missingness
examination of a questionnaire.
GLAD_qplot: Generates quantile plots for a specified variable in the GLAD data.
library(gladfunctions)
Specify the path to the directory containing the raw files exported directly from Qualtrics. This directory must contain the sign-up data and, if an optional questionnaire is to be cleaned, the data for that optional questionnaire.
We have a Python script to prepare the raw data files (it exports the Qualtrics
data and removes participants' personal information). Please speak to Henry
Rogers in the BioResource office.
raw_path <- "~/Data/GLAD/data_raw/"
dat_list <- GLAD_read(raw_path)
clean_path <- "~/Data/GLAD/data_clean/"
Clean all the questionnaires and export to clean_path.
Specify limits = FALSE to avoid applying limits to continuous
variables, so we could later examine the effects of limits with
plotting functions.
Specify rename = TRUE to rename all variable names to Easy.name
(New.variable names if FALSE)
Specify format which should be one of rds, feather, sav (for
SPSS/JASP), sas (for SAS) and dta (for Stata).
GLAD_clean("ALL", dat_list, clean_path, limits = FALSE, rename = TRUE, format = "rds")
Clean a specified questionnaire and export to clean_path
When you call the function, you will first be asked to agree to a credential file being created, and then a Google login page will open. Please log in with an account that has access to the GLAD dictionary.
Click "Allow" to grant the Tidyverse API Packages permission to see, edit and delete your spreadsheets in Google Drive.
You will then be prompted to enter a code. Copy the code from the Google sign-in page and paste it where it says "Enter authorization code:" in your R console.
GLAD_clean("CIDID", dat_list, clean_path, limits = FALSE, rename = TRUE, format = "rds")
Easy.names.
Each text file should be named after the acronym of a questionnaire (as in
the dictionary tabs). Each variable name within the text file should be
on a separate line. An empty text file with the questionnaire name will export all variables in that questionnaire.
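A minimal sketch of preparing such request files in base R. The variable names `dem.age` and `dem.sex` are hypothetical placeholders, and the temporary directory only stands in for wherever you keep the request files; check the GLAD_select documentation for the location it expects.

```r
# Hypothetical folder for the request files (a temporary directory here).
request_dir <- file.path(tempdir(), "glad_request")
dir.create(request_dir, showWarnings = FALSE)

# DEM.txt: one variable name per line.
# The names below are placeholders, not real GLAD variables.
dem_file <- file.path(request_dir, "DEM.txt")
writeLines(c("dem.age", "dem.sex"), dem_file)

# PAD.txt: an empty file, which requests every variable in PAD.
pad_file <- file.path(request_dir, "PAD.txt")
file.create(pad_file)

readLines(dem_file)   # the two requested variable names
file.size(pad_file)   # size in bytes of the empty request file
```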
GLAD_select(clean_path, export_path = "./Data-Request/", person = "Leo", c("DEM.txt", "PAD.txt"), format = "sav" )
rds is a fast loading file format that preserves variable type
information and can be read like a csv.
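To illustrate what "preserves variable type information" means, here is a small base R sketch on a toy data frame (not GLAD data). It writes the same data to rds and csv and compares the column class after reading each file back.

```r
# Toy data with a factor column (illustrative only, not GLAD data).
df <- data.frame(
  id = 1:3,
  answer = factor(c("No", "Yes", "No"), levels = c("No", "Yes"))
)

rds_file <- tempfile(fileext = ".rds")
csv_file <- tempfile(fileext = ".csv")

saveRDS(df, rds_file)
write.csv(df, csv_file, row.names = FALSE)

from_rds <- readRDS(rds_file)
from_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

class(from_rds$answer)   # the factor class and its levels survive the round trip
class(from_csv$answer)   # the csv only stores text, so this comes back as character
```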
I'm reading in the '_Renamed' version with Easy.name, but note that
everything below should also work for the unrenamed version (with
New.variable names).
PAD <- readRDS(paste(clean_path, "rds_renamed/PAD_Renamed.rds", sep = "/"))
# `GLAD_sheet` is a vectorised function (vectorised functions take a vector
# and operate on all the items in that vector). This means that we can read in
# multiple sheets at once and store them in a list.
# We need to use [[1]] (with double square brackets) to extract the first
# element of an R list.
sheet <- GLAD_sheet("PAD")[[1]]
Please check sheet 'PTSD' on the dictionary sheet for a simple example and "AGP" for a more complicated one. Instructions are also available here.
PTSD <- readRDS(paste(clean_path, "rds_renamed/PTSD_Renamed.rds", sep = "/"))
sheet2 <- GLAD_sheet("PTSD")[[1]]
AGP <- readRDS(paste(clean_path, "rds_renamed/AGP_Renamed.rds", sep = "/"))
sheet3 <- GLAD_sheet("AGP")[[1]]
PTSD_withderived <- GLAD_derive(PTSD, sheet2)
AGP_withderived <- GLAD_derive(AGP, sheet3)
# These are the last few columns we just added
tail(colnames(PTSD_withderived), 2)
tail(colnames(AGP_withderived), 4)
GLAD_getdescr(colnames(PAD)[5:8], sheet)
as.numeric factor variables
Normally we cannot recode factor variables directly as numeric variables.
Applying as.numeric to the R built-in class factor only returns its
internal integer representation.
unique(as.numeric(as.factor(PAD[["pad.anx_future_panic_attacks"]])))
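The same behaviour can be seen on a toy factor in base R (illustrative values, not GLAD data). With a plain factor, as.numeric returns the level codes, and the original numbers have to be recovered from the labels.

```r
# A plain base R factor whose labels happen to be numbers.
x <- factor(c("10", "20", "10", "30"))

codes <- as.numeric(x)                 # internal integer codes: 1 2 1 3
values <- as.numeric(as.character(x))  # original values: 10 20 10 30

codes
values
```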
However, the lfactor package preserves numeric values when recoding a
numeric variable to factor. Therefore, we can use this package to recode
our factor variables to numeric variables.
For example, for a Binary variable:
unique(as.numeric(PAD[["pad.anx_future_panic_attacks"]]))
Alternatively, you can use numeric copies of the factor variables. These numeric copies have been copied from the raw data and put into the cleaned data. These copy variables have the same names as the original variables but with "_numeric" at the end.
grep("numeric", colnames(PAD), value = TRUE)
GLAD_plot returns plots for a specified variable. Different plots are
returned depending on the variable type, which is provided through the
googlesheet argument.

Adjust fig.width to avoid the figure being truncated.

```r
GLAD_plot(data = PAD, var = "Sex", googlesheet = sheet)
```

```r
GLAD_plot(
  data = PAD,
  var = "pad.anx_future_panic_attacks",
  googlesheet = sheet
)
```
With variables of Numeric/Continuous type, multiple plot objects are
returned in a named list.
Note that for these variables, you can supply a logical argument
include_outlier. This is useful for deciding cut-offs for variables that
have no maximum or minimum in the dictionary and so have not been
cleaned. When include_outlier is set to TRUE, you can also specify your
own limits for the plots through the limits argument.
Set binwidth = "FD" to apply the Freedman-Diaconis rule for deciding
optimal binwidth. The default is '1' and should be suitable for most
variables in the data sets.
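For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR(x) / n^(1/3). The sketch below computes it by hand on simulated data and also uses the "FD" option of base R's hist. Whether GLAD_plot computes the width in exactly this way is an assumption; this only illustrates the rule itself.

```r
# Simulated data standing in for a continuous variable (not GLAD data).
set.seed(1)
x <- rnorm(200, mean = 10, sd = 3)

# Freedman-Diaconis bin width: twice the interquartile range
# divided by the cube root of the sample size.
fd_binwidth <- 2 * IQR(x) / length(x)^(1 / 3)

# Number of bins implied by that width over the range of the data.
n_bins <- ceiling(diff(range(x)) / fd_binwidth)

# Base R can apply the same rule when choosing histogram breaks.
# plot = FALSE returns the histogram object without drawing it.
h <- hist(x, breaks = "FD", plot = FALSE)

fd_binwidth
n_bins
length(h$counts)
```

Because hist treats the suggested number of bins as a hint and rounds the breaks to "pretty" values, the number of bars it draws can differ slightly from n_bins.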
continuous_plots <- GLAD_plot(
  data = PAD,
  var = "pad.frequency_panic_attacks",
  googlesheet = sheet,
  include_outlier = FALSE,
  limits = c(1, 50),
  binwidth = "FD"
)
continuous_plots$point
continuous_plots$hist
continuous_plots$density
continuous_plots$densitybysex
This plot is for categorical variables that allow multiple options to be selected. Pass in any one of the variables representing the options.
There does not seem to be a general way to extract the title from the dictionary, so specify the title for the plot yourself.
GLAD_plot( data = PAD, var = "pad.sweating", title = "PAD Screening", googlesheet = sheet )
GLAD_qplot(data = PAD, var = "pad.frequency_panic_attacks", googlesheet = sheet)
These tests allow us to explore whether variances differ across sex.
GLAD_vartest(data = PAD, var = "pad.frequency_panic_attacks")
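As a rough illustration of what such tests do, the sketch below runs Bartlett's and Fligner's tests from base R's stats package on simulated data with a grouping factor. This is not the implementation of GLAD_vartest; the toy data frame and its column names are made up, and Levene's test is left out because it needs an extra package (for example car).

```r
# Simulated scores for two groups with different spread (not GLAD data).
set.seed(42)
toy <- data.frame(
  score = c(rnorm(50, sd = 1), rnorm(50, sd = 3)),
  sex = factor(rep(c("Female", "Male"), each = 50))
)

# Bartlett's test: assumes normality within groups.
bartlett_res <- bartlett.test(score ~ sex, data = toy)

# Fligner-Killeen test: rank-based and more robust to non-normality.
fligner_res <- fligner.test(score ~ sex, data = toy)

bartlett_res$p.value
fligner_res$p.value
```

A small p-value suggests that the variances differ between the groups.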
For quantile plot and variance test, I also have a version that extracts all the continuous variables and loops through them. Would that be more desirable?
GLAD_missing returns a named list of various missingness summaries and plots.

```r
missingness <- GLAD_missing(data = PAD)
```

* Percentage of participants with no variable missing.

```r
missingness$percent
```

```r
missingness$descr
```
missingness$freq
The table below: a value of '1' indicates the percentage of participants missing for the corresponding variable.
print(missingness$table)
missingness$plot
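For intuition, two of these summaries can be approximated in base R on a toy data frame. This is only a sketch of the general idea and not how GLAD_missing is implemented; the data and column names are made up.

```r
# Toy data with some missing values (not GLAD data).
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 3, 4),
  c = c(1, 2, 3, 4)
)

# Percentage of rows (participants) with no variable missing.
percent_complete <- mean(complete.cases(toy)) * 100

# Percentage of missing values for each variable.
percent_missing_by_var <- colMeans(is.na(toy)) * 100

percent_complete         # 50
percent_missing_by_var   # a = 25, b = 50, c = 0
```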