knitr::opts_chunk$set(echo = TRUE)
This document presents a typical workflow with impactR, from loading the data to producing a cleaning log. Let's load the package.
# If you haven't done so yet (install_github() expects the full "owner/repo" GitHub path):
# devtools::install_github("impactR")
library(impactR)
Below, we use the example dataset `data` that ships with impactR.
# Load the dataset into the environment
data(data)
# Convert it to a tibble and show its first rows and column types
data <- data |> tibble::as_tibble()
data
Most of impactR's functions assume that the provided data can be coerced to a tibble, since they make extensive use of the tidyverse.
Let's say we have a survey sheet structured as follows:
data(survey)
survey <- survey |> tibble::as_tibble()
survey
Since `survey` is used by several later functions, the first step is to split its `type` column into `type` and `list_name`:
# All arguments except 'col_to_split' are already the defaults
survey <- survey |>
  split_survey(
    col_to_split = "type",
    into = c("type", "list_name"),
    sep = " ",
    fill = "right"
  )
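For illustration only, here is a base-R sketch of what this split does, assuming `type` holds Kobo values such as `"select_one yes_no"`. The helper `split_type()` is hypothetical: it splits on spaces, keeps the first two pieces and, like `fill = "right"`, uses `NA` when there is no list name. It is not impactR's implementation.

```r
# Hypothetical helper (not impactR's code): split Kobo `type` values into
# the question type and the choice list name; NA when there is no list name.
split_type <- function(type) {
  parts <- strsplit(type, " ", fixed = TRUE)
  data.frame(
    type = vapply(parts, function(p) p[1], character(1)),
    list_name = vapply(
      parts,
      function(p) if (length(p) > 1) p[2] else NA_character_,
      character(1)
    ),
    stringsAsFactors = FALSE
  )
}

toy_types <- c("select_one yes_no", "integer", "select_multiple sources")
split_type(toy_types)
```

Here `"integer"` gets `NA` as `list_name`, which is what filling on the right means.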
To obtain outliers for numerical variables, use the `make_log_outlier()` function. It combines two kinds of outliers: deviations from the mean, computed with the `outliers_sd()` function, and values outside the interquartile range, computed with the `outliers_iqr()` function.
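The two rules can be sketched in base R as follows, assuming a plain numeric vector. The helpers `flag_iqr()` and `flag_sd()` are hypothetical, and impactR may differ in details such as the quantile type (R's default, type 7, is used here) or NA handling.

```r
# IQR rule: flag values outside [Q1 - times * IQR, Q3 + times * IQR]
flag_iqr <- function(x, times = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  !is.na(x) & (x < q[1] - times * spread | x > q[2] + times * spread)
}

# SD rule: flag values farther than `times` standard deviations from the mean
flag_sd <- function(x, times = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- stats::sd(x, na.rm = TRUE)
  !is.na(x) & abs(x - m) > times * s
}

ages <- c(25, 30, 28, 35, 40, 33, 29, 31, 27, 120)
ages[flag_iqr(ages)]
ages[flag_sd(ages)]
ages[flag_sd(ages, times = 2)]
```

With these ten values the IQR rule flags 120, but the 3-SD rule does not: the extreme value inflates the standard deviation itself. Only the looser 2-SD threshold catches it, which is one reason to look at both logs.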
# Helpers to get numeric columns (not run)
# numeric_cols(data)          # all numeric columns: includes e.g. numeric enumerator ids or GPS locations
# numeric_cols(data, survey)  # numeric columns according to the survey sheet
# Get IQR outliers (default rule: 1.5)
# outliers_iqr(data, col = i_enquete_age, times = 1.5, id_col = uuid)  # id_col is usually uuid with Kobo
# Get standard deviation outliers
# outliers_sd(data, i_enquete_age, times = 3, id_col = uuid)
# Create the full log of outliers for all numeric columns
log_outliers <- make_log_outlier(data, survey, id_col = uuid, today, i_enum_id, i_zad)
log_outliers
These functions require each Kobo question with an "other" option to be named `variable` for the parent question and `other_variable` for the "other" (free-text) question. The `other_` prefix may differ: it is set by the `other` argument, a character pattern. In the example that follows, it is `"autre_"`.
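As a minimal sketch of this naming convention, the hypothetical helper `pair_other_cols()` below strips the `"autre_"` prefix to find each parent column and reports whether that parent exists. It assumes the data's column names match the Kobo question names; it is not how impactR implements `other_cols()` or `other_parent_cols()`.

```r
# Hypothetical helper: pair each "other" column with its parent question
# by removing the prefix given as `other`.
pair_other_cols <- function(col_names, other = "autre_") {
  child <- col_names[startsWith(col_names, other)]
  parent <- substring(child, nchar(other) + 1)
  data.frame(
    other_col = child,
    parent_col = parent,
    parent_exists = parent %in% col_names,  # FALSE means the convention is broken
    stringsAsFactors = FALSE
  )
}

cols <- c("uuid", "water_source", "autre_water_source", "autre_shelter")
pair_other_cols(cols)
```

Here `autre_shelter` has no `shelter` parent column, the kind of naming mismatch the convention is meant to rule out.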
# Get "other" answers
oth_cols <- other_cols(data, "autre_", id_col = uuid)
# Get the parent answers of the "other" answers
other_parent_cols(data, oth_cols, "autre_", id_col = uuid)
# Build the cleaning log of "other" answers
log_others <- make_log_other(data, survey, "autre_", uuid, i_enum_id, i_zad)
log_others
For logical checks, all you need is an Excel file of checks prepared in advance; it can be modified during data collection.
data(check_list)
check_list <- check_list |> tibble::as_tibble()
check_list
The check spreadsheet must follow a few rules to be read by the impactR [add list] functions. For example, every variable used in the logical tests (column `logical_test` of the check table) must also exist in the data object, `data`. The `check_check_list()` function validates (or rejects) a logical check table [note: it is not yet robust, but it already performs a number of checks].
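To make the rule on variables concrete, here is a sketch of one such check, assuming each `logical_test` is a valid R expression over data columns. The helper `missing_test_vars()` is hypothetical and is not the implementation of `check_check_list()`: it parses each test with `str2lang()` and lists the variables (not function names) it uses with `all.vars()`.

```r
# Hypothetical helper: variables used in the logical tests but absent from the data
missing_test_vars <- function(logical_tests, data_cols) {
  used <- unique(unlist(lapply(logical_tests, function(test) {
    all.vars(str2lang(test))  # variable names only, e.g. "survey_duration"
  })))
  setdiff(used, data_cols)
}

tests <- c("survey_duration < 15", "i_enquete_age > 100", "survey_duration > 120")
missing_test_vars(tests, c("uuid", "i_enquete_age"))
```

This returns `"survey_duration"` once, since repeated uses are deduplicated with `unique()`.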
For example, the column `survey_duration` does not exist in `data`, yet several checks in the logical check spreadsheet `check_list` use it. Running the following command therefore throws an error:
check_check_list(check_list, data)
# following column/s from `question_name` is/are missing in `.tbl`: survey_duration, survey_duration, survey_duration
So we add the `survey_duration` column with the `survey_duration()` function. The time difference between two surveys of the same enumerator could be added at the same time with `survey_difftime()` (not run below):
data <- data |>
  survey_duration(start, end, new_colname = "survey_duration") #|>
  # NOT RUN!
  # survey_difftime(start, end, new_colname = "survey_difftime", i_enum_id)
data$survey_duration
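What the two columns measure can be sketched in base R on a toy table. This assumes `start` and `end` are date-times and that durations are expressed in minutes; it also assumes the "time difference between two surveys" is the gap between the end of an enumerator's previous survey and the start of the next one. These are illustrative choices, not necessarily what `survey_duration()` and `survey_difftime()` compute.

```r
toy <- data.frame(
  uuid = c("a", "b", "c"),
  i_enum_id = c("e1", "e1", "e2"),
  start = as.POSIXct(c("2023-05-01 09:00:00", "2023-05-01 09:50:00",
                       "2023-05-01 10:00:00"), tz = "UTC"),
  end = as.POSIXct(c("2023-05-01 09:40:00", "2023-05-01 10:30:00",
                     "2023-05-01 10:25:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Duration of each survey, in minutes
toy$survey_duration <- as.numeric(difftime(toy$end, toy$start, units = "mins"))

# Assumed gap (minutes) between an enumerator's previous survey end and next start;
# NA for each enumerator's first survey
toy <- toy[order(toy$i_enum_id, toy$start), ]
prev_end <- ave(as.numeric(toy$end), toy$i_enum_id,
                FUN = function(e) c(NA, e[-length(e)]))
toy$survey_difftime <- (as.numeric(toy$start) - prev_end) / 60
toy
```

Enumerator `e1` waited 10 minutes between surveys `a` and `b`; `e2` has a single survey, hence `NA`.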
We can check the logical check spreadsheet again:
check_check_list(check_list, data)
This time the function returns `TRUE`, so we can proceed with producing the cleaning log.
log_check_list <- make_log_from_check_list(data, survey, check_list, uuid, today, i_enum_id, i_zad)
log_check_list
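The principle behind a log built from a check list can be sketched as follows: evaluate each test on the data and keep the ids of the rows where it is `TRUE`. This assumes tests are R expressions over data columns, that `TRUE` means "flag this row" and that `NA` results are not flagged; `flag_checks()` is hypothetical and much simpler than `make_log_from_check_list()`, which also records enumerator and other columns.

```r
# Hypothetical helper: one row per (flagged id, check)
flag_checks <- function(data, check_list, id_col = "uuid") {
  flagged <- lapply(seq_len(nrow(check_list)), function(i) {
    # Evaluate the test with the data columns in scope
    hit <- eval(str2lang(check_list$logical_test[i]), envir = data)
    hit <- !is.na(hit) & hit  # NA results are not flagged
    if (!any(hit)) return(NULL)
    data.frame(
      id = data[[id_col]][hit],
      question_name = check_list$question_name[i],
      logical_test = check_list$logical_test[i],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, flagged)  # NULL entries (nothing flagged) are dropped
}

toy_data <- data.frame(
  uuid = c("a", "b", "c"),
  i_enquete_age = c(25, 130, NA),
  survey_duration = c(5, 30, 45),
  stringsAsFactors = FALSE
)
toy_checks <- data.frame(
  question_name = c("i_enquete_age", "survey_duration"),
  logical_test = c("i_enquete_age > 100", "survey_duration < 10"),
  stringsAsFactors = FALSE
)
flag_checks(toy_data, toy_checks)
```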
Finally, we just need to combine all these cleaning logs into one, which we can then export to an Excel file.
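log <- list(log_outliers, log_check_list, log_others) |>
  # Convert every column to character so that bind_rows() does not fail on type conflicts
  purrr::map(~ .x |> dplyr::mutate(dplyr::across(dplyr::everything(), as.character))) |>
  dplyr::bind_rows() |>
  # Guess the column types again
  readr::type_convert()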
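The same combine step can be sketched in base R, which makes the reasoning explicit: logs may share a column with different types (numbers in one, text in another), so everything is turned into character, missing columns are filled with `NA`, the logs are stacked, and `type.convert()` guesses the types again. `combine_logs()` and the toy logs are hypothetical.

```r
# Hypothetical helper: stack logs with possibly different columns and types
combine_logs <- function(logs) {
  all_cols <- unique(unlist(lapply(logs, names)))
  logs <- lapply(logs, function(df) {
    df[] <- lapply(df, as.character)  # avoid type conflicts when stacking
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA_character_
    df[all_cols]
  })
  out <- do.call(rbind, logs)
  type.convert(out, as.is = TRUE)  # re-guess types, keep text as character
}

log_a <- data.frame(uuid = c("a", "b"), old_value = c(120, 95),
                    new_value = c(NA, 90), stringsAsFactors = FALSE)
log_b <- data.frame(uuid = "c", old_value = "autre: puits",
                    other_parent = "water_source", stringsAsFactors = FALSE)
combine_logs(list(log_a, log_b))
```

`old_value` stays character because it mixes numbers and text, while `new_value` is converted back to a number.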
It is also possible to use the `make_all_logs()` function, which combines these three functions and outputs a single cleaning log.
log <- make_all_logs(data, survey, check_list, "autre_", uuid, today, i_enum_id, i_zad)
log
Finally, there are several ways to export the log. Here is an example with the writexl package:
# Not run!
# The simplest one:
# writexl::write_xlsx(log, "output/log.xlsx")
# You can add the current date to allow tracking:
# writexl::write_xlsx(log, paste0("output/log_", Sys.Date(), ".xlsx"))
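A small sketch for the dated export: the hypothetical helper `dated_log_path()` builds the dated file name and creates the output folder first, so the export does not depend on the writer creating missing folders. The folder name `"output"` and the `log` prefix are taken from the example above.

```r
# Hypothetical helper: "output/log_YYYY-MM-DD.xlsx", creating the folder if needed
dated_log_path <- function(dir = "output", prefix = "log", date = Sys.Date()) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  file.path(dir, paste0(prefix, "_", format(date, "%Y-%m-%d"), ".xlsx"))
}

dated_log_path(dir = tempdir(), date = as.Date("2023-05-01"))
# Then, e.g. (not run): writexl::write_xlsx(log, dated_log_path())
```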