knitr::opts_chunk$set(echo = TRUE)
This vignette explains how to clean data with impactR. Note that the cleaning functions are linked to the data collection monitoring functions. If you do not use the monitoring functions, you will need to adapt your cleaning log to the format expected by the functions shown below.
# If impactR is not yet installed
# devtools::install_github("impactR")
library(impactR)
Let's import the dataset to clean.
# Load the dataset into the environment
data(data)
# Convert it to a tibble
data <- data |> tibble::as_tibble()
Import the 'survey' sheet:
data(survey)
survey <- survey |> tibble::as_tibble()
The first thing to do with the 'survey' object, as it will be used elsewhere, is to split the 'type' column into two columns.
# Except for the 'col_to_split' argument (the column to split), the other parameters are the default parameters
survey <- survey |>
  split_survey(
    col_to_split = "type",
    into = c("type", "list_name"),
    sep = " ",
    fill = "right"
  )
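To make the effect of this split concrete, here is a minimal base R sketch on a toy survey sheet. The rows and the object name `survey_toy` are made up for illustration. The code only mimics what the call above is described as doing (splitting `type` on a space into `type` and `list_name`, with `NA` on the right when there is no second piece); it is not the implementation of `split_survey()`.

```r
# Toy XLSForm-like survey sheet (hypothetical rows, for illustration only)
survey_toy <- data.frame(
  type = c("select_one yes_no", "integer", "select_multiple needs"),
  name = c("consent", "hh_size", "priority_needs"),
  stringsAsFactors = FALSE
)

# Split each 'type' value on the space separator
parts <- strsplit(survey_toy$type, " ", fixed = TRUE)

# The first piece stays in 'type'; the second piece, if any, becomes 'list_name'.
# Indexing past the end of a character vector yields NA, which mirrors fill = "right".
survey_toy$list_name <- vapply(parts, function(p) p[2], character(1))
survey_toy$type <- vapply(parts, function(p) p[1], character(1))

print(survey_toy)
```

Question types without a choice list, such as `integer`, end up with `NA` in `list_name`, which makes it easy to tell them apart from `select_one` and `select_multiple` questions later on.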
Import the 'choices' sheet:
data(choices)
choices <- choices |> tibble::as_tibble()
Import the cleaning log:
data(cleaning_log)
cleaning_log <- cleaning_log |> tibble::as_tibble()
It's as simple as two steps.
1 - Check that the cleaning log contains at least the minimum required information:
check_cleaning_log(cleaning_log, data, uuid, "autre_")
# If the result is NULL, the cleaning log is ok
2 - Use the clean_all() function:
cleaned_data <- clean_all(data, cleaning_log, survey, choices, uuid, "autre_")
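As a rough illustration of what these two steps involve, the base R sketch below checks a toy cleaning log against a toy dataset and then applies it. Everything in it is an assumption made for the example: the helper names `check_log_sketch()` and `apply_log_sketch()`, the log columns (`uuid`, `question_name`, `action`, `new_value`) and the action labels (`"modify"`, `"remove"`). The columns that `check_cleaning_log()` and `clean_all()` actually expect may differ, and this is not how impactR implements them.

```r
# Toy dataset and cleaning log. Column names and action labels are
# assumptions for this sketch, not necessarily those impactR expects.
data_toy <- data.frame(
  uuid = c("a1", "b2", "c3"),
  hh_size = c(4, 40, 6),
  stringsAsFactors = FALSE
)
log_toy <- data.frame(
  uuid = c("b2", "c3"),
  question_name = c("hh_size", NA),
  action = c("modify", "remove"),
  new_value = c(4, NA),
  stringsAsFactors = FALSE
)

# Hypothetical check: return NULL if the log is consistent with the data,
# otherwise a character vector describing the problems found.
check_log_sketch <- function(log, data) {
  problems <- character(0)
  unknown_ids <- setdiff(log$uuid, data$uuid)
  if (length(unknown_ids) > 0) {
    problems <- c(problems, paste("Unknown uuid:", unknown_ids))
  }
  questions <- log$question_name[log$action == "modify"]
  unknown_cols <- setdiff(questions, names(data))
  if (length(unknown_cols) > 0) {
    problems <- c(problems, paste("Unknown question:", unknown_cols))
  }
  if (length(problems) == 0) NULL else problems
}

# Hypothetical application: overwrite values for "modify" entries,
# then drop the surveys flagged "remove".
apply_log_sketch <- function(data, log) {
  to_modify <- log[log$action == "modify", ]
  for (i in seq_len(nrow(to_modify))) {
    row <- data$uuid == to_modify$uuid[i]
    col <- to_modify$question_name[i]
    data[row, col] <- to_modify$new_value[i]
  }
  to_remove <- log$uuid[log$action == "remove"]
  data[!(data$uuid %in% to_remove), , drop = FALSE]
}

# Step 1: check the log (returns NULL for this toy log)
check_log_sketch(log_toy, data_toy)

# Step 2: apply the log
cleaned_toy <- apply_log_sketch(data_toy, log_toy)
print(cleaned_toy)
```

Running the check first means that a typo in a `uuid` or a question name is reported before any value is overwritten, instead of being silently skipped during cleaning.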
The cleaned_data object is the cleaned dataset with: