log_cleaning: Log a cleaning change


View source: R/clean.R

Description

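Logs a cleaning change (flag, change or deletion) for one or more records of a dataset, identified by their uuid, in a CSV cleaning logbook.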

Usage

log_cleaning(data, uuid, action, extra_columns = list(),
  question.name = NULL, new.value = NULL, issue = NULL, dir,
  filename = "cleaning_logbook.csv")

Arguments

data

The dataset being cleaned. Must be a data.frame (not a tibble).

uuid

A character vector of one or more UUIDs. If several UUIDs are given, the function iterates over all of them and uses the same question.name, issue and new.value for each (new.value may also be given per UUID; see new.value).
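
The iteration over uuid can be pictured as building one logbook row per UUID, with the shared arguments repeated on every row. This is a minimal sketch, not dclogger's implementation. The helper name sketch_log_rows, the logbook column names and the example values are assumptions made for illustration.

```r
# Hypothetical helper (not part of dclogger): one logbook row per UUID,
# with question.name and issue repeated on every row.
sketch_log_rows <- function(uuid, question.name, issue) {
  data.frame(
    uuid = uuid,                                      # one row per UUID
    question.name = rep(question.name, length(uuid)), # shared column name
    issue = rep(issue, length(uuid)),                 # shared description
    stringsAsFactors = FALSE
  )
}

rows <- sketch_log_rows(c("a1", "b2", "c3"), "arrival_date_idp",
                        "arrival more than 4 weeks after displacement")
print(rows)
```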

action

The action to perform on the identified row(s): "f" (flag), "c" (change) or "d" (deletion).
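
The one-letter action codes map onto three kinds of cleaning action. The sketch below shows that mapping together with a check that rejects unknown codes. It assumes nothing about how dclogger validates action internally, and the function name action_label is hypothetical.

```r
# Hypothetical lookup (not part of dclogger): translate a one-letter action
# code into a readable label and reject anything that is not f, c or d.
action_label <- function(action) {
  labels <- c(f = "flag", c = "change", d = "deletion")
  if (length(action) != 1 || !(action %in% names(labels))) {
    stop("action must be one of 'f', 'c' or 'd'")
  }
  unname(labels[action])
}

action_label("f")
```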

extra_columns

A named list in which each name is a column name in the cleaning log and each element is the name of the associated column in the dataset (see the example). Optional, but it should match the extra_columns used when the logbook was initiated.
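
To see how the named list is read, the sketch below looks up the extra column values for one record. It uses a made-up toy dataset in place of real survey data and the same extra_columns mapping as the example. The lookup itself is an assumption about what the mapping is used for, not dclogger's code.

```r
# Toy dataset standing in for the data argument (all values are made up).
data <- data.frame(
  X_uuid = c("a1", "b2"),
  population_group = c("group_a", "group_b"),
  governorate_mcna = c("gov_1", "gov_2"),
  stringsAsFactors = FALSE
)

# Names = logbook columns, values = dataset columns (as in the example).
extra_columns <- list(population_group = "population_group",
                      governorate = "governorate_mcna")

# Hypothetical lookup: collect the extra column values for one UUID,
# keyed by the logbook column names.
row <- data[data$X_uuid == "b2", ]
extra_values <- lapply(extra_columns, function(col) row[[col]])
str(extra_values)
```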

question.name

The name of the column containing the value to be flagged or changed.

new.value

The value that replaces the old value (change action only). Must be either the same length as uuid or length 1 (if all values are to be changed to the same value).
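
The length rule for new.value works like this. A single value is reused for every UUID, while a vector of the same length as uuid is matched element by element. The sketch below illustrates that rule. The helper name expand_new_value is hypothetical and is not taken from dclogger.

```r
# Hypothetical check (not dclogger's code): new.value must have length 1
# (reused for every UUID) or the same length as uuid (matched element-wise).
expand_new_value <- function(uuid, new.value) {
  if (length(new.value) == 1) {
    rep(new.value, length(uuid))
  } else if (length(new.value) == length(uuid)) {
    new.value
  } else {
    stop("new.value must have length 1 or the same length as uuid")
  }
}

expand_new_value(c("a1", "b2", "c3"), "2019-05-01")
expand_new_value(c("a1", "b2"), c("yes", "no"))
```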

issue

A description of the issue, i.e. why the value is being cleaned.

dir

The directory in which the logbook is saved.

filename

File name of the logbook. Default is "cleaning_logbook.csv". Must have a .csv extension.
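
The dir and filename arguments together identify the logbook file. The sketch below shows one way to combine them and enforce the .csv extension. It is an assumption about the behaviour, not dclogger's code, and the helper name logbook_path is hypothetical. tools::file_ext and file.path are base R functions.

```r
# Hypothetical helper (not part of dclogger): build the logbook path from
# dir and filename, refusing file names without a .csv extension.
logbook_path <- function(dir, filename = "cleaning_logbook.csv") {
  if (tolower(tools::file_ext(filename)) != "csv") {
    stop("filename must have a .csv extension")
  }
  file.path(dir, filename)
}

logbook_path(tempdir())
```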

Value

Returns the logbook as a data.frame.

Examples

dir <- getwd()
# Names = logbook columns, values = dataset columns
extra_columns <- list(population_group = "population_group", governorate = "governorate_mcna")
## Not run:
# Create the logbook in dir before logging any changes
init_cleaning_log(dir, extra_columns = extra_columns)

data(mcna2019)
library(dplyr)
# Keep IDPs for whom this is the first place of displacement
idp_first_place <- mcna2019 %>% dplyr::filter(idp_first_place == "yes")
# TRUE where arrival came more than 4 weeks after displacement
flag <- difftime(as.POSIXct(idp_first_place$arrival_date_idp, format = "%Y-%m-%d"),
                 as.POSIXct(idp_first_place$displace_date_idp, format = "%Y-%m-%d"),
                 units = "weeks") > 4
# Inspect the flagged dates
idp_first_place[which(flag), c("displace_date_idp", "arrival_date_idp")]
uuid <- idp_first_place$X_uuid[which(flag)]
# paste() keeps the issue on one line (no embedded newlines or indentation)
log <- log_cleaning(mcna2019, uuid, action = "f", extra_columns = extra_columns,
                    question.name = "arrival_date_idp",
                    issue = paste("The difference between displacement date and arrival date",
                                  "is more than 4 weeks although this is the first place of displacement"),
                    dir = dir)

## End(Not run)

boukepieter/dclogger documentation built on Feb. 7, 2020, 8:34 p.m.