clean_and_format: Clean and format data for cerUB protocols


View source: R/clean_and_format.R

Description

Cleaning and formatting procedures, including coercing variables to numeric or factor, excluding columns (constant, perturbed, or unreliable) and excluding rows (incomplete data, outliers).
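The coercion and constant-column steps described above can be sketched in base R roughly as follows. This is a minimal illustration, not cerUB's implementation: the helper names `coerce_columns` and `drop_constant_columns` are hypothetical, and treating a column as constant when it has at most one distinct non-missing value is an assumption.

```r
# Hypothetical helper (not part of cerUB): coerce the given columns,
# selected by name or by index, to factor or numeric.
coerce_columns <- function(df, categorical_columns = NULL,
                           numerical_columns = NULL) {
  for (col in categorical_columns) {
    df[[col]] <- as.factor(df[[col]])
  }
  for (col in numerical_columns) {
    # Convert via as.character() so factors convert by label, not by code
    df[[col]] <- as.numeric(as.character(df[[col]]))
  }
  df
}

# Hypothetical helper: drop columns with at most one distinct non-NA value,
# since a constant column cannot discriminate between observations.
drop_constant_columns <- function(df) {
  is_constant <- vapply(df, function(col) {
    length(unique(col[!is.na(col)])) <= 1
  }, logical(1))
  df[, !is_constant, drop = FALSE]
}

dt <- data.frame(First = c(1, 2, 3),
                 Second = c("A", "A", "A"),
                 Third = c("1", "2", "10"))
dt <- coerce_columns(dt, categorical_columns = "Second",
                     numerical_columns = c(1, 3))
dt <- drop_constant_columns(dt)
names(dt)  # c("First", "Third")
```

Columns can be selected by name or by position because `df[[col]]` accepts both, mirroring the "names/indexes" wording of the arguments.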

Usage

clean_and_format(data, categorical_columns = NULL, numerical_columns = NULL,
  completion_variable = NULL, as_na = NULL, method = NULL,
  columns_to_exclude = NULL, rows_to_exclude = NULL)

Arguments

data

Data frame, the data to be prepared before applying cerUB protocols.

categorical_columns

Character/Numeric, vector with the names/indexes of the categorical variables.

numerical_columns

Character/Numeric, vector with the names/indexes of the numeric variables.

completion_variable

Character, vector with two elements (name, value): the name of the column that indicates whether observations (rows) are complete, and the value in that column that marks a complete observation. For instance, c("isCompleted", "yes").

as_na

Character, vector of values to be treated as NA.

method

Character, method used to replace NA values, if any (see replace_na).

columns_to_exclude

Character/Numeric, vector with the names/indexes of columns to exclude.

rows_to_exclude

Character/Numeric, vector with the names/indexes of rows to exclude.
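The row- and value-level arguments above could be handled roughly as in the following base-R sketch. It is illustrative only and not cerUB's code: all helper names are hypothetical, numeric `columns_to_exclude`/`rows_to_exclude` are assumed to be positions (character values are matched against column or row names), and `method = "random"` is assumed to mean drawing each missing value from the observed values of the same column; cerUB's `replace_na` may behave differently.

```r
# Hypothetical helper: treat the listed values as NA in every column.
mark_as_na <- function(df, as_na) {
  df[] <- lapply(df, function(col) {
    col[as.character(col) %in% as_na] <- NA
    col
  })
  df
}

# Hypothetical helper: keep rows whose completion column equals the given
# value, e.g. completion_variable = c("checked", "yes"). Rows with NA in
# that column are treated as incomplete (an assumption).
keep_completed_rows <- function(df, completion_variable) {
  flag <- as.character(df[[completion_variable[1]]])
  df[!is.na(flag) & flag == completion_variable[2], , drop = FALSE]
}

# Assumed meaning of method = "random": fill each NA with a value drawn,
# with replacement, from the observed values of the same column.
replace_na_random <- function(df) {
  df[] <- lapply(df, function(col) {
    missing <- is.na(col)
    observed <- col[!missing]
    if (any(missing) && length(observed) > 0) {
      picks <- sample.int(length(observed), sum(missing), replace = TRUE)
      col[missing] <- observed[picks]
    }
    col
  })
  df
}

# Hypothetical helper: drop columns/rows given by name (character) or by
# position (numeric); names that do not match are ignored.
exclude_by <- function(df, columns_to_exclude = NULL, rows_to_exclude = NULL) {
  to_positions <- function(sel, labels) {
    pos <- if (is.character(sel)) match(sel, labels) else as.integer(sel)
    pos[!is.na(pos)]
  }
  cols <- to_positions(columns_to_exclude, names(df))
  rows <- to_positions(rows_to_exclude, row.names(df))
  if (length(cols) > 0) df <- df[, -cols, drop = FALSE]
  if (length(rows) > 0) df <- df[-rows, , drop = FALSE]
  df
}

set.seed(42)
dt <- data.frame(Fourth = c("A", "B", "C", "D", "E"),
                 dummy = c("bla", "ble", "bli", "blo", "blu"),
                 checked = c("yes", "yes", "no", "yes", "yes"))
# Exclude by position first, before other row filters shift positions
dt <- exclude_by(dt, columns_to_exclude = "dummy", rows_to_exclude = c(1, 5))
dt <- keep_completed_rows(dt, c("checked", "yes"))
dt <- mark_as_na(dt, as_na = "D")
dt <- replace_na_random(dt)
```

In this toy run, rows 2 and 4 survive; the "D" in row 4 becomes NA and, because "B" is the only observed value left in `Fourth`, the random replacement fills it with "B".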

Examples

## Not run: 

dt <- data.frame("First" = c(1,2,2,3,5,1,6,0,4,10),
                 "Second" = c("A","A","A","A","A","A","A","A","A","A"),
                 "Third" = c("1","2","2","3","5","1","6","0","4","10"),
                 "Fourth" = c("A","B","C","D","E","F","G","H","I","J"),
                 "dummy" = c("bla","ble","bli","blo","blu","bla","ble",
                             "bli","blo","blu"),
                 "checked" = c("yes","yes","no","yes","no","yes","yes",
                               "no","yes","yes"))
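row.names(dt) <- 1:10
# Declare factor and numeric columns, keep only rows where "checked" is
# "yes", treat "D" as NA and replace NAs using the "random" method, then
# drop the "dummy" column and rows 1 and 10.
dt_clean <- clean_and_format(dt,
                             categorical_columns = c("Second", "Fourth"),
                             numerical_columns = c("First", "Third"),
                             completion_variable = c("checked","yes"),
                             as_na = c("D"),
                             method = "random",
                             columns_to_exclude = c("dummy"),
                             rows_to_exclude = c(1, 10))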


## End(Not run)

Andros-Spica/cerUB documentation built on June 9, 2020, 9:22 p.m.