LDE.AutoProcess: Automatic exploration, variable filtering & re-formatting...


View source: R/Exports.R

Description

Automatically runs LDE.Explore() and then LDE.UsefulVars(). It returns the transformed dataset without the variables judged not useful, together with the $statistics, $var.status and $var.classif components produced by LDE.UsefulVars().
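The explore-then-filter flow described above can be sketched in base R. This is a hypothetical simplification, not the package's implementation: sketch_auto_process() is an invented name, and it assumes a variable is "not useful" when it has fewer than two distinct non-NA values. The actual criteria applied by LDE.Explore() and LDE.UsefulVars() are not documented on this page. The component names df.filtered, var.status and statistics mirror those used in the examples below, but their internal layout here is an assumption.

```r
# Hypothetical sketch, NOT the LargeDataExplorer implementation.
# Assumption: a variable is "not useful" if it has < 2 distinct non-NA values.
sketch_auto_process <- function(dat) {
  # "Explore" step: one row of descriptive statistics per variable
  stats <- data.frame(
    var      = names(dat),
    class    = vapply(dat, function(x) class(x)[1], character(1)),
    na.rate  = vapply(dat, function(x) mean(is.na(x)), numeric(1)),
    n.unique = vapply(dat, function(x) length(unique(x[!is.na(x)])), integer(1)),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
  # "Useful vars" step: keep variables with at least two distinct values
  useful <- stats$n.unique > 1
  list(
    df.filtered = dat[, useful, drop = FALSE],
    var.status  = list(included = stats$var[useful],
                       excluded = stats$var[!useful]),
    statistics  = stats
  )
}

toy <- data.frame(id = 1:3, empty = NA, const = "a", score = c(2.5, NA, 4))
res <- sketch_auto_process(toy)
res$var.status$excluded  # "empty" "const"
```

The sketch ignores maxNARate and keyNamesMatch; it only shows the overall shape of exploring first, then returning the filtered data alongside the bookkeeping.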

Usage

LDE.AutoProcess(dat, maxNARate = NULL, keyNamesMatch = NULL)

Arguments

dat

The data.frame to process.

maxNARate

Numeric value between 0 and 1. Variables with a higher proportion of NA values are excluded. NULL (the default) disables NA filtering.

keyNamesMatch

Character vector of substrings to look for at the start or end of each variable name; a variable whose name matches is classified as a key. NULL (the default) disables key detection.
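The semantics of the two filtering arguments can be illustrated with a minimal base-R sketch. The helpers filter_by_na_rate() and is_key_name() are hypothetical and not part of LargeDataExplorer. The sketch assumes that a variable is dropped when its NA proportion is strictly greater than maxNARate, and that key matching is case-sensitive; the package may behave differently on either point.

```r
# Hypothetical helpers (not part of LargeDataExplorer) illustrating the
# behaviour described for maxNARate and keyNamesMatch.

# Drop variables whose proportion of NAs exceeds maxNARate.
# Assumption: rate > maxNARate is excluded, rate <= maxNARate is kept.
filter_by_na_rate <- function(dat, maxNARate = NULL) {
  if (is.null(maxNARate)) return(dat)        # NULL disables the filter
  na_rate <- colMeans(is.na(dat))             # NA proportion per column
  dat[, na_rate <= maxNARate, drop = FALSE]
}

# Flag names that start or end with any of the given substrings.
# Assumption: matching is case-sensitive.
is_key_name <- function(var_names, keyNamesMatch = NULL) {
  # NULL (or an empty vector) disables key detection
  if (length(keyNamesMatch) == 0) return(rep(FALSE, length(var_names)))
  hits <- lapply(keyNamesMatch, function(p)
    startsWith(var_names, p) | endsWith(var_names, p))
  Reduce(`|`, hits)                           # TRUE if any substring matches
}

toy_na <- data.frame(a = c(1, 2, 3, 4),          # 0% NA
                     b = c(NA, NA, 3, 4),        # 50% NA
                     c = c("x", NA, "y", "z"))   # 25% NA
names(filter_by_na_rate(toy_na, 0.2))  # "a"
names(filter_by_na_rate(toy_na, 0.3))  # "a" "c"

vars <- c("ID_CONTRACT", "VALUE", "PROVIDER_KEY", "VALID")
is_key_name(vars, c("ID", "KEY"))      # TRUE FALSE TRUE TRUE
```

Note that "VALID" is flagged because it ends in "ID": plain prefix/suffix matching can produce false positives, so keyNamesMatch substrings are worth choosing with that in mind.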

Value

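A list containing the filtered dataset with re-formatted variables ($df.filtered) and the process information: descriptive statistics ($statistics), the inclusion status of each variable ($var.status) and the variable classification ($var.classif).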

Author(s)

Daniel Nieto

Examples

df <- data.frame(secop1.full)
maxNARate <- 0.2
keyNamesMatch <- c("ID", "KEY")

#Basic AutoProcess of the data.frame
Auto.1.df <- LDE.AutoProcess(df)

#Using a maximum rate of NAs per variable
Auto.2.df <- LDE.AutoProcess(df, maxNARate = maxNARate)

#Using substrings to identify variable names as keys, without NA filtering
Auto.3.df <- LDE.AutoProcess(df, keyNamesMatch = keyNamesMatch)

#Using a maximum rate of NAs and substrings to identify variable names as keys
Auto.4.df <- LDE.AutoProcess(df, maxNARate = maxNARate, keyNamesMatch = keyNamesMatch)

#Obtaining the cleaned dataset
df.clean <- Auto.4.df$df.filtered

#See if variables were included or excluded
#View(Auto.4.df$var.status$included) #included vars
#View(Auto.4.df$var.status$excluded) #excluded vars

#See how the included variables were classified, e.g.:
#View(Auto.4.df$var.classif$included.vars$df.num)  #numeric vars
#View(Auto.4.df$var.classif$included.vars$df.bool) #boolean vars

#See how the excluded variables were classified, e.g.:
#View(Auto.4.df$var.classif$removed.vars$not.useful)            #excluded by type
#View(Auto.4.df$var.classif$removed.vars$filtered.byNAs)        #excluded by NA rate
#View(Auto.4.df$var.classif$removed.vars$not.useful$df.NA)      #excluded by type, empty
#View(Auto.4.df$var.classif$removed.vars$filtered.byNAs$df.num) #numeric, excluded by NA rate

#See statistics of variables by exclusion reason
#View(Auto.4.df$statistics$useful.vars)        #included
#View(Auto.4.df$statistics$filteredbyNAs.vars) #excluded by NA rate
#View(Auto.4.df$statistics$unuseful.vars)      #excluded by type

#See statistics of variables by exclusion reason and type, e.g.:
#View(Auto.4.df$statistics$useful.vars$df.levels)     #included, classified as levels
#View(Auto.4.df$statistics$filteredbyNAs.vars$df.num) #numeric, excluded by NA rate
#View(Auto.4.df$statistics$unuseful.vars$df.NA)       #excluded by type, empty vars

nietodaniel/LargeDataExplorer documentation built on Sept. 20, 2020, 7:57 p.m.