Description:
Classifies the variables of a data.frame by type (boolean, categorical, text, numeric, key, etc.) and by their usefulness for analytics (e.g. NA-only variables, single-value variables and plain-text variables are not useful). Descriptive statistics (mean, min, max, sd, skewness, etc.) are computed for each variable.
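The "not useful" rules and the per-variable statistics described above can be sketched in base R as follows. This is not the LDE.Explore source: `flag_not_useful`, `skewness_simple` and `describe_numeric` are hypothetical helpers. The sketch covers only the NA-only and single-value rules, because this page does not say how plain-text variables are detected. The moment-based skewness is one common definition and may differ from the one the function actually uses.

```r
# Hypothetical helper (not part of LDE.Explore): labels each column as
# "NA-only", "1-value" (a single distinct non-NA value) or "candidate".
flag_not_useful <- function(dat) {
  vapply(dat, function(x) {
    non_na <- x[!is.na(x)]
    if (length(non_na) == 0) {
      "NA-only"
    } else if (length(unique(non_na)) == 1) {
      "1-value"
    } else {
      "candidate"
    }
  }, character(1))
}

# Moment-based sample skewness (one common definition among several).
skewness_simple <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Descriptive statistics for one numeric column, ignoring NAs.
describe_numeric <- function(x) {
  c(mean     = mean(x, na.rm = TRUE),
    min      = min(x, na.rm = TRUE),
    max      = max(x, na.rm = TRUE),
    sd       = sd(x, na.rm = TRUE),
    skewness = skewness_simple(x))
}

dat <- data.frame(
  a = c(NA, NA, NA, NA),  # NA-only
  b = c(1, 1, 1, NA),     # a single distinct value
  c = c(1, 2, 3, 10)      # usable numeric column
)
status <- flag_not_useful(dat)
print(status)
print(describe_numeric(dat$c))
```

Column `a` is flagged as NA-only, `b` as single-valued, and only `c` remains a candidate for analysis.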
Usage:

```r
LDE.Explore(dat, maxNARate = NULL, keyNamesMatch = NULL)
```
Arguments:

dat: a data.frame to explore.

maxNARate: a single numeric value between 0 and 1. Variables whose proportion of NAs exceeds this value are excluded. NULL (the default) disables this filter.

keyNamesMatch: a character vector of substrings. A variable whose name starts or ends with any of these substrings is classified as a key. NULL (the default) disables key detection.
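A minimal sketch of what the two filtering arguments describe. It assumes that a variable is excluded when its NA proportion is strictly greater than `maxNARate`, and that name matching is case-sensitive; neither detail is stated on this page. `filter_by_na_rate` and `is_key_name` are hypothetical helpers, not functions exported by the package.

```r
# Hypothetical helper: names of the columns whose NA rate is at most maxNARate.
# NULL keeps every column, mirroring the documented default.
filter_by_na_rate <- function(dat, maxNARate = NULL) {
  if (is.null(maxNARate)) return(names(dat))
  na_rate <- colMeans(is.na(dat))  # proportion of NAs per column
  names(dat)[na_rate <= maxNARate]
}

# Hypothetical helper: TRUE for each name that starts or ends with any of the
# given substrings (plain, case-sensitive matching is an assumption).
is_key_name <- function(var_names, keyNamesMatch = NULL) {
  if (is.null(keyNamesMatch)) return(rep(FALSE, length(var_names)))
  vapply(var_names, function(nm) {
    any(startsWith(nm, keyNamesMatch) | endsWith(nm, keyNamesMatch))
  }, logical(1))
}

dat <- data.frame(
  ID_contract  = 1:5,
  amount       = c(10, NA, 30, NA, 50),  # 40% NA
  notes        = c(NA, NA, NA, NA, "x"), # 80% NA
  supplier_KEY = c("a", "b", "c", "d", "e")
)
kept <- filter_by_na_rate(dat, maxNARate = 0.2)
keys <- is_key_name(names(dat), c("ID", "KEY"))
print(kept)
print(keys)
```

With a threshold of 0.2, `amount` and `notes` are dropped, while `ID_contract` (prefix `ID`) and `supplier_KEY` (suffix `KEY`) are flagged as keys.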
Value:

A list containing the useful variables, classified by type together with their descriptive statistics, and a vector with the names of the useful variables. The examples below access elements named var.status, var.classif and statistics.
Author(s):

Daniel Nieto
Examples:

```r
maxNARate <- 0.2                 # NA threshold: exclude variables with more than 20% NAs
keyNamesMatch <- c("ID", "KEY")  # variable names that begin or end with ID or KEY are classified as keys
df <- data.frame(secop1.full)    # secop1.full: the data set used in the author's example

# Arguments are passed by name so each value reaches the intended parameter
Explore.1.df <- LDE.Explore(df)                                 # basic exploration
Explore.2.df <- LDE.Explore(df, keyNamesMatch = keyNamesMatch)  # identify keys
Explore.3.df <- LDE.Explore(df, maxNARate = maxNARate)          # apply an NA threshold
Explore.4.df <- LDE.Explore(df, maxNARate = maxNARate,
                            keyNamesMatch = keyNamesMatch)      # identify keys and apply an NA threshold

# See whether variables were included or excluded
# View(Explore.4.df$var.status$included)  # included variables
# View(Explore.4.df$var.status$excluded)  # excluded variables

# See how the included variables were classified, e.g.:
# View(Explore.4.df$var.classif$included.vars$df.num)   # numeric variables
# View(Explore.4.df$var.classif$included.vars$df.bool)  # boolean variables

# See how the excluded variables were classified, e.g.:
# View(Explore.4.df$var.classif$removed.vars$not.useful)             # excluded by type
# View(Explore.4.df$var.classif$removed.vars$filtered.byNAs)         # excluded by NA rate
# View(Explore.4.df$var.classif$removed.vars$not.useful$df.NA)       # excluded by type, NA-only (empty)
# View(Explore.4.df$var.classif$removed.vars$filtered.byNAs$df.num)  # numeric, excluded by NA rate

# See the statistics of variables by exclusion reason
# View(Explore.4.df$statistics$useful.vars)         # included
# View(Explore.4.df$statistics$filteredbyNAs.vars)  # excluded by NA rate
# View(Explore.4.df$statistics$unuseful.vars)       # excluded by type

# See the statistics of variables by exclusion reason and type, e.g.:
# View(Explore.4.df$statistics$useful.vars$df.levels)      # included, levels type
# View(Explore.4.df$statistics$filteredbyNAs.vars$df.num)  # numeric, excluded by NA rate
# View(Explore.4.df$statistics$unuseful.vars$df.NA)        # excluded by type, NA-only (empty)
```