LDE.Explore: Variable exploration, classification & descriptive statistics...


View source: R/Exports.R

Description

Classifies the variables of a data.frame by type (bool, categorical, text, numeric, key, etc.) and by their usefulness for analytics (e.g. NA-only, single-value and plain-text variables are not considered useful). Descriptive statistics (mean, min, max, sd, skewness, etc.) are generated for each variable.
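
To make the idea of classifying variables concrete, here is a minimal base-R sketch. It is not the package's implementation: `classify_var`, its type labels, and the `toy` data frame are hypothetical, and the rules used by LDE.Explore may well differ.

```r
# Hypothetical helper, for illustration only; not part of LargeDataExplorer.
classify_var <- function(x) {
  vals <- unique(x[!is.na(x)])                  # distinct non-missing values
  if (length(vals) == 0) return("NA-only")      # nothing but NAs
  if (length(vals) == 1) return("1-value")      # constant column
  if (is.logical(x) || length(vals) == 2) return("bool")
  if (is.numeric(x)) return("numeric")
  if (is.factor(x)) return("categorical")
  "text"
}

toy <- data.frame(
  flag  = c(TRUE, FALSE, NA),
  price = c(1.5, 2.5, 4),
  empty = c(NA, NA, NA),
  const = c("x", "x", "x"),
  stringsAsFactors = FALSE
)

# One label per column
types <- vapply(toy, classify_var, character(1))

# A few descriptive statistics for a numeric column
price_stats <- c(
  mean = mean(toy$price, na.rm = TRUE),
  min  = min(toy$price, na.rm = TRUE),
  max  = max(toy$price, na.rm = TRUE),
  sd   = sd(toy$price, na.rm = TRUE)
)
```

In this sketch the NA-only and single-value columns are the ones that would be treated as not useful for analytics.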

Usage

LDE.Explore(dat, maxNARate = NULL, keyNamesMatch = NULL)

Arguments

dat

data.frame

maxNARate

Numeric, between 0 and 1. Variables with a higher rate of NAs will be excluded. NULL to ignore.

keyNamesMatch

Character vector of substrings to search for at the start or end of each variable name in order to classify it as a key. NULL to ignore.
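
The two optional arguments can be pictured with a small base-R sketch. This is only an illustration under stated assumptions, not the package's code: `na_rate`, `looks_like_key` and the `toy` data frame are hypothetical names, and the case-insensitive comparison is an assumption (the documentation does not say how case is handled).

```r
# Hypothetical helpers, for illustration only; not part of LargeDataExplorer.

# Share of NA values in each column, between 0 and 1
na_rate <- function(dat) {
  vapply(dat, function(col) mean(is.na(col)), numeric(1))
}

# TRUE where a variable name starts or ends with any of the patterns.
# Assumption: the comparison ignores case.
looks_like_key <- function(var_names, patterns) {
  nm <- toupper(var_names)
  pt <- toupper(patterns)
  vapply(nm, function(n) any(startsWith(n, pt) | endsWith(n, pt)),
         logical(1), USE.NAMES = FALSE)
}

toy <- data.frame(
  ID_CONTRACT = 1:5,
  amount      = c(10, NA, 30, NA, NA),
  vendor_key  = letters[1:5],
  city        = c("a", "b", NA, "d", "e"),
  stringsAsFactors = FALSE
)

rates <- na_rate(toy)

# Variables whose NA rate is strictly higher than the threshold
over_threshold <- names(rates)[rates > 0.2]

# Variables whose names look like keys
keys <- names(toy)[looks_like_key(names(toy), c("ID", "KEY"))]
```

With a threshold of 0.2, a column whose NA rate is exactly 0.2 stays in, because only a higher rate leads to exclusion.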

Value

A list containing the useful variables classified by type, with their descriptive statistics, and a vector of the useful variable names.

Author(s)

Daniel Nieto

Examples

maxNARate <- 0.2                    # NA threshold
keyNamesMatch <- c("ID", "KEY")     # var names that begin or end with id or key will be set as keys

df <- data.frame(secop1.full)

# Arguments are passed by name: in the signature, maxNARate comes before keyNamesMatch
Explore.1.df <- LDE.Explore(df)                                                          # Basic exploring
Explore.2.df <- LDE.Explore(df, keyNamesMatch = keyNamesMatch)                           # Identifying keys
Explore.3.df <- LDE.Explore(df, maxNARate = maxNARate)                                   # With an NA threshold
Explore.4.df <- LDE.Explore(df, maxNARate = maxNARate, keyNamesMatch = keyNamesMatch)    # Identifying keys, with an NA threshold

#See if variables were included or excluded
#View(Explore.4.df$var.status$included) #included vars
#View(Explore.4.df$var.status$excluded) #excluded vars

#See how the included variables were classified E.g.:
#View(Explore.4.df$var.classif$included.vars$df.num) #numeric variables
#View(Explore.4.df$var.classif$included.vars$df.bool) #boolean variables

#See how the excluded variables were classified E.g.:
#View(Explore.4.df$var.classif$removed.vars$not.useful) #excluded by type
#View(Explore.4.df$var.classif$removed.vars$filtered.byNAs) #excluded by NA rate
#View(Explore.4.df$var.classif$removed.vars$not.useful$df.NA) #excluded by type, empty
#View(Explore.4.df$var.classif$removed.vars$filtered.byNAs$df.num) #numeric, excluded by NA rate

#See statistics of variables by exclusion reason
#View(Explore.4.df$statistics$useful.vars) #included
#View(Explore.4.df$statistics$filteredbyNAs.vars) #excluded by NA rate
#View(Explore.4.df$statistics$unuseful.vars) #excluded by type

#See statistics of variables by exclusion reason and type E.g.:
#View(Explore.4.df$statistics$useful.vars$df.levels) #included, type level
#View(Explore.4.df$statistics$filteredbyNAs.vars$df.num) #numeric, excluded by NA rate
#View(Explore.4.df$statistics$unuseful.vars$df.NA) #excluded, type empty

nietodaniel/LargeDataExplorer documentation built on Sept. 20, 2020, 7:57 p.m.