exploreData: Exploratory data analysis


View source: R/exploreData.R

Description

Performs exploratory data analysis and returns a summary data.frame containing useful information regarding features in the dataset

Usage

exploreData(x, removeFeats = NULL, missPercent = 0.3,
  outlierMethod = "tukey", lowPercentile = 0.01, upPercentile = 0.99,
  minLevelPercentage = 0.025, minUnique = 25, minChrPercentage = 0.2,
  numChars = 25, seed = 1428571, progress = TRUE)

Arguments

x

[data.frame | Required] Dataset on which EDA should be performed

removeFeats

[character vector | Optional] Character vector of features that should be excluded from EDA

missPercent

[numeric | Optional] A numeric value between 0 and 1, used as the threshold on the proportion of missing values for flagging features that are largely missing. Default of 0.3
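
As a rough illustration of what a missing-value threshold like missPercent does, the base R sketch below computes the share of NA values per column and lists the columns above the threshold. The data frame and its column names are hypothetical, and the strict ">" comparison is an assumption; exploreData may treat the boundary differently.

```r
# Hypothetical toy data; column names are illustrative only
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

missPercent <- 0.3

# Proportion of missing values in each column
missShare <- colMeans(is.na(df))

# Columns whose missing share exceeds the threshold (assumed strict comparison)
flagged <- names(missShare)[missShare > missPercent]
print(flagged)
```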

outlierMethod

[character | Optional] Options are tukey or percentile. Default of tukey
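
For orientation, Tukey's rule conventionally flags values lying more than 1.5 times the interquartile range outside the first and third quartiles. The sketch below shows that convention in base R on a made-up vector; the 1.5 multiplier and the default quantile type are assumptions and may not match what exploreData does internally.

```r
# Hypothetical numeric feature with one extreme value
x <- c(10, 12, 11, 13, 12, 11, 10, 95)

# First and third quartiles (R's default quantile type 7)
q <- quantile(x, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Conventional Tukey fences: 1.5 * IQR beyond the quartiles (assumed multiplier)
lowerFence <- q[1] - 1.5 * iqr
upperFence <- q[2] + 1.5 * iqr

tukeyFlag <- x < lowerFence | x > upperFence
print(x[tukeyFlag])
```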

lowPercentile

[numeric | Optional] Values below this percentile value will be flagged as outliers. Default of 0.01

upPercentile

[numeric | Optional] Values above this percentile value will be flagged as outliers. Default of 0.99
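
A percentile-based flag of the kind lowPercentile and upPercentile describe can be sketched in base R as follows. The vector is hypothetical, and the use of quantile() with its default type is an assumption about how the cut-offs are computed.

```r
# Hypothetical numeric feature with one extreme value
x <- c(10, 12, 11, 13, 12, 11, 10, 95)

lowPercentile <- 0.01
upPercentile <- 0.99

# Cut-off values at the lower and upper percentiles
lims <- quantile(x, c(lowPercentile, upPercentile), names = FALSE)

# Values strictly below the lower or above the upper cut-off are flagged
pctFlag <- x < lims[1] | x > lims[2]
print(x[pctFlag])
```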

minLevelPercentage

[numeric | Optional] Used to identify categorical levels that account for a low proportion of observations. Default of 0.025
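
The idea behind a minimum level proportion can be sketched in base R: tabulate the levels, convert counts to shares, and list the levels below the threshold. The vector is hypothetical and the strict "<" comparison is an assumption, not a statement of how exploreData implements it.

```r
# Hypothetical categorical feature: 60 "a", 38 "b", 2 "c"
f <- c(rep("a", 60), rep("b", 38), rep("c", 2))

minLevelPercentage <- 0.025

# Share of observations per level
share <- table(f) / length(f)

# Levels whose share falls below the threshold (assumed strict comparison)
rareLevels <- names(share)[as.vector(share) < minLevelPercentage]
print(rareLevels)
```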

minUnique

[integer | Optional] Used to identify feature classes; the unique-value threshold that separates numeric from character features. Default of 25

minChrPercentage

[numeric | Optional] Used to identify incorrectly formatted numeric or integer features as character features. Default of 0.2

numChars

[integer | Optional] Used to identify text features. Note that text features are not the same as character features: text features contain multiple paragraphs of text. Default of 25

seed

[integer | Optional] Random seed number for reproducible results. Default of 1428571

progress

[logical | Optional] Display a progress bar if TRUE

Value

data.frame object with summary statistics

Author(s)

Xander Horn

Examples
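
A minimal usage sketch, assuming the autoML package is installed and exports exploreData with the signature shown under Usage. The iris dataset and the chosen argument values are illustrative only; the call is guarded so the snippet does nothing when the package is absent.

```r
# iris ships with base R and is used here purely for illustration
data(iris)

if (requireNamespace("autoML", quietly = TRUE)) {
  # Exclude the label column and switch to percentile-based outlier flags
  eda <- autoML::exploreData(
    x = iris,
    removeFeats = "Species",
    outlierMethod = "percentile",
    lowPercentile = 0.05,
    upPercentile = 0.95,
    progress = FALSE
  )
  # Per the Value section, the result is a data.frame of summary statistics
  str(eda)
}
```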


XanderHorn/autoML documentation built on Aug. 5, 2020, 11:45 a.m.