exploreData: Exploratory data analysis
In XanderHorn/autoML: Automated machine learning


Performs exploratory data analysis and returns a summary data.frame containing useful information about the features in the dataset.

exploreData(x, removeFeats = NULL, missPercent = 0.3,
  outlierMethod = "tukey", lowPercentile = 0.01, upPercentile = 0.99,
  minLevelPercentage = 0.025, minUnique = 25, minChrPercentage = 0.2,
  numChars = 25, seed = 1428571, progress = TRUE)
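A minimal sketch of a call, using only the argument names from the usage above. It assumes the package is installed from GitHub under the name `autoML` and that `exploreData` is exported; the built-in `iris` dataset and the chosen percentile values are purely illustrative.

```r
# Assumption: autoML was installed beforehand, for example with
#   remotes::install_github("XanderHorn/autoML")
# The guard below simply skips the example when the package is absent.
if (requireNamespace("autoML", quietly = TRUE)) {
  eda <- autoML::exploreData(
    x = iris,                      # any data.frame
    outlierMethod = "percentile",  # "tukey" or "percentile"
    lowPercentile = 0.05,          # illustrative values, not recommendations
    upPercentile = 0.95,
    progress = FALSE               # suppress the progress bar
  )
  str(eda)                         # inspect the returned summary data.frame
}
```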

`x`	[data.frame \| Required] Dataset on which EDA should be performed
`removeFeats`	[character vector \| Optional] Character vector of features that should be excluded from EDA
`missPercent`	[numeric \| Optional] A numeric value between 0 and 1 used as the threshold for identifying features that contain a majority of missing values. Default of 0.3
`outlierMethod`	[character \| Optional] Options are tukey or percentile. Default of tukey
`lowPercentile`	[numeric \| Optional] Values below this percentile value will be flagged as outliers. Default of 0.01
`upPercentile`	[numeric \| Optional] Values above this percentile value will be flagged as outliers. Default of 0.99
`minLevelPercentage`	[numeric \| Optional] Used to identify categorical levels that make up a low proportion of observations. Default of 0.025
`minUnique`	[integer \| Optional] Used to identify feature classes by deciding between numeric and character features. Default of 25
`minChrPercentage`	[numeric \| Optional] Used to identify incorrectly formatted numeric or integer features as character features. Default of 0.2
`numChars`	[integer \| Optional] Used to identify text features. Note that text features are not the same as character features: text features contain multiple paragraphs of text. Default of 25
`seed`	[integer \| Optional] Random seed number for reproducible results. Default of 1428571
`progress`	[logical \| Optional] Display a progress bar if TRUE. Default of TRUE
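
To make the `missPercent`, `outlierMethod`, `lowPercentile` and `upPercentile` arguments more concrete, here is a rough base R sketch of the kind of checks they describe. It is not the package's implementation: the helpers `flag_outliers` and `mostly_missing` are hypothetical, and the Tukey rule shown (1.5 times the interquartile range beyond the quartiles) and the strict comparisons are the conventional definitions, assumed here rather than taken from the source.

```r
# Hypothetical helper, for illustration only; not part of autoML.
# Flags outliers with either Tukey fences or percentile cut-offs.
flag_outliers <- function(v, method = "tukey",
                          lowPercentile = 0.01, upPercentile = 0.99) {
  if (method == "tukey") {
    # Conventional Tukey fences: 1.5 * IQR beyond the first and third quartiles
    q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    fence <- 1.5 * (q[2] - q[1])
    lower <- q[1] - fence
    upper <- q[2] + fence
  } else if (method == "percentile") {
    # Values below the lower or above the upper percentile are flagged
    q <- quantile(v, probs = c(lowPercentile, upPercentile),
                  na.rm = TRUE, names = FALSE)
    lower <- q[1]
    upper <- q[2]
  } else {
    stop("method must be 'tukey' or 'percentile'")
  }
  !is.na(v) & (v < lower | v > upper)
}

# Hypothetical helper: TRUE for columns whose share of missing values
# exceeds missPercent (assumed to be a strict comparison).
mostly_missing <- function(df, missPercent = 0.3) {
  colMeans(is.na(df)) > missPercent
}

values <- c(1:20, 500)
tukey_flags <- flag_outliers(values, method = "tukey")
pct_flags <- flag_outliers(values, method = "percentile")
which(tukey_flags)  # 21: only the value 500 lies outside the Tukey fences
which(pct_flags)    # 1 and 21: the smallest and largest values

df <- data.frame(a = c(1, NA, NA, 4), b = c(1, 2, 3, NA))
missing_flags <- mostly_missing(df, missPercent = 0.3)
```

With the defaults, the percentile method always flags the extreme tails of a feature, whereas the Tukey method flags nothing when no value lies beyond the fences, which is one reason to compare both on the same data.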