Description
Performs exploratory data analysis (EDA) and returns a summary data.frame containing useful information about the features in the dataset.
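To illustrate the kind of per-feature summary described above, here is a minimal base-R sketch. It is not the exploreData implementation: the helper summariseFeatures and its output columns are hypothetical. It assumes that a feature is flagged when its proportion of missing values exceeds missPercent, and that a categorical level counts as "low proportion" when its share of the non-missing values is below minLevelPercentage.

```r
# Hypothetical helper, NOT part of the package: a minimal per-feature summary
# in the spirit of exploreData(). Assumes `x` is a data.frame.
summariseFeatures <- function(x, missPercent = 0.3, minLevelPercentage = 0.025) {
  stopifnot(is.data.frame(x))
  rows <- lapply(names(x), function(feat) {
    v <- x[[feat]]
    propMissing <- mean(is.na(v))
    # Assumed rule: count categorical levels whose share of the non-missing
    # values (table() drops NA by default) is below minLevelPercentage
    lowLevels <- NA_integer_
    if (is.character(v) || is.factor(v)) {
      props <- prop.table(table(v))
      lowLevels <- sum(props < minLevelPercentage)
    }
    data.frame(
      feature = feat,
      class = class(v)[1],
      propMissing = propMissing,
      nUnique = length(unique(v[!is.na(v)])),
      # Assumed rule: flagged when the missing proportion exceeds missPercent
      highMissing = propMissing > missPercent,
      lowPropLevels = lowLevels,
      stringsAsFactors = FALSE
    )
  })
  # One single-row data.frame per feature, stacked into the summary
  do.call(rbind, rows)
}

# Usage on a small toy dataset
df <- data.frame(
  num = c(1, 2, NA, 4, 5),
  chr = c("a", "a", "b", NA, NA),
  stringsAsFactors = FALSE
)
summariseFeatures(df, missPercent = 0.3)
```

Building one single-row data.frame per feature and stacking them with do.call(rbind, ...) keeps the sketch short; for very wide datasets, computing each summary column as a vector first would be faster.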
Usage

exploreData(x, removeFeats = NULL, missPercent = 0.3,
  outlierMethod = "tukey", lowPercentile = 0.01, upPercentile = 0.99,
  minLevelPercentage = 0.025, minUnique = 25, minChrPercentage = 0.2,
  numChars = 25, seed = 1428571, progress = TRUE)
Arguments

x: [data.frame | Required] Dataset on which EDA should be performed.
removeFeats: [character vector | Optional] Names of features that should be excluded from the EDA.
missPercent: [numeric | Optional] Proportion between 0 and 1 used as the threshold for flagging features with a high share of missing values. Default of 0.3.
outlierMethod: [character | Optional] Outlier detection method, either "tukey" or "percentile". Default of "tukey".
lowPercentile: [numeric | Optional] Lower percentile, given as a proportion between 0 and 1; values below it are flagged as outliers. Default of 0.01.
upPercentile: [numeric | Optional] Upper percentile, given as a proportion between 0 and 1; values above it are flagged as outliers. Default of 0.99.
minLevelPercentage: [numeric | Optional] Minimum proportion used to identify categorical levels that account for only a small share of the observations. Default of 0.025.
minUnique: [integer | Optional] Threshold on the number of unique values used to identify feature classes, i.e. to distinguish numeric features from character features. Default of 25.
minChrPercentage: [numeric | Optional] Used to identify numeric or integer features that are incorrectly stored as character features. Default of 0.2.
numChars: [integer | Optional] Character-count threshold used to identify text features. Note that text features are not the same as character features: text features contain free text, such as multiple paragraphs. Default of 25.
seed: [integer | Optional] Random seed for reproducible results. Default of 1428571.
progress: [logical | Optional] Display a progress bar if TRUE. Default of TRUE.
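The two outlierMethod options can be sketched in base R as follows. The helper flagOutliers is hypothetical, and the sketch assumes the conventional Tukey fences (1.5 times the interquartile range below the first quartile and above the third) for "tukey", and R's default sample quantiles at lowPercentile and upPercentile for "percentile"; exploreData may compute its bounds differently.

```r
# Hypothetical helper, NOT the package's implementation: returns a logical
# vector marking outliers in numeric vector `v` (NA values are never flagged).
flagOutliers <- function(v, outlierMethod = c("tukey", "percentile"),
                         lowPercentile = 0.01, upPercentile = 0.99) {
  outlierMethod <- match.arg(outlierMethod)
  if (outlierMethod == "tukey") {
    # Assumed Tukey fences: 1.5 * IQR beyond the first and third quartiles
    q <- quantile(v, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    lower <- q[1] - 1.5 * iqr
    upper <- q[2] + 1.5 * iqr
  } else {
    # Bounds taken directly from the sample quantiles (R's default type 7)
    bounds <- quantile(v, c(lowPercentile, upPercentile),
                       na.rm = TRUE, names = FALSE)
    lower <- bounds[1]
    upper <- bounds[2]
  }
  !is.na(v) & (v < lower | v > upper)
}

# Usage: one extreme value appended to 1..20
v <- c(1:20, 100)
which(flagOutliers(v, "tukey"))       # 21
which(flagOutliers(v, "percentile"))  # 1 21
```

The percentile rule flags roughly lowPercentile + (1 - upPercentile) of the non-missing values whatever the distribution, whereas the Tukey fences adapt to the spread of the data and may flag nothing at all.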
Value

A data.frame containing summary statistics for the features in x.
Author(s)

Xander Horn