Description
Performs automated exploratory data analysis on tabular data. Summary statistics are calculated for each feature, common data issues are flagged, and an imputation value is calculated per feature.
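As a rough illustration of what a per-feature overview involves, the following base-R sketch computes a few common summary statistics for each column of a dataset. The helper featureSummary() is hypothetical and is not part of the package; the statistics it reports (class, missing count, number of distinct values, mean and median) are assumptions and not the exact set produced by dataOverview().

```r
# Hypothetical helper (not part of the package): a minimal per-feature overview
# built with base R only. dataOverview() reports more than this.
featureSummary <- function(x) {
  x <- as.data.frame(x)  # mirror the documented coercion to data.frame
  rows <- lapply(names(x), function(col) {
    v <- x[[col]]
    isNum <- is.numeric(v)
    data.frame(
      feature = col,
      class   = class(v)[1],
      missing = sum(is.na(v)),                  # count of NA values
      unique  = length(unique(v[!is.na(v)])),   # distinct non-missing values
      mean    = if (isNum) mean(v, na.rm = TRUE) else NA_real_,
      median  = if (isNum) stats::median(v, na.rm = TRUE) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)  # one row per feature
}

overview <- featureSummary(iris)
print(overview)
```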
Usage
```r
dataOverview(x, outlierMethod = "tukey", lowPercentile = 0.01,
             upPercentile = 0.99, minLevelPercentage = 0.025)
```
Arguments
x: [data.frame | Required] The dataset containing all relevant features. If x is not a data.frame, it is converted to one.
outlierMethod: [character | Optional] Determines how outliers are identified. Two methods are available: "tukey" and "percentile". When using percentile-based outlier detection, it is recommended to set the lower and upper percentiles (lowPercentile and upPercentile) explicitly. Defaults to "tukey".
lowPercentile: [numeric | Optional] The lower percentile cut-off: values below this percentile are flagged as lower outliers. Values between 0.01 and 0.05 are recommended. Defaults to 0.01.
upPercentile: [numeric | Optional] The upper percentile cut-off: values above this percentile are flagged as upper outliers. Values between 0.95 and 0.99 are recommended. Defaults to 0.99.
minLevelPercentage: [numeric | Optional] The minimum share of the data that each level of a categorical feature should represent, given as a proportion (the default 0.025 corresponds to 2.5%). Ideally every level of a categorical feature contains an adequate proportion of the data; levels with low proportions usually call for data cleaning. If a categorical feature has levels below this threshold, those low-frequency levels determine the imputation value: if their combined proportion is still less than minLevelPercentage, the imputation value is simply the mode of the feature; otherwise all low-frequency levels are combined into a new level called ALL_OTHER. Defaults to 0.025.
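The imputation rule described for minLevelPercentage can be sketched in base R as follows. levelImputation() is a hypothetical helper, not the package's implementation. It assumes that level proportions are computed over non-missing values, that the mode is used when a feature has no low-frequency levels, and that ALL_OTHER becomes the imputation value when the rare levels are pooled; the documentation does not spell out these details.

```r
# Hypothetical helper: one reading of the documented minLevelPercentage rule.
# Not the package's code; the ALL_OTHER imputation value is an assumption.
levelImputation <- function(v, minLevelPercentage = 0.025) {
  v <- as.character(v)
  props <- prop.table(table(v))            # share of non-missing rows per level
  lowLevels <- names(props)[props < minLevelPercentage]
  modeLevel <- names(props)[which.max(props)]
  if (length(lowLevels) == 0) {
    # assumed: no rare levels, so the mode is the imputation value
    return(list(lowLevels = character(0), imputeValue = modeLevel))
  }
  if (sum(props[lowLevels]) < minLevelPercentage) {
    # rare levels jointly still below the threshold: fall back to the mode
    list(lowLevels = lowLevels, imputeValue = modeLevel)
  } else {
    # rare levels jointly large enough: pool them into ALL_OTHER
    list(lowLevels = lowLevels, imputeValue = "ALL_OTHER")
  }
}

# Three rare levels at 1% each (3% combined) versus a single rare level at 1%
r1 <- levelImputation(c(rep("red", 60), rep("blue", 37), "green", "pink", "grey"))
r2 <- levelImputation(c(rep("red", 60), rep("blue", 39), "green"))
print(r1)
print(r2)
```

In the first call the pooled rare levels exceed the 2.5% threshold, so they would be merged into ALL_OTHER; in the second the single rare level stays below it, so the mode is used.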
Value
A data.frame containing exploratory information for every feature in x.
Author(s)
Xander Horn
Examples
```r
# Tukey outlier detection example:
overview <- dataOverview(x = iris,
                         outlierMethod = "tukey",
                         minLevelPercentage = 0.025)

# Percentile outlier detection example:
overview <- dataOverview(x = iris,
                         outlierMethod = "percentile",
                         lowPercentile = 0.025,
                         upPercentile = 0.975,
                         minLevelPercentage = 0.025)
```
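The two examples above differ only in how outliers are flagged. The sketch below shows one way the two rules could be computed for a single numeric vector. flagOutliers() is a hypothetical helper, not the package's implementation; Tukey's fences are assumed here to be Q1 - 1.5 * IQR and Q3 + 1.5 * IQR (the conventional multiplier), and quartiles and percentiles use the default quantile() definition, neither of which the documentation confirms for dataOverview().

```r
# Hypothetical helper: counts lower/upper outliers in a numeric vector using
# either Tukey's fences (assumed 1.5 * IQR) or percentile cut-offs.
flagOutliers <- function(v, outlierMethod = "tukey",
                         lowPercentile = 0.01, upPercentile = 0.99) {
  if (outlierMethod == "tukey") {
    q <- stats::quantile(v, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    lower <- q[1] - 1.5 * iqr
    upper <- q[2] + 1.5 * iqr
  } else if (outlierMethod == "percentile") {
    cuts <- stats::quantile(v, c(lowPercentile, upPercentile),
                            na.rm = TRUE, names = FALSE)
    lower <- cuts[1]
    upper <- cuts[2]
  } else {
    stop("outlierMethod must be 'tukey' or 'percentile'")
  }
  # values strictly outside the cut-offs are flagged; NA values are ignored
  c(lowerOutliers = sum(v < lower, na.rm = TRUE),
    upperOutliers = sum(v > upper, na.rm = TRUE))
}

print(flagOutliers(iris$Sepal.Width, "tukey"))
print(flagOutliers(iris$Sepal.Width, "percentile",
                   lowPercentile = 0.025, upPercentile = 0.975))
```

Tukey's fences adapt to the spread of the data and may flag nothing, whereas percentile cut-offs flag roughly a fixed share of observations at each tail by construction, which is why explicit lowPercentile and upPercentile values are recommended for that method.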