It shows basic statistics for each characteristic in a data frame. The report includes:
Field: Field name.
Type: Factor, numeric, integer, other.
Recs: Number of records.
Miss: Number of missing records.
Min: Minimum value.
Q25: First quartile. It splits off the lowest 25% of data from the highest 75%.
Q50: Median or second quartile. It cuts the data set in half.
Avg: Average value.
Q75: Third quartile. It splits off the lowest 75% of data from the highest 25%.
Max: Maximum value.
StDv: Standard deviation of a sample.
Neg: Number of negative values.
Pos: Number of positive values.
OutLo: Number of outliers. Records below
OutHi: Number of outliers. Records above
A data frame.
Optional parameter to define the number of decimal places shown in the output table. Default is 3.
Optional parameter that turns on or off a progress bar. Default value is 1.
smbinning.eda generates two data frames that list each characteristic
with basic statistics such as extreme values and quartiles;
and also percentages of missing values and outliers, among others.
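To make the shape of such a two-table result concrete, here is a minimal base R sketch that returns a list with an `eda` element (counts) and an `edapct` element (shares). The function `eda_sketch` and the toy data are hypothetical, only the missing-value column is reproduced, and expressing the percentage as a fraction of records is an assumption rather than the package's documented behaviour.

```r
# Hypothetical sketch of a two-data-frame result; not the smbinning code.
eda_sketch <- function(df, rounding = 3) {
  recs <- nrow(df)
  # Count missing values per column.
  miss <- vapply(df, function(col) sum(is.na(col)), integer(1))
  eda <- data.frame(Field = names(df), Recs = recs, Miss = miss,
                    row.names = NULL)
  # ASSUMPTION: percentages are stored as fractions of the record count.
  edapct <- data.frame(Field = names(df),
                       Miss = round(miss / recs, rounding),
                       row.names = NULL)
  list(eda = eda, edapct = edapct)
}

# Toy data frame, invented for this example only.
toy <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, NA))
out <- eda_sketch(toy)
out$eda     # table with counts
out$edapct  # table with shares
```

Storing the result once and then reading both elements with `$` avoids computing the statistics twice.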
# Training and testing samples (Just some basic formality for Modeling)
pop=chileancredit # Set population
train=subset(pop,Rnd<=0.7) # Training sample
test=subset(pop,Rnd>0.7) # Testing sample
rm(chileancredit) # Remove original dataset

# EDA application
smbinning.eda(train,rounding=3)$eda # Table with basic statistics.
smbinning.eda(train,rounding=3)$edapct # Table with basic percentages.