visualizeR: visualizeR - Automated exploratory data analysis for...
In XanderHorn/visualizeR:


visualizeR automates exploratory data analysis for classification problems in machine learning, both two-class and multi-class. Remove all ID and date features and clean the data before running it. visualizeR has some data cleaning built in, but it cannot account for cleaning that depends on domain knowledge.
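The recommendation to remove ID and date features can be sketched in base R as follows. This is only an illustration: the data frame `raw`, its column names, and the rule that ID columns end in "id" are hypothetical and not part of visualizeR.

```r
# Hypothetical data frame with an ID column and a date column
raw <- data.frame(
  CustomerID = 1:6,
  SignupDate = as.Date("2020-01-01") + 0:5,
  Age        = c(23, 35, 41, 29, 52, 38),
  Target     = factor(c("yes", "no", "no", "yes", "no", "yes"))
)

# Flag date-like columns by class
is_date <- vapply(raw, function(col) inherits(col, c("Date", "POSIXt")), logical(1))

# Flag ID columns by name (assumed naming convention: name ends in "id")
is_id <- grepl("id$", names(raw), ignore.case = TRUE)

# Keep only the columns that are neither dates nor IDs
cleaned <- raw[, !(is_date | is_id), drop = FALSE]
names(cleaned)
```

The resulting `cleaned` data frame is what would then be passed as `df`, with `Outcome = 'Target'`.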

visualizeR(df, Outcome,
           nrBins = 30,
           sample = 0.3,
           clipOutliers = TRUE,
           handleMissing = TRUE,
           CatChartType = "stackedHist",
           NumChartType = "boxPlot",
           summaryStats = FALSE,
           seed = 1234,
           maxLevels = 25,
           nrUniques = 20,
           outputPath = "",
           outputFileName = "outputPlots")

`df`	A data.frame object containing plotting features and target/outcome feature. Cannot be left blank.
`Outcome`	The feature name of the outcome as character format, e.g. 'Target'. Cannot be left blank.
`nrBins`	The number of bins to use in histogram plots of numerical features when 'stackedHist' is selected in the parameter 'NumChartType'.
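`sample`	Controls the random sample taken from the data to speed up plotting. The default of 0.3 and the value of 1 used in the examples suggest that this is the proportion of rows to keep.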
`clipOutliers`	Should outliers in the data be fixed using a median approach? Possible values: TRUE, FALSE
`handleMissing`	Should missing values be replaced, with a 'Missing' value for categorical variables and median imputation for continuous variables? Possible values: TRUE, FALSE. If FALSE, missing observations are removed from the plots.
`CatChartType`	Indicates the type of chart to use when plotting categorical/factor features. Possible values: 'stackedHist', 'Confusion'
`NumChartType`	Indicates the type of chart to use when plotting numerical/continuous features. Possible values: 'stackedHist', 'densityLine', 'densityFill', 'boxPlot'
`summaryStats`	Should summary statistics be printed for the predictors in the dataset: summary statistics for continuous variables and frequency tables for categorical variables? Possible values: TRUE, FALSE
`seed`	Used only for the sampling of the data and to reproduce the plots.
`maxLevels`	The maximum number of levels allowed for a factor feature. A feature with more levels than this threshold is not plotted.
`nrUniques`	The threshold on the number of unique values in a feature. A feature with fewer unique values than this threshold is automatically converted to a categorical feature.
`outputPath`	A file path where the plots should be saved in a PDF document. If left blank, all plots are displayed in R.
`outputFileName`	The name of the file containing all the plots.
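As a rough illustration of what the `nrUniques` and `handleMissing` descriptions imply, the base R sketch below applies the same rules to a toy data frame. It is not visualizeR's actual code: the data, the loop, and the order of the two steps are assumptions, and the threshold is set to 3 rather than the default of 20 so the effect is visible on five rows.

```r
# Toy data: a numeric column with an NA, a character column with an NA,
# and a numeric column with only two distinct values
toy <- data.frame(
  Income = c(10, 20, NA, 40, 50),
  Region = c("north", NA, "south", "north", "south"),
  Rating = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

nrUniques <- 3  # illustrative threshold, far below the documented default of 20

prepared <- toy
for (name in names(prepared)) {
  col <- prepared[[name]]

  # Numeric features with fewer unique values than the threshold become categorical
  if (is.numeric(col) && length(unique(col[!is.na(col)])) < nrUniques) {
    col <- as.character(col)
  }

  if (is.numeric(col)) {
    # Continuous feature: median imputation
    col[is.na(col)] <- median(col, na.rm = TRUE)
  } else {
    # Categorical feature: explicit 'Missing' value
    col[is.na(col)] <- "Missing"
    col <- factor(col)
  }
  prepared[[name]] <- col
}

str(prepared)
```

Here `Income` stays numeric with its NA replaced by the median, `Region` gains a 'Missing' level, and `Rating` is converted to a factor because it has only two unique values.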

Xander Horn

# Example 1: display the plots in R
library(datasets)
library(visualizeR)
train <- data.frame(iris)
visualizeR(df = train,
           Outcome = 'Species',
           nrBins = 30,
           sample = 1,
           clipOutliers = TRUE,   # documented values are TRUE/FALSE
           CatChartType = 'stackedHist',
           NumChartType = 'boxPlot')

# Example 2: print summary statistics and save the plots to a PDF
visualizeR(df = train,
           Outcome = 'Species',
           nrBins = 30,
           sample = 1,
           clipOutliers = TRUE,
           CatChartType = 'Confusion',
           NumChartType = 'stackedHist',
           summaryStats = TRUE,
           outputPath = 'C:/Users/User/Documents',
           outputFileName = 'IrisExploratoryDataAnalysis')