visualizeR is an R library that attempts to automate exploratory data analysis for classification problems in machine learning.
~~~~~~~~~~~~~~~~~ PLEASE NOTE THAT visualizeR IS IN BETA PHASE ~~~~~~~~~~~~~~~~~
It supports two-class and multi-class classification problems. Some data cleaning needs to be performed before running visualizeR. It is also recommended that all ID and date features be removed from the dataset.
visualizeR automatically identifies categorical and numerical features in your data. It requires only the name of your data.frame and the outcome/target feature you are trying to predict, given as a character string, e.g. "TARGET".
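A minimal call might look like the sketch below. It uses the built-in iris data set as a stand-in for your own data and assumes the main function is exported as visualizeR() with the df and Outcome arguments described in this README. The call is guarded so the script still runs where the package is not installed.

```r
# Example data: the built-in iris data set, with Species as the target.
plot_data <- iris
target <- "Species"   # the outcome is passed as a character string

# The target must be a column of the data.frame.
stopifnot(is.data.frame(plot_data), target %in% names(plot_data))

# Assumption: the package exports a function named visualizeR().
if (requireNamespace("visualizeR", quietly = TRUE)) {
  visualizeR::visualizeR(df = plot_data, Outcome = target)
}
```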
visualizeR has some "simplistic" data cleaning built in. For example, if a feature is encoded as a factor but has more than 20 levels, visualizeR will not plot it, as the plot would be unreadable. If a feature is encoded as numeric but has 20 or fewer unique values, visualizeR automatically converts it to a factor (categorical) feature.
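The rule described above can be sketched in base R as follows. This illustrates the stated thresholds only; it is not the package's internal code, and classify_feature() and its argument names are hypothetical.

```r
# classify_feature() is a hypothetical helper, not part of visualizeR.
# It applies the thresholds described in the text to a single column.
classify_feature <- function(x, max_levels = 20, nr_uniques = 20) {
  n_unique <- length(unique(x[!is.na(x)]))
  if (is.factor(x) || is.character(x)) {
    # Factors with too many levels are not plotted.
    if (n_unique > max_levels) "skipped" else "categorical"
  } else if (is.numeric(x)) {
    # Numeric features with few unique values are treated as categorical.
    if (n_unique <= nr_uniques) "categorical" else "numeric"
  } else {
    "other"
  }
}

classify_feature(c(4, 6, 8, 4, 6))   # numeric, only 3 unique values
classify_feature(seq_len(50))        # numeric, 50 unique values
classify_feature(factor(letters))    # factor with 26 levels
```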
visualizeR can also impute and encode missing values for both numeric (continuous) and factor (categorical) features: numeric features use median replacement, and categorical features use the replacement value "Missing".
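As an illustration of the two replacement strategies, here is a base R sketch. It shows the general technique under the description above, not the package's own implementation; impute_feature() is a hypothetical helper.

```r
# impute_feature() is a hypothetical helper, not part of visualizeR.
impute_feature <- function(x) {
  if (is.numeric(x)) {
    # Numeric features: replace NA with the median of the observed values.
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  } else {
    # Categorical features: replace NA with an explicit "Missing" level.
    x <- as.character(x)
    x[is.na(x)] <- "Missing"
    factor(x)
  }
}

ages   <- c(20, NA, 40, 30)
colour <- factor(c("red", NA, "blue", "red"))

impute_feature(ages)     # the NA becomes the median of 20, 40 and 30
impute_feature(colour)   # the NA becomes the level "Missing"
```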
visualizeR can also output all the plots to a PDF file. When using the parameter 'outputPath', check which slashes ("/", "\") your system uses to avoid path errors.
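One way to sidestep slash problems is to let R build the path, as in the sketch below. The directory name is a placeholder. How visualizeR joins the output path and file name is not documented here, so whether a trailing slash is needed is something to check in the help page.

```r
# Build the directory path with file.path(), which uses forward slashes;
# R accepts forward slashes on Windows as well as on Unix-like systems.
output_dir <- file.path(tempdir(), "visualizeR_plots")  # placeholder location
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)

# normalizePath() returns the absolute path; winslash = "/" keeps forward
# slashes on Windows instead of the default backslashes.
output_dir <- normalizePath(output_dir, winslash = "/")
print(output_dir)
```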
For full parameter details, run the following command after you have installed and loaded visualizeR: '?visualizeR'
visualizeR uses the package pacman, which manages packages in R. Install pacman along with visualizeR and all package dependencies will be handled for you.
To install visualizeR, use the code below:
install.packages('devtools')
install.packages('pacman')
library(devtools)
devtools::install_github("XanderHorn/visualizeR")
library(visualizeR)
df: A data.frame object containing the plotting features and the target/outcome feature. Cannot be left blank.
Outcome: The feature name of the outcome in character format, e.g. 'Target'. Cannot be left blank.
nrBins: The number of bins to use in histogram plots of numerical features when 'stackedHist' is used as the chart type in the parameter 'NumChartType'.
sample: Should a random sample be taken to speed up the plotting process?
clipOutliers: Should outliers in the data be fixed using a median approach? Possible values: TRUE, FALSE
handleMissing: Should missing values be replaced with the value 'Missing' for categorical variables and with median imputation for continuous variables? Possible values: TRUE, FALSE. If set to FALSE, missing observations will be removed from the plots.
CatChartType: Indicates the type of chart to use when plotting categorical/factor features. Possible values: 'stackedHist', 'Confusion'
NumChartType: Indicates the type of chart to use when plotting numerical/continuous features. Possible values: 'stackedHist', 'densityLine', 'densityFill', 'boxPlot'
summaryStats: Should summary statistics be printed for the predictors in the dataset (summary statistics for continuous variables and frequency tables for categorical variables)? Possible values: TRUE, FALSE
seed: Used only for sampling the data, so that the plots can be reproduced.
maxLevels: The maximum number of levels allowed for factor features. If this threshold is exceeded, the feature will not be plotted. Limiting this is recommended, as features with many levels make plots hard to read.
nrUniques: The number of unique values a feature may have before it is treated as numeric. If a feature has fewer unique values than this threshold, it will be changed to a categorical feature.
outputPath: A file path where the plots should be saved in a PDF document. If left blank, all plots will be displayed in R.
outputFileName: The name of the file containing all the plots.
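Putting several of these parameters together, a call might look like the sketch below. The argument names and the chart type strings are taken from the list above; the example data (mtcars with am as the target) and the specific values are illustrative assumptions, and the function name visualizeR() is assumed as well. The call is guarded so the script still runs where the package is not installed.

```r
# Example data: mtcars, with transmission type as a two-class target.
plot_data <- mtcars
plot_data$am <- factor(plot_data$am, labels = c("automatic", "manual"))

# Arguments named as in the parameter list; the values are illustrative.
call_args <- list(
  df            = plot_data,
  Outcome       = "am",
  nrBins        = 20,             # only used with NumChartType = 'stackedHist'
  clipOutliers  = TRUE,
  handleMissing = TRUE,
  CatChartType  = "stackedHist",
  NumChartType  = "stackedHist",
  summaryStats  = FALSE,
  maxLevels     = 20,
  nrUniques     = 20
)

# Assumption: the package exports a function named visualizeR().
if (requireNamespace("visualizeR", quietly = TRUE)) {
  do.call(visualizeR::visualizeR, call_args)
}
```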
maxLevels parameter to indicate when features should not be plotted.
nrUniques parameter to indicate when features should be seen as numeric or categorical.