modelEDA performs standard exploratory data analysis on a modeling dataset, including variable clustering, weight of evidence (binary y), numeric R-squared (continuous y), and univariate summaries and graphs. Parameters prefixed with a function name and a dot (e.g. bin.) apply only to that function.
modelEDA(x, yname, ytype, bin.numBins = 10, bin.equalBinSize = FALSE,
  bin.minPct = 0, bin.maxPct = 100, nrsq.na.rm = FALSE,
  variable_cluster.n = max(floor((length(x) - 1)/10), 2),
  variable_cluster.na.rm = TRUE, univariateSummary.FUN = mean,
  univariateGraph.yLabel = yname, univariateGraph.yType = ifelse(ytype == 1,
  "pct", "dlr"), univariateGraph.yDigits = ifelse(ytype == 1, 1, 0),
  univariateGraph.yRangeMode = "tozero",
  univariateGraph.barColor = "#BDDFF7",
  univariateGraph.lineColor = "#000000")
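Several defaults in the signature above are computed from other arguments. The following is a minimal base R sketch of how those expressions evaluate. The data frame `toy` and the `default_*` names are hypothetical and used only for illustration; they are not part of the package.

```r
# Hypothetical modeling dataset: 34 predictors plus one dependent variable.
toy <- as.data.frame(matrix(0, nrow = 5, ncol = 35))

# length() of a data frame is its number of columns, so the default
# cluster count is about one cluster per 10 predictors, never below 2.
default_n <- max(floor((length(toy) - 1) / 10), 2)

# Defaults that depend on ytype: 1 = binary y, 2 = continuous y.
ytype <- 1
default_yType   <- ifelse(ytype == 1, "pct", "dlr")
default_yDigits <- ifelse(ytype == 1, 1, 0)

print(default_n)
print(default_yType)
print(default_yDigits)
```

With fewer than 31 columns the cluster default falls back to 2, so small datasets always get at least two clusters.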
x | data frame; a modeling dataset
yname | character string; dependent variable column name
ytype | integer value of 1 (binary y) or 2 (continuous y)
bin.numBins | integer value >= 2; number of desired bins
bin.equalBinSize | logical value; return equally sized (TRUE) or equally spaced (FALSE) bins
bin.minPct | integer between 0 and 100 specifying a percentile to force as the max endpoint for the low (first) bin
bin.maxPct | integer between 0 and 100 specifying a percentile to force as the min endpoint for the high (last) bin (must be > minPct)
nrsq.na.rm | logical value indicating whether missing values of x (and their corresponding y values) should be removed
variable_cluster.n | integer value >= 2; number of desired clusters
variable_cluster.na.rm | logical value; should records with missing values be removed?
univariateSummary.FUN | function to be applied to y
univariateGraph.yLabel | character string; y variable label
univariateGraph.yType | character string; y variable format type; valid values are "int", "dlr" and "pct"
univariateGraph.yDigits | non-negative integer value indicating the number of decimal places to show for values of the y variable
univariateGraph.yRangeMode | character string; "tozero" (y-axis starts at 0) or "auto" (y-axis extremes determined by data)
univariateGraph.barColor | character string; fill color for bars (valid color)
univariateGraph.lineColor | character string; line color (valid color)
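The difference between equally spaced and equally sized bins (bin.equalBinSize) can be illustrated with base R. This is only a sketch of the general idea using cut() and quantile(). It is an assumption that the package's own binning behaves similarly, and the sketch does not reproduce the bin.minPct / bin.maxPct handling. All variable names are hypothetical.

```r
set.seed(1)
v <- rexp(1000)   # skewed toy variable (illustrative data only)
num_bins <- 10

# Equally spaced: break points evenly spaced across the range of v.
spaced_breaks <- seq(min(v), max(v), length.out = num_bins + 1)
spaced <- cut(v, breaks = spaced_breaks, include.lowest = TRUE)

# Equally sized: break points at quantiles, so each bin holds
# about the same number of records.
sized_breaks <- quantile(v, probs = seq(0, 1, length.out = num_bins + 1))
sized <- cut(v, breaks = sized_breaks, include.lowest = TRUE)

print(table(spaced))  # counts vary widely for skewed data
print(table(sized))   # counts are roughly equal
```

For skewed variables, equally spaced bins tend to put most records in the first few bins, which is one reason to consider the equally sized option.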
A named list with class mt_modelEDA containing the following objects:
dataSummaryDT: datatable() object summarizing data
y.relativeHistogram: relativeHistogram() object for y
variable_cluster: data frame with variable_cluster() results
woe (binary y): named list of woe() tables
infoValue (binary y): named list of infoValue() values
woeDT (binary y): named list of woeDT() objects
nrsq (continuous y): named list of nrsq() values
variable_cluster_plus: data frame with variable_cluster_plus() results
clusterDT: named list of clusterDT() objects
univariateSummary: named list of univariateSummary() tables
univariateSummaryDT: named list of univariateSummaryDT() objects
univariateGraph: named list of univariateGraph() objects
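For readers unfamiliar with the woe and infoValue elements returned for a binary y, the following base R sketch shows one common textbook definition of weight of evidence and information value. It is an assumption that the package's woe() and infoValue() follow this definition; sign conventions and handling of empty bins differ between implementations. All names and data are hypothetical.

```r
# Toy binned predictor and binary outcome (illustrative data only).
bin_label <- factor(c("low", "low", "low", "low",
                      "mid", "mid", "mid",
                      "high", "high", "high"))
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 0)

# Count events (y == 1) and non-events (y == 0) in each bin.
events     <- tapply(y == 1, bin_label, sum)
non_events <- tapply(y == 0, bin_label, sum)

# Share of all events / non-events that fall in each bin.
pct_events     <- events / sum(events)
pct_non_events <- non_events / sum(non_events)

# Weight of evidence per bin: log ratio of the two shares.
woe_sketch <- log(pct_events / pct_non_events)

# Information value: difference in shares weighted by WoE, summed over bins.
iv_sketch <- sum((pct_events - pct_non_events) * woe_sketch)

print(woe_sketch)
print(iv_sketch)
```

Under this convention a positive value marks a bin with a higher than average share of events, and the information value is non-negative regardless of which sign convention is used.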
See also: woe, nrsq, variable_cluster, univariateGraph