modelEDA: Exploratory data analysis on a modeling dataset

View source: R/modelEDA.R

Description

modelEDA performs standard exploratory data analysis on a modeling dataset: variable clustering, weight of evidence (binary y), numeric R-squared (continuous y), and univariate summaries and graphs. Parameters prefixed with a function name and a dot (e.g. bin.) apply only to that function.
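
As a rough illustration of the weight-of-evidence idea for a binary y, the sketch below uses base R only and one common definition: the log of the ratio between the share of y = 1 records and the share of y = 0 records in each bin. This is an assumption for illustration; the woe function in this package may use a different binning or sign convention. The two-bin median split on mpg is a hypothetical choice.

```r
# Illustrative only: one common WOE definition, not necessarily the package's.
y <- mtcars$vs                      # binary dependent variable (0/1)
v <- mtcars$mpg                     # numeric predictor to bin

# Hypothetical two-bin split at the median of the predictor.
bin <- cut(v, breaks = quantile(v, probs = c(0, 0.5, 1)), include.lowest = TRUE)

# Counts of y = 0 and y = 1 in each bin.
tab <- table(bin, y)

# Share of all y = 1 records and of all y = 0 records that fall in each bin.
pct1 <- tab[, "1"] / sum(tab[, "1"])
pct0 <- tab[, "0"] / sum(tab[, "0"])

# Weight of evidence per bin: positive where y = 1 is over-represented.
woe <- log(pct1 / pct0)
woe
```

A bin with no y = 1 or no y = 0 records would give an infinite value under this definition, which is one reason binning choices matter.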

Usage

modelEDA(x, yname, ytype, bin.numBins = 10, bin.equalBinSize = FALSE,
  bin.minPct = 0, bin.maxPct = 100, nrsq.na.rm = FALSE,
  variable_cluster.n = max(floor((length(x) - 1)/10), 2),
  variable_cluster.na.rm = TRUE, univariateSummary.FUN = mean,
  univariateGraph.yLabel = yname, univariateGraph.yType = ifelse(ytype == 1,
  "pct", "dlr"), univariateGraph.yDigits = ifelse(ytype == 1, 1, 0),
  univariateGraph.yRangeMode = "tozero",
  univariateGraph.barColor = "#BDDFF7",
  univariateGraph.lineColor = "#000000")
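
Several defaults in the signature above are computed from other arguments. The sketch below evaluates those expressions by hand for mtcars, using base R only; it copies the default expressions from the signature and does not call modelEDA itself.

```r
# mtcars ships with base R (datasets package) and has 11 columns.
x <- mtcars

# Default for variable_cluster.n, copied from the Usage signature.
# length() of a data frame is its number of columns.
default_n <- max(floor((length(x) - 1) / 10), 2)
default_n

# Defaults for the univariateGraph.* format arguments under each ytype.
for (ytype in c(1, 2)) {
  yType   <- ifelse(ytype == 1, "pct", "dlr")
  yDigits <- ifelse(ytype == 1, 1, 0)
  cat("ytype =", ytype, "-> yType =", yType, ", yDigits =", yDigits, "\n")
}
```

For mtcars the cluster default is 2, because floor((11 - 1)/10) is 1 and the expression never returns less than 2. The default only exceeds 2 once the data frame has 31 or more columns.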

Arguments

x

data frame; a modeling dataset

yname

character string; dependent variable column name

ytype

integer value of 1 (binary y) or 2 (continuous y)

bin.numBins

integer value >= 2; number of desired bins

bin.equalBinSize

logical value; return equally sized (TRUE) or equally spaced (FALSE) bins

bin.minPct

integer between 0 and 100 specifying a percentile to force as the max endpoint for the low (first) bin

bin.maxPct

integer between 0 and 100 specifying a percentile to force as the min endpoint for the high (last) bin (must be greater than bin.minPct)

nrsq.na.rm

logical value indicating whether missing values of x (and their corresponding y values) should be removed

variable_cluster.n

integer value >= 2; number of desired clusters

variable_cluster.na.rm

logical value; should records with missing values be removed?

univariateSummary.FUN

function to be applied to y

univariateGraph.yLabel

character string; y variable label

univariateGraph.yType

character string; y variable format type; valid values are "int", "dlr" and "pct"

univariateGraph.yDigits

non-negative integer value indicating the number of decimal places to show for values of the y variable

univariateGraph.yRangeMode

character string; "tozero" (y-axis starts at 0) or "auto" (y-axis extremes determined by data)

univariateGraph.barColor

character string; fill color for bars (valid color)

univariateGraph.lineColor

character string; line color (valid color)
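
The difference between equally sized and equally spaced bins (bin.equalBinSize) can be sketched with base R. This is an illustration of the two concepts using cut and quantile, not the package's bin implementation; the variable hp and the value numBins = 4 are hypothetical choices.

```r
v <- mtcars$hp
numBins <- 4  # hypothetical; the modelEDA default for bin.numBins is 10

# Equally spaced: breakpoints split the range of v into equal-width intervals.
spaced_breaks <- seq(min(v), max(v), length.out = numBins + 1)
spaced <- cut(v, breaks = spaced_breaks, include.lowest = TRUE)

# Equally sized: breakpoints are quantiles, so bins hold similar record counts.
sized_breaks <- quantile(v, probs = seq(0, 1, length.out = numBins + 1))
sized <- cut(v, breaks = unique(sized_breaks), include.lowest = TRUE)

# Compare how the 32 records are distributed under each scheme.
table(spaced)
table(sized)
```

With a skewed variable such as hp, equally spaced bins tend to leave few records in the upper bins, while quantile-based bins keep the counts close to each other.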

Value

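A named list with class mt_modelEDA. Its elements include woeDT (shown for binary y) and clusterDT, as accessed in the Examples; call names() on the result to see all elements.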

See Also

woe, nrsq, variable_cluster, univariateGraph

Examples

# binary y 
x <- modelEDA(mtcars, "vs", 1)
names(x)
x$woeDT

# continuous y
x <- modelEDA(mtcars, "mpg", 2)
names(x)
x$clusterDT

dnegrey/miscTools documentation built on May 3, 2019, 2:57 p.m.