View source: R/01_UNIVARIATE_ANALYSIS.R
univariate | R Documentation |
univariate returns the univariate statistics for the risk factors supplied in the data frame db.
For numeric risk factors univariate report includes:
rf: Risk factor name.
rf.type: Risk factor class. This metric is always equal to numeric.
bin.type: Bin type - special or complete cases.
bin: Bin type. If the sc.method argument is equal to "together", then bin and bin.type have the same value. If the sc.method argument is equal to "separately", then bin will contain all special cases that exist for the analyzed risk factor (e.g. NA, NaN, Inf).
pct: Percentage of observations in each bin.
cnt.unique: Number of unique values per bin.
min: Minimum value.
p1, p5, p25, p50, p75, p95, p99: Percentile values.
avg: Mean value.
avg.se: Standard error of the mean.
max: Maximum value.
neg: Number of negative values.
pos: Number of positive values.
cnt.outliers: Number of outliers. Records above or below Q75 ± 1.5 * IQR, where IQR = Q75 - Q25.
sc.ind: Special case indicator. It takes the value 1 if the share of special cases exceeds sc.threshold, otherwise 0.
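To make a few of the numeric metrics above concrete, the following base R sketch computes them for a single vector. It is only an illustration, not the PDtoolkit implementation: the helper num_summary is hypothetical, the standard error is assumed to be sd / sqrt(n), and the outlier count assumes the common Tukey fences (below Q25 - 1.5 * IQR or above Q75 + 1.5 * IQR), which may differ from the rule the package applies.

```r
# Illustrative base R sketch; num_summary is a hypothetical helper,
# not a PDtoolkit function.
num_summary <- function(x) {
  x <- x[is.finite(x)]                      # keep complete cases only (drops NA, NaN, Inf, -Inf)
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]                        # IQR = Q75 - Q25
  lower <- q[1] - 1.5 * iqr                 # assumed lower Tukey fence
  upper <- q[2] + 1.5 * iqr                 # assumed upper Tukey fence
  data.frame(
    cnt.unique   = length(unique(x)),
    min          = min(x),
    p50          = quantile(x, probs = 0.5, names = FALSE),
    avg          = mean(x),
    avg.se       = sd(x) / sqrt(length(x)), # assumed definition of the standard error
    max          = max(x),
    neg          = sum(x < 0),
    pos          = sum(x > 0),
    cnt.outliers = sum(x < lower | x > upper)
  )
}

x <- c(-2, 1, 2, 3, 4, 5, 100, NA, Inf)
res <- num_summary(x)
print(res)
```

In this toy vector only the value 100 falls outside the assumed fences, so the sketch counts a single outlier among the seven complete cases.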
For categorical risk factors univariate report includes:
rf: Risk factor name.
rf.type: Risk factor class. This metric is equal to one of: character, factor or logical.
bin.type: Bin type - special or complete cases.
bin: Bin type. If the sc.method argument is equal to "together", then bin and bin.type have the same value. If the sc.method argument is equal to "separately", then bin will contain all special cases that exist for the analyzed risk factor (e.g. NA, NaN, Inf).
pct: Percentage of observations in each bin.
cnt.unique: Number of unique values per bin.
sc.ind: Special case indicator. It takes the value 1 if the share of special cases exceeds sc.threshold, otherwise 0.
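The interplay of bin, pct and sc.ind described above can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the package's code: the helper sc_bins and the bin labels "special cases" and "complete cases" are hypothetical, and the share of special cases is assumed to be taken over all observations of the risk factor.

```r
# Illustrative base R sketch; sc_bins and its bin labels are hypothetical,
# not part of PDtoolkit.
sc_bins <- function(x, sc = c(NA, NaN, Inf, -Inf),
                    sc.method = "together", sc.threshold = 0.2) {
  is.sc <- x %in% sc                        # %in% matches NA to NA and NaN to NaN
  label <- as.character(x)
  label[is.na(x) & !is.nan(x)] <- "NA"      # as.character(NA) would otherwise stay missing
  bin <- if (sc.method == "together") {
    ifelse(is.sc, "special cases", "complete cases")
  } else {
    ifelse(is.sc, label, "complete cases")  # one bin per special case value
  }
  pct <- prop.table(table(bin))             # share of observations per bin
  data.frame(
    bin    = names(pct),
    pct    = as.numeric(pct),
    sc.ind = as.integer(mean(is.sc) > sc.threshold)
  )
}

x <- c(1, 2, 3, 4, 5, 6, 7, NA, NaN, Inf)
tog <- sc_bins(x, sc.method = "together")
sep <- sc_bins(x, sc.method = "separately")
print(tog)
print(sep)
```

With three special cases out of ten observations, the share of 0.3 exceeds the default sc.threshold of 0.2, so sc.ind is 1 under both methods in this sketch; only the number of bins differs.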
univariate(
db,
sc = c(NA, NaN, Inf, -Inf),
sc.method = "together",
sc.threshold = 0.2
)
db | Data frame of risk factors supplied for univariate analysis. |
sc | Vector of special case elements. Default values are c(NA, NaN, Inf, -Inf). |
sc.method | Defines how special cases will be treated: all together or in separate bins. Possible values are "together" and "separately". |
sc.threshold | Threshold for special cases, expressed as a percentage of the total number of observations. If |
The command univariate returns a data frame with the univariate metrics described above for risk factors of class numeric, character, factor and logical.
suppressMessages(library(PDtoolkit))
data(gcd)
gcd$age[100:120] <- NA
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual, y.type = "bina")[[2]]
gcd$age.bin <- as.factor(gcd$age.bin)
gcd$maturity.bin <- ndr.bin(x = gcd$maturity, y = gcd$qual, y.type = "bina")[[2]]
gcd$amount.bin <- ndr.bin(x = gcd$amount, y = gcd$qual, y.type = "bina")[[2]]
gcd$all.miss1 <- NaN
gcd$all.miss2 <- NA
gcd$tf <- sample(c(TRUE, FALSE), nrow(gcd), replace = TRUE)
# create a date variable to confirm that it will not be processed by the function
gcd$dates <- Sys.Date()
str(gcd)
univariate(db = gcd)