View source: R/01_UNIVARIATE_ANALYSIS.R
| univariate | R Documentation | 
univariate returns the univariate statistics for risk factors supplied in data frame db. 
For numeric risk factors, the univariate report includes:
rf: Risk factor name.
 rf.type: Risk factor class. This metric is always equal to numeric.
bin.type: Bin type, either special cases or complete cases.
bin: Bin. If the sc.method argument is equal to "together", then
bin and bin.type have the same value. If the sc.method argument
is equal to "separately", then bin contains all special cases that
exist for the analyzed risk factor (e.g. NA, NaN, Inf).
 pct: Percentage of observations in each bin.
 cnt.unique: Number of unique values per bin.
min: Minimum value.
p1, p5, p25, p50, p75, p95, p99: Percentile values.
avg: Mean value.
avg.se: Standard error of the mean.
max: Maximum value.
neg: Number of negative values.
pos: Number of positive values.
cnt.outliers: Number of outliers. Records above or below
Q75 ± 1.5 * IQR, where IQR = Q75 - Q25.
sc.ind: Special case indicator. It takes the value 1 if the share of special cases exceeds
sc.threshold, otherwise 0.
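As an illustration of how several of the numeric metrics listed above relate to the raw data, here is a minimal base R sketch. It is not the PDtoolkit implementation: the helper name sketch_num_summary is hypothetical, the standard error is assumed to be sd / sqrt(n) over complete cases, and quantile() is used with its default interpolation type, which may differ from what univariate does internally.

```r
# Hypothetical helper, not part of PDtoolkit: summarise one numeric vector.
sketch_num_summary <- function(x, sc = c(NA, NaN, Inf, -Inf), sc.threshold = 0.2) {
  is.sc <- x %in% sc            # TRUE where the value is a special case
  cc <- x[!is.sc]               # complete cases only
  sc.share <- mean(is.sc)       # share of special cases in all observations
  list(
    pct.special = sc.share,
    cnt.unique  = length(unique(cc)),
    min         = min(cc),
    # p1, p5, p25, p50, p75, p95, p99 (default quantile type is an assumption)
    pctl        = quantile(cc, probs = c(0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99),
                           names = FALSE),
    avg         = mean(cc),
    avg.se      = sd(cc) / sqrt(length(cc)),  # assumed definition of avg.se
    max         = max(cc),
    neg         = sum(cc < 0),
    pos         = sum(cc > 0),
    sc.ind      = as.integer(sc.share > sc.threshold)
  )
}

x <- c(-2, 0, 1, 3, 8, NA, Inf, NaN)
res <- sketch_num_summary(x)
str(res)
```

In this toy vector three of eight values are special cases, so the share is above the default sc.threshold of 0.2 and the sketch sets sc.ind to 1.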
For categorical risk factors, the univariate report includes:
rf: Risk factor name.
 rf.type: Risk factor class. This metric is equal to one of: character,
factor or logical.
bin.type: Bin type, either special cases or complete cases.
bin: Bin. If the sc.method argument is equal to "together", then
bin and bin.type have the same value. If the sc.method argument
is equal to "separately", then bin contains all special cases that
exist for the analyzed risk factor (e.g. NA, NaN, Inf).
 pct: Percentage of observations in each bin.
 cnt.unique: Number of unique values per bin.
sc.ind: Special case indicator. It takes the value 1 if the share of special cases exceeds
sc.threshold, otherwise 0.
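The difference between the two sc.method options can be pictured with a small base R sketch. This is only an illustration under assumptions: the bin labels ("special cases", "complete cases", "special: NA" and so on) are invented for the example and need not match the labels that univariate prints.

```r
# Hypothetical illustration, not PDtoolkit code; all bin labels are assumed.
x  <- c(1, 2, NA, NaN, Inf, 5, NA, 7)
sc <- c(NA, NaN, Inf, -Inf)
is.sc <- x %in% sc

# bin.type only distinguishes special cases from complete cases
bin.type <- ifelse(is.sc, "special cases", "complete cases")

# "together": one bin for all special cases, so bin equals bin.type
bin.together <- bin.type

# "separately": one bin per special case value found in the data
bin.separately <- ifelse(is.sc, paste0("special: ", x), "complete cases")

# share of observations per bin under each option
pct.together   <- prop.table(table(bin.together))
pct.separately <- prop.table(table(bin.separately))
print(pct.together)
print(pct.separately)
```

Under "together" the special cases form a single bin, while under "separately" the same observations are split across one bin per special case value.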
univariate(
  db,
  sc = c(NA, NaN, Inf, -Inf),
  sc.method = "together",
  sc.threshold = 0.2
)
| db | Data frame of risk factors supplied for univariate analysis. | 
| sc | Vector of special case elements. Default values are c(NA, NaN, Inf, -Inf). | 
| sc.method | Defines how special cases are treated: all together or in separate bins. Possible values are "together" and "separately". | 
| sc.threshold | Threshold for special cases expressed as percentage of total number of observations.
If  | 
univariate returns a data frame with the univariate metrics described above for risk factors
of class numeric, character, factor, and logical.
suppressMessages(library(PDtoolkit))
data(gcd)
#introduce missing values so that a numeric risk factor contains special cases
gcd$age[100:120] <- NA
#bin numeric risk factors with ndr.bin and keep the second element of its output
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual, y.type = "bina")[[2]]
gcd$age.bin <- as.factor(gcd$age.bin)
gcd$maturity.bin <- ndr.bin(x = gcd$maturity, y = gcd$qual, y.type = "bina")[[2]]
gcd$amount.bin <- ndr.bin(x = gcd$amount, y = gcd$qual, y.type = "bina")[[2]]
#add risk factors that consist of special cases only
gcd$all.miss1 <- NaN
gcd$all.miss2 <- NA
#add a logical risk factor
gcd$tf <- sample(c(TRUE, FALSE), nrow(gcd), replace = TRUE)
#create date variable to confirm that it will not be processed by the function
gcd$dates <- Sys.Date()
str(gcd)
univariate(db = gcd)