imp.outliers: Imputation methods for outliers
In PDtoolkit: Collection of Tools for PD Rating Model Development and Validation


Description

imp.outliers replaces outliers, i.e. the most extreme small and large values identified by the selected method, with less extreme values. This procedure applies only to numeric risk factors.
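The replacement step itself amounts to capping: values below a lower bound are raised to it, values above an upper bound are lowered to it, and special-case values are left as they are. The following is a minimal base R sketch of that idea. The helper `cap.values` is hypothetical and not part of PDtoolkit, and the bounds are supplied by hand here rather than derived by one of the package's methods.

```r
# Hypothetical helper (not part of PDtoolkit): cap a numeric vector at given
# lower and upper bounds, leaving special-case values untouched.
cap.values <- function(x, lower, upper, sc = c(NA, NaN, Inf, -Inf)) {
  # %in% matches NA to NA and NaN to NaN, so all special cases are flagged
  is.sc <- x %in% sc
  x.out <- x
  # raise values below `lower`, then lower values above `upper`
  x.out[!is.sc] <- pmin(pmax(x[!is.sc], lower), upper)
  x.out
}

x <- c(1, 5, 7, NA, 120, -40, Inf)
cap.values(x, lower = 0, upper = 10)
# returns 1 5 7 NA 10 0 Inf
```

Note that `Inf` is not capped because it is treated as a special case, consistent with the default of the `sc` argument.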

Usage

imp.outliers(
  db,
  sc = c(NA, NaN, Inf, -Inf),
  method = "iqr",
  range = 1.5,
  upper.pct = 0.95,
  lower.pct = 0.05
)

Arguments

`db`	Data frame of risk factors supplied for imputation.
`sc`	Vector of all special case elements. Default value is `c(NA, NaN, Inf, -Inf)`. These values are excluded from the calculation of the imputed values and are not replaced.
`method`	Imputation method. Available options are `"iqr"` and `"percentile"`. Method `"iqr"` identifies outliers with the rule used for boxplot whiskers (based on the five-number summary), while for the `"percentile"` method the user defines the lower and upper percentile limits for replacement. Default value is `"iqr"`.
`range`	Determines how far the boxplot whiskers extend out from the box. If `range` is positive, the whiskers extend to the most extreme data point that is no more than `range` times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes. Default value is 1.5. This parameter is used only if the selected `method` is `iqr`.
`upper.pct`	Upper limit for the percentile method. All values above this limit are replaced by the value at this percentile. Default value is the 95th percentile (0.95). This parameter is used only if the selected `method` is `percentile`.
`lower.pct`	Lower limit for the percentile method. All values below this limit are replaced by the value at this percentile. Default value is the 5th percentile (0.05). This parameter is used only if the selected `method` is `percentile`.
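To illustrate how the two methods differ in choosing the replacement values, here is a base R sketch that derives lower and upper bounds for a single risk factor. It rests on assumptions not confirmed by the package source: that the `"iqr"` method uses the whisker ends reported by `grDevices::boxplot.stats()` (which computes the box from Tukey's hinges), and that the `"percentile"` method uses `stats::quantile()` with its default type 7. The function name `outlier.bounds` is hypothetical.

```r
# Hypothetical sketch (not PDtoolkit code): derive replacement bounds for one
# numeric risk factor under the "iqr" and "percentile" methods.
outlier.bounds <- function(x, method = "iqr", range = 1.5,
                           upper.pct = 0.95, lower.pct = 0.05,
                           sc = c(NA, NaN, Inf, -Inf)) {
  # special cases do not enter the calculation
  x <- x[!x %in% sc]
  if (method == "iqr") {
    # assumption: stats[1] and stats[5] (the whisker ends) are the bounds
    wh <- grDevices::boxplot.stats(x, coef = range)$stats
    c(lower = wh[1], upper = wh[5])
  } else if (method == "percentile") {
    # assumption: default quantile type 7
    q <- stats::quantile(x, probs = c(lower.pct, upper.pct), names = FALSE)
    c(lower = q[1], upper = q[2])
  } else {
    stop("method must be 'iqr' or 'percentile'")
  }
}

x <- c(2, 3, 3, 4, 5, 5, 6, 7, 40, NA)
outlier.bounds(x, method = "iqr")         # lower 2, upper 7: 40 is an outlier
outlier.bounds(x, method = "percentile")  # lower 2.4, upper 26.8
```

With `range = 1.5` the box spans the hinges 3 and 6, so the upper whisker stops at 7, the largest value within 10.5, and 40 would be replaced by 7. The percentile method caps regardless of how extreme the tail is, so it always modifies roughly the chosen share of observations, whereas the boxplot rule may modify none.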

Value

This function returns a list of two data frames. The first data frame contains the analyzed risk factors with imputed values for outliers, while the second data frame presents the imputation report. In the imputation report, the user can inspect for each risk factor the imputation info (info), the imputation method (imputation.method), the imputed values (imputation.val.upper and imputation.val.lower), and the number of imputed observations (imputation.num.upper and imputation.num.lower).
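The shape of the returned list can be mimicked for a single risk factor with the following base R sketch, which imputes by the percentile rule and builds a one-row report with the report columns listed above. It is a hypothetical illustration, not the package's implementation: the function `impute.one`, the risk factor name column `rf`, the `info` text, and the use of `stats::quantile()` with its default type are all assumptions.

```r
# Hypothetical sketch (not PDtoolkit code): percentile imputation for one risk
# factor, returning the imputed vector and a one-row imputation report.
impute.one <- function(x, rf.name, lower.pct = 0.05, upper.pct = 0.95,
                       sc = c(NA, NaN, Inf, -Inf)) {
  ok <- !x %in% sc  # special cases are neither used nor replaced
  q <- stats::quantile(x[ok], probs = c(lower.pct, upper.pct), names = FALSE)
  is.low <- ok & x < q[1]
  is.up <- ok & x > q[2]
  x[is.low] <- q[1]
  x[is.up] <- q[2]
  report <- data.frame(rf = rf.name,                      # assumed column
                       info = "Imputation completed.",    # assumed wording
                       imputation.method = "percentile",
                       imputation.val.upper = q[2],
                       imputation.val.lower = q[1],
                       imputation.num.upper = sum(is.up),
                       imputation.num.lower = sum(is.low))
  list(x, report)
}

res <- impute.one(c(1:19, 100, NA), rf.name = "x1")
res[[1]]  # imputed values; the NA stays NA
res[[2]]  # one imputed observation in each tail
```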

Examples

suppressMessages(library(PDtoolkit))
data(gcd)
# introduce special cases (missing values) into age
gcd$age[1:20] <- NA
# add a binned (non-numeric) version of age
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual, sc.method = "separately", y.type = "bina")[[2]]
# add a risk factor that contains only missing values
gcd$dummy1 <- NA
# iqr method; the first column (target qual) is excluded from db
imput.res.1 <- imp.outliers(db = gcd[, -1],
                            method = "iqr",
                            range = 1.5)
# analyzed risk factors with imputed values
head(imput.res.1[[1]])
# imputation report
imput.res.1[[2]]
# percentile method
imput.res.2 <- imp.outliers(db = gcd[, -1],
                            method = "percentile",
                            upper.pct = 0.95,
                            lower.pct = 0.05)
# analyzed risk factors with imputed values
head(imput.res.2[[1]])
# imputation report
imput.res.2[[2]]

PDtoolkit documentation built on Sept. 20, 2023, 9:06 a.m.