npcs: Fit a multi-class Neyman-Pearson classifier with error controls via cost-sensitive learning

View source: R/npcs.R


Fit a multi-class Neyman-Pearson classifier with error controls via cost-sensitive learning.

Description

Fit a multi-class Neyman-Pearson classifier with error controls via cost-sensitive learning. This function implements the two algorithms, NPMC-CX and NPMC-ER, proposed in Tian, Y. & Feng, Y. (2021). The problem is to minimize a weighted linear combination of the per-class error rates P(hat(Y)(X) != k | Y = k) over some classes k, while controlling P(hat(Y)(X) != k | Y = k) at user-specified levels for other classes; see the display below. See Tian, Y. & Feng, Y. (2021) for more details.
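
In the notation of the w and alpha arguments below, the problem can be written as follows (error control is imposed on exactly those classes whose alpha entry is not NA):

  minimize      sum_k  w_k * P(hat(Y)(X) != k | Y = k)
  subject to    P(hat(Y)(X) != k | Y = k) <= alpha_k   for every class k with a non-NA alpha_k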

Usage

npcs(
  x,
  y,
  algorithm = c("CX", "ER"),
  classifier,
  seed = 1,
  w,
  alpha,
  trControl = list(),
  tuneGrid = list(),
  split.ratio = 0.5,
  split.mode = c("by-class", "merged"),
  tol = 1e-06,
  refit = TRUE,
  protect = TRUE,
  opt.alg = c("Hooke-Jeeves", "Nelder-Mead")
)

Arguments

x

the predictor matrix of training data, where each row and column represents an observation and predictor, respectively.

y

the response vector of training data. Must take integer values from 1 to K for some K >= 2. Can be either a numeric or factor vector.

algorithm

the NPMC algorithm to use. String only. Can be either "CX" or "ER", which implement NPMC-CX and NPMC-ER of Tian, Y. & Feng, Y. (2021), respectively.

classifier

which model to use for estimating the posterior distribution P(Y|X = x). String only.

seed

the random seed. Default = 1.

w

the weights in the objective function. Should be a vector of length K, where K is the number of classes.

alpha

the levels at which to control the error rate of each class. Should be a vector of length K, where K is the number of classes. Use NA if no error control is imposed for a specific class; see the example below.
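
For instance, in a hypothetical four-class problem (distinct from the three-class setup in the Examples below), to control the class-1 error at 10% and the class-4 error at 5% while minimizing the equally weighted errors of classes 2 and 3:

alpha <- c(0.1, NA, NA, 0.05)  # control classes 1 and 4; no control on classes 2 and 3
w <- c(0, 0.5, 0.5, 0)         # minimize the equally weighted errors of classes 2 and 3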

trControl

a list specifying the resampling method used when training the classifier. Default = list().

tuneGrid

a list of hyperparameter values, used either to tune or to fix the classifier's hyperparameters. Default = list().

split.ratio

the proportion of data used for searching lambda (the cost parameters). Should be between 0 and 1. Default = 0.5. Only used when algorithm = "ER".

split.mode

two different modes of splitting the data for NPMC-ER. String only. Can be either "by-class" or "merged", as in the Usage above. Default = "by-class". Only used when algorithm = "ER". See the sketch after this list.

  • by-class: split the data separately within each class.

  • merged: split the data as a whole.
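
A minimal base-R sketch of what the two modes mean (illustration only; npcs performs this split internally, and the variable names ratio, idx.merged, and idx.byclass are hypothetical):

ratio <- 0.5  # plays the role of split.ratio; assumes x, y as in the Examples below
# "merged": sample the required proportion from the pooled observations as a whole
idx.merged <- sample(length(y), size = floor(ratio * length(y)))
# "by-class": sample the required proportion within each class separately
idx.byclass <- unlist(lapply(split(seq_along(y), y), function(id)
  sample(id, size = floor(ratio * length(id)))))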

tol

the convergence tolerance. Default = 1e-06. Used in the lambda-searching step: the optimization terminates when the step length of the main loop becomes smaller than tol. See the help pages of hjkb and nmkb in package dfoptim for more details.

refit

whether to refit the classifier on all of the data after lambda has been found. Boolean value. Default = TRUE. Only used when algorithm = "ER".

protect

whether to threshold near-zero lambdas. Boolean value. Default = TRUE. This guards against extreme cases in which some lambdas are driven to zero by the limits of numerical accuracy. When protect = TRUE, every lambda smaller than 1e-03 is set to 1e-03, as sketched below.
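
The thresholding amounts to a single line (a sketch, assuming lambda holds the numeric vector of multipliers):

lambda <- pmax(lambda, 1e-3)  # floor near-zero multipliers at 1e-03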

opt.alg

the derivative-free optimization method used to search for the lambdas. String only. Can be either "Hooke-Jeeves" or "Nelder-Mead". Default = "Hooke-Jeeves".
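
As noted under tol, these correspond to the box-constrained routines hjkb and nmkb in package dfoptim. A standalone sketch of such a search on a toy objective (not the actual criterion npcs optimizes):

library(dfoptim)
f <- function(par) sum((par - c(0.3, 0.7))^2)  # toy stand-in for the lambda-search criterion
hjkb(par = c(1, 1), fn = f, lower = 0, upper = 2, control = list(tol = 1e-6))$par
# returns approximately c(0.3, 0.7)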

Value

An object with S3 class "npcs", containing the following components.

lambda

the estimated lambda vector, consisting of the Lagrangian multipliers. It is related to the cost parameters. See Section 2 of Tian, Y. & Feng, Y. (2021) for details.

fit

the fitted classifier.

classifier

the classifier used for estimating the posterior distribution P(Y|X = x).

algorithm

the NPMC algorithm used.

alpha

the levels at which the error rate of each class is controlled.

w

the weights in the objective function.

pik

the estimated marginal probability for each class.
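
A brief sketch of inspecting these components, assuming the fit.npmc.CX object from the Examples below was created successfully:

fit.npmc.CX$lambda  # estimated Lagrangian multipliers (cost-related)
fit.npmc.CX$pik     # estimated marginal probability of each class
fit.npmc.CX$alpha   # the error-control levels that were requested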

References

Tian, Y., & Feng, Y. (2021). Neyman-Pearson Multi-class Classification via Cost-sensitive Learning. Submitted. Available soon on arXiv.

See Also

predict.npcs, error_rate, generate_data, gamma_smote.

Examples

# data generation: case 1 in Tian, Y., & Feng, Y. (2021) with n = 1000
set.seed(123, kind = "L'Ecuyer-CMRG")
train.set <- generate_data(n = 1000, model.no = 1)
x <- train.set$x
y <- train.set$y

test.set <- generate_data(n = 1000, model.no = 1)
x.test <- test.set$x
y.test <- test.set$y

# construct the multi-class NP problem: case 1 in Tian, Y., & Feng, Y. (2021)
alpha <- c(0.05, NA, 0.01)
w <- c(0, 1, 0)

# try NPMC-CX, NPMC-ER, and vanilla multinomial logistic regression
fit.vanilla <- nnet::multinom(y ~ ., data = data.frame(x = x, y = factor(y)), trace = FALSE)
fit.npmc.CX <- try(npcs(x, y, algorithm = "CX", classifier = "multinom",
                        w = w, alpha = alpha))
fit.npmc.ER <- try(npcs(x, y, algorithm = "ER", classifier = "multinom",
                        w = w, alpha = alpha, refit = TRUE))
# test error of vanilla multinomial logistic regression
y.pred.vanilla <- predict(fit.vanilla, newdata = data.frame(x = x.test))
error_rate(y.pred.vanilla, y.test)
# test error of NPMC-CX
y.pred.CX <- predict(fit.npmc.CX, x.test)
error_rate(y.pred.CX, y.test)
# test error of NPMC-ER
y.pred.ER <- predict(fit.npmc.ER, x.test)
error_rate(y.pred.ER, y.test)
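
# As a follow-up check (assuming error_rate returns the per-class error rates
# as a numeric vector, as its use above suggests), compare the NPMC-CX errors
# against the targets in alpha:
er.CX <- error_rate(y.pred.CX, y.test)
# TRUE where a class meets its target; NA for classes without error control
er.CX <= alpha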

