# npcs: Neyman-Pearson Classification via Cost-Sensitive Learning


## Fit a multi-class Neyman-Pearson classifier with error controls via cost-sensitive learning.

### Description

Fit a multi-class Neyman-Pearson (NP) classifier with error-rate controls via cost-sensitive learning. This function implements the two algorithms proposed in Tian, Y. & Feng, Y. (2021). The problem is to minimize a weighted sum of the per-class error rates P(hat(Y)(X) != k | Y = k) over some classes k, while controlling P(hat(Y)(X) != k | Y = k) for other classes k. See Tian, Y. & Feng, Y. (2021) for more details.
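In symbols, the description above corresponds to the following program (a sketch based on the Description; here `A` denotes the set of classes with a non-`NA` entry in `alpha`, and `w_k`, `alpha_k` are the entries of the `w` and `alpha` arguments):

```latex
\min_{\hat{Y}} \; \sum_{k=1}^{K} w_k \, P\big(\hat{Y}(X) \neq k \mid Y = k\big)
\quad \text{subject to} \quad
P\big(\hat{Y}(X) \neq k \mid Y = k\big) \le \alpha_k, \;\; k \in A .
```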

### Usage

```r
npcs(
  x,
  y,
  algorithm = c("CX", "ER"),
  classifier,
  seed = 1,
  w,
  alpha,
  trControl = list(),
  tuneGrid = list(),
  split.ratio = 0.5,
  split.mode = c("by-class", "merged"),
  tol = 1e-06,
  refit = TRUE,
  protect = TRUE,
  opt.alg = c("Hooke-Jeeves", "Nelder-Mead")
)
```

### Arguments

- `x`: the predictor matrix of training data, where each row is an observation and each column is a predictor.
- `y`: the response vector of training data. Must take values in 1 to K for some K >= 2. Can be either a numeric or a factor vector.
- `algorithm`: the NPMC algorithm to use. String only. Either "CX" or "ER", implementing NPMC-CX or NPMC-ER of Tian, Y. & Feng, Y. (2021).
- `classifier`: which model to use for estimating the posterior distribution P(Y | X = x). String only.
- `seed`: random seed.
- `w`: the weights in the objective function. A vector of length K, where K is the number of classes.
- `alpha`: the target levels at which to control the error rate of each class. A vector of length K, where K is the number of classes. Use `NA` for classes on which no error control is imposed.
- `trControl`: list; resampling method.
- `tuneGrid`: list; hyperparameter tuning settings.
- `split.ratio`: the proportion of data used to search for lambda (the cost parameters). Must be between 0 and 1. Default = 0.5. Only used when `algorithm` = "ER".
- `split.mode`: how to split the data for NPMC-ER. String only. Either "by-class" (split the data within each class) or "merged" (split the data as a whole). Default = "by-class". Only used when `algorithm` = "ER".
- `tol`: the convergence tolerance, used in the lambda-searching step. Default = 1e-06. The optimization terminates when the step length of the main loop becomes smaller than `tol`. See the help pages of `hjkb` and `nmkb` for more details.
- `refit`: whether to refit the classifier on all data after lambda has been found. Boolean. Default = TRUE. Only used when `algorithm` = "ER".
- `protect`: whether to threshold near-zero lambdas. Boolean. Default = TRUE. This guards against extreme cases where some lambdas are driven to zero by limited numerical accuracy. When `protect` = TRUE, any lambda smaller than 1e-03 is set to 1e-03.
- `opt.alg`: the optimization method used when searching for lambda. String only. Either "Hooke-Jeeves" or "Nelder-Mead". Default = "Hooke-Jeeves".
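The difference between the two `split.mode` options can be illustrated in base R. The sketch below is illustrative only: `npcs()` performs this split internally for NPMC-ER, and the variable names here are made up.

```r
set.seed(1)
y <- rep(1:3, each = 10)   # 30 labels, 10 per class
split.ratio <- 0.5

# "merged": sample indices from the pooled data as a whole;
# class counts in the two halves may be unbalanced
idx.merged <- sample(length(y), size = floor(split.ratio * length(y)))

# "by-class": sample the same proportion within each class separately,
# which keeps the class counts of the two halves balanced
idx.byclass <- unlist(lapply(split(seq_along(y), y),
                             function(i) sample(i, size = floor(split.ratio * length(i)))))

table(y[idx.byclass])   # exactly 5 per class by construction
```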

### Value

An object with S3 class `"npcs"`.

- `lambda`: the estimated lambda vector, consisting of the Lagrangian multipliers. It is related to the cost; see Section 2 of Tian, Y. & Feng, Y. (2021) for details.
- `fit`: the fitted classifier.
- `classifier`: the classifier used for estimating the posterior distribution P(Y | X = x).
- `algorithm`: the NPMC algorithm used.
- `alpha`: the target levels at which the error rate of each class is controlled.
- `w`: the weights in the objective function.
- `pik`: the estimated marginal probability of each class.
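Since `lambda` is described as a vector of Lagrangian multipliers related to the cost, one way the returned components could combine is the cost-sensitive plug-in rule argmax_k (w_k + lambda_k) * P(Y = k | x) / pi_k. The sketch below is an assumption, not the package's exact implementation; the posterior matrix and lambda values are made up for illustration.

```r
# Hypothetical posterior estimates: rows = observations, cols = classes
posterior <- matrix(c(0.7, 0.2, 0.1,
                      0.2, 0.5, 0.3,
                      0.1, 0.2, 0.7),
                    nrow = 3, byrow = TRUE)
w      <- c(0, 1, 0)        # objective weights, as in the Examples section
lambda <- c(0.4, 0, 0.3)    # hypothetical Lagrangian multipliers
pik    <- rep(1/3, 3)       # hypothetical class marginals

cost   <- (w + lambda) / pik              # per-class costs
scores <- sweep(posterior, 2, cost, `*`)  # rescale each column by its cost
pred   <- apply(scores, 1, which.max)     # predicted class per observation
pred                                      # -> 1 2 3
```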

### References

Tian, Y., & Feng, Y. (2021). Neyman-Pearson Multi-class Classification via Cost-sensitive Learning. Submitted. Available soon on arXiv.

### See Also

`predict.npcs`, `error_rate`, `generate_data`, `gamma_smote`.

### Examples

```r
# data generation: case 1 in Tian, Y., & Feng, Y. (2021) with n = 1000
set.seed(123, kind = "L'Ecuyer-CMRG")
train.set <- generate_data(n = 1000, model.no = 1)
x <- train.set$x
y <- train.set$y

test.set <- generate_data(n = 1000, model.no = 1)
x.test <- test.set$x
y.test <- test.set$y

# construct the multi-class NP problem: case 1 in Tian, Y., & Feng, Y. (2021)
alpha <- c(0.05, NA, 0.01)
w <- c(0, 1, 0)

# try NPMC-CX, NPMC-ER, and vanilla multinomial logistic regression
fit.vanilla <- nnet::multinom(y ~ ., data = data.frame(x = x, y = factor(y)), trace = FALSE)
fit.npmc.CX <- try(npcs(x, y, algorithm = "CX", classifier = "multinom",
                        w = w, alpha = alpha))
fit.npmc.ER <- try(npcs(x, y, algorithm = "ER", classifier = "multinom",
                        w = w, alpha = alpha, refit = TRUE))

# test error of vanilla multinomial logistic regression
y.pred.vanilla <- predict(fit.vanilla, newdata = data.frame(x = x.test))
error_rate(y.pred.vanilla, y.test)

# test error of NPMC-CX
y.pred.CX <- predict(fit.npmc.CX, x.test)
error_rate(y.pred.CX, y.test)

# test error of NPMC-ER
y.pred.ER <- predict(fit.npmc.ER, x.test)
error_rate(y.pred.ER, y.test)
```

npcs documentation built on April 27, 2023, 9:10 a.m.