npcs  R Documentation 
Fit a multiclass Neyman-Pearson classifier with error controls via cost-sensitive learning. This function implements the two algorithms proposed in Tian, Y. & Feng, Y. (2021). The problem is to minimize a weighted linear combination of P(hat(Y)(X) != k | Y = k) over some classes k while controlling P(hat(Y)(X) != k | Y = k) for other classes k. See Tian, Y. & Feng, Y. (2021) for more details.
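Written out, the program being solved is roughly the following (a sketch of the formulation in Section 2 of Tian, Y. & Feng, Y. (2021); here phi denotes the classifier, w_k the weights, alpha_k the target levels, and A the index set of classes for which a non-NA alpha is supplied):

```latex
\min_{\phi}\ \sum_{k=1}^{K} w_k \, P\left(\phi(X) \neq k \mid Y = k\right)
\qquad \text{subject to} \qquad
P\left(\phi(X) \neq k \mid Y = k\right) \le \alpha_k \ \text{ for } k \in \mathcal{A}.
```

Classes with alpha set to NA enter only through the objective (via their weights), not through the constraints.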
npcs(
x,
y,
algorithm = c("CX", "ER"),
classifier,
seed = 1,
w,
alpha,
trControl = list(),
tuneGrid = list(),
split.ratio = 0.5,
split.mode = c("by-class", "merged"),
tol = 1e-06,
refit = TRUE,
protect = TRUE,
opt.alg = c("Hooke-Jeeves", "Nelder-Mead")
)
x 
the predictor matrix of training data, where each row and column represents an observation and predictor, respectively. 
y 
the response vector of training data. Must take integer values from 1 to K for some K >= 2. Can be either a numeric or factor vector. 
algorithm 
the NPMC algorithm to use. String only. Can be either "CX" or "ER", which implement NPMC-CX and NPMC-ER in Tian, Y. & Feng, Y. (2021), respectively. 
classifier 
which model to use for estimating the posterior distribution P(Y | X = x). String only. 
seed 
random seed 
w 
the weights in the objective function. Should be a vector of length K, where K is the number of classes. 
alpha 
the levels at which to control the error rate of each class. Should be a vector of length K, where K is the number of classes. Use NA if no error control is imposed for a specific class. 
trControl 
list; resampling method 
tuneGrid 
list; for hyperparameter tuning or setting 
split.ratio 
the proportion of data to be used in searching lambda (the cost parameters). Should be between 0 and 1. Default = 0.5. Only useful when algorithm = "ER". 
split.mode 
two different modes to split the data for NPMC-ER. String only. Can be either "by-class" or "merged". Default = "by-class". Only useful when algorithm = "ER".
tol 
the convergence tolerance. Default = 1e-06. Used in the lambda-searching step. The optimization is terminated when the step length of the main loop becomes smaller than tol. 
refit 
whether to refit the classifier using all data after finding lambda or not. Boolean value. Default = TRUE. Only useful when algorithm = "ER". 
protect 
whether to threshold close-to-zero lambdas or not. Boolean value. Default = TRUE. This guards against extreme cases in which some lambdas are set equal to zero due to the limits of computational accuracy. 
opt.alg 
optimization method to use when searching lambdas. String only. Can be either "Hooke-Jeeves" or "Nelder-Mead". Default = "Hooke-Jeeves". 
An object with S3 class "npcs", which contains the following components.
lambda 
the estimated lambda vector, which consists of Lagrangian multipliers. It is related to the cost. See Section 2 of Tian, Y. & Feng, Y. (2021) for details. 
fit 
the fitted classifier. 
classifier 
which classifier was used for estimating the posterior distribution P(Y | X = x). 
algorithm 
the NPMC algorithm to use. 
alpha 
the levels we want to control for error rates of each class. 
w 
the weights in the objective function. 
pik 
the estimated marginal probability for each class. 
Tian, Y., & Feng, Y. (2021). Neyman-Pearson Multiclass Classification via Cost-sensitive Learning. Submitted. Available soon on arXiv.
predict.npcs, error_rate, generate_data, gamma_smote.
# data generation: case 1 in Tian, Y., & Feng, Y. (2021) with n = 1000
set.seed(123, kind = "L'Ecuyer-CMRG")
train.set <- generate_data(n = 1000, model.no = 1)
x <- train.set$x
y <- train.set$y
test.set <- generate_data(n = 1000, model.no = 1)
x.test <- test.set$x
y.test <- test.set$y
# construct the multiclass NP problem: case 1 in Tian, Y., & Feng, Y. (2021)
alpha <- c(0.05, NA, 0.01)
w <- c(0, 1, 0)
# try NPMC-CX, NPMC-ER, and vanilla multinomial logistic regression
fit.vanilla <- nnet::multinom(y ~ ., data = data.frame(x = x, y = factor(y)), trace = FALSE)
fit.npmc.CX <- try(npcs(x, y, algorithm = "CX", classifier = "multinom",
                        w = w, alpha = alpha))
fit.npmc.ER <- try(npcs(x, y, algorithm = "ER", classifier = "multinom",
                        w = w, alpha = alpha, refit = TRUE))
# test error of vanilla multinomial logistic regression
y.pred.vanilla <- predict(fit.vanilla, newdata = data.frame(x = x.test))
error_rate(y.pred.vanilla, y.test)
# test error of NPMC-CX
y.pred.CX <- predict(fit.npmc.CX, x.test)
error_rate(y.pred.CX, y.test)
# test error of NPMC-ER
y.pred.ER <- predict(fit.npmc.ER, x.test)
error_rate(y.pred.ER, y.test)
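As a follow-up (a sketch assuming the npcs package is installed and the fits above succeeded), the components documented in the Value section can be inspected directly on the returned "npcs" object:

```r
# guard against a failed fit, since npcs() was wrapped in try()
if (!inherits(fit.npmc.CX, "try-error")) {
  fit.npmc.CX$lambda  # estimated Lagrangian multipliers (cost parameters)
  fit.npmc.CX$pik     # estimated marginal probability of each class
}
```

Comparing the per-class test errors returned by error_rate() against the chosen alpha vector shows whether the controlled classes (here classes 1 and 3) meet their target levels.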