pvs.logreg: P-Values to Classify New Observations (Penalized... In pvclass: P-Values for Classification

Description

Computes nonparametric p-values for the potential class memberships of new observations. The p-values are based on 'penalized logistic regression'.

Usage

pvs.logreg(NewX, X, Y, tau.o = 10, find.tau = FALSE, delta = 2,
           tau.max = 80, tau.min = 1, a0 = NULL, b0 = NULL,
           pen.method = c('vectors', 'simple', 'none'),
           progress = FALSE)

Arguments

NewX        data matrix consisting of one or several new observations (row vectors) to be classified.
X           matrix containing training observations, where each observation is a row vector.
Y           vector indicating the classes which the training observations belong to.
tau.o       the penalty parameter (see section 'Details' below).
find.tau    logical. If TRUE the program searches for the best tau. For more information see section 'Details'.
delta       factor for the penalty parameter. Should be greater than 1. Only needed if find.tau == TRUE.
tau.max     maximal penalty parameter considered. Only needed if find.tau == TRUE.
tau.min     minimal penalty parameter considered. Only needed if find.tau == TRUE.
a0, b0      optional starting values for logistic regression.
pen.method  the method of penalization (see section 'Details' below).
progress    optional parameter for reporting the status of the computations.

Details

Computes nonparametric p-values for the potential class memberships of new observations. Precisely, for each new observation NewX[i,] and each class b, the number PV[i,b] is a p-value for the null hypothesis that the (unknown) class of NewX[i,] equals b.
This p-value is based on a permutation test applied to an estimated Bayesian likelihood ratio, using 'penalized logistic regression'. That is, the conditional probability of Y = y, given X = x, is assumed to be proportional to exp(a_y + b_y^T x). The parameters a_y, b_y are estimated via penalized maximum log-likelihood. The penalty is either a weighted sum, over the components j, of the Euclidean norms of the vectors (b_1[j],b_2[j],…,b_L[j]), where L is the number of classes (pen.method=='vectors'), or a weighted sum of all moduli |b_{θ}[j]| over classes θ and components j (pen.method=='simple'). The weight for component j is tau.o times the sample standard deviation (within groups) of the j-th components of the feature vectors. With pen.method=='none', no penalization is used, but this option may be unstable.
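The model and the two penalty terms described above can be sketched in base R. This is only an illustration, not the package's internal code: the function names `class_probs`, `within_sd` and `penalty`, the d x L layout of the coefficient matrix `B`, and the pooled divisor n - L for the within-group standard deviation are all assumptions made for this sketch.

```r
# Class probabilities under the model P(Y = y | X = x) proportional to
# exp(a_y + b_y^T x). Assumed layout: a has length L, B is d x L with
# B[j, y] = b_y[j].
class_probs <- function(x, a, B) {
  eta <- a + drop(crossprod(B, x))   # a_y + b_y^T x for every class y
  eta <- eta - max(eta)              # stabilise exp() numerically
  exp(eta) / sum(exp(eta))
}

# Pooled within-group standard deviation of each feature (column of X).
# Assumption: squared deviations from the class means, divided by n - L.
within_sd <- function(X, Y) {
  X <- as.matrix(X)
  Y <- factor(Y)
  class_means <- apply(X, 2, function(col) ave(col, Y))
  sqrt(colSums((X - class_means)^2) / (nrow(X) - nlevels(Y)))
}

# Penalty term for a coefficient matrix B (d x L) and penalty parameter tau.
penalty <- function(B, X, Y, tau, method = c("vectors", "simple")) {
  method <- match.arg(method)
  w <- tau * within_sd(X, Y)         # one weight per feature j
  per_feature <- switch(method,
    vectors = sqrt(rowSums(B^2)),    # Euclidean norm of (b_1[j], ..., b_L[j])
    simple  = rowSums(abs(B))        # sum over classes of |b_theta[j]|
  )
  sum(w * per_feature)
}

# Toy usage on the iris data with an arbitrary coefficient matrix.
X <- as.matrix(iris[, 1:4])
Y <- iris$Species
B <- matrix(0.5, nrow = ncol(X), ncol = nlevels(Y))
penalty(B, X, Y, tau = 10, method = "vectors")
penalty(B, X, Y, tau = 10, method = "simple")
```

The 'vectors' penalty couples the coefficients of one feature across all classes, whereas the 'simple' penalty treats every coefficient separately.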
If find.tau == TRUE, the program searches for the best penalty parameter. To determine the best parameter tau for the p-value PV[i,b], the new observation NewX[i,] is added to the training data with class label b. Then, for all training observations with Y[j] != b, the estimated probability that X[j,] belongs to class b is computed, and the tau that minimizes the sum of these estimated probabilities is chosen. The search proceeds stepwise. First, tau.o is compared with tau.o*delta. If tau.o*delta is better, it is compared with tau.o*delta^2, etc. The maximal parameter considered is tau.max. If tau.o is better than tau.o*delta, it is compared with tau.o*delta^-1, etc. The minimal parameter considered is tau.min.
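The stepwise search described above might look roughly as follows. This is a sketch under stated assumptions, not the package's implementation: `search_tau`, `tau_criterion` and `toy_crit` are hypothetical names, `crit` stands for any function that returns the criterion value for a given tau, ties are treated as "not better", and candidates outside [tau.min, tau.max] are simply not tried.

```r
# Criterion for class b: sum of the estimated class-b probabilities over
# all training observations that do not belong to class b.
# probs_b[j] is assumed to be the estimated probability that X[j,] is in b.
tau_criterion <- function(probs_b, Y, b) {
  sum(probs_b[Y != b])
}

# Stepwise search over tau.o * delta^k, minimising crit(tau).
search_tau <- function(crit, tau.o = 10, delta = 2, tau.max = 80, tau.min = 1) {
  stopifnot(delta > 1)
  best <- tau.o
  best_val <- crit(best)

  # Keep multiplying by `step` while the criterion strictly improves
  # and the candidate stays inside the allowed range.
  walk <- function(step, in_range) {
    repeat {
      cand <- best * step
      if (!in_range(cand)) break
      val <- crit(cand)
      if (val >= best_val) break
      best <<- cand
      best_val <<- val
    }
  }

  walk(delta, function(tau) tau <= tau.max)        # try larger values first
  if (best == tau.o) {
    walk(1 / delta, function(tau) tau >= tau.min)  # otherwise try smaller ones
  }
  best
}

# Toy criterion with its minimum at tau = 40 (purely illustrative).
toy_crit <- function(tau) (log(tau) - log(40))^2
search_tau(toy_crit, tau.o = 10, delta = 2, tau.max = 80, tau.min = 1)
```

Because the search only moves while the criterion improves, it evaluates few values of tau, but it may stop at a local rather than a global minimum of the criterion.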

Value

PV is a matrix containing the p-values. Precisely, for each new observation NewX[i,] and each class b, the number PV[i,b] is a p-value for the null hypothesis that the (unknown) class of NewX[i,] equals b.
If find.tau == TRUE, PV has an attribute "tau.opt". This attribute is a matrix in which tau.opt[i,b] is the best tau for observation NewX[i,] and class b (see section 'Details'); tau.opt[i,b] is the value used to compute the p-value PV[i,b].
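A short usage sketch on the iris data, assuming the pvclass package is installed. The choice of held-out rows, the level alpha = 0.05, and the variable names `held_out`, `not_rejected` and `PV_tau` are illustrative assumptions only.

```r
# Training data: all iris rows except one held-out flower per species.
held_out <- c(50, 100, 150)
X    <- as.matrix(iris[-held_out, 1:4])
Y    <- iris[-held_out, 5]
NewX <- as.matrix(iris[held_out, 1:4])

if (requireNamespace("pvclass", quietly = TRUE)) {
  # Fixed penalty parameter: one row per new observation, one column per class.
  PV <- pvclass::pvs.logreg(NewX, X, Y, tau.o = 1, pen.method = "vectors")
  print(PV)

  # Classes whose null hypothesis is not rejected at level alpha.
  alpha <- 0.05
  not_rejected <- PV > alpha
  print(not_rejected)

  # Data-driven penalty parameter: the chosen values are stored in "tau.opt".
  PV_tau <- pvclass::pvs.logreg(NewX, X, Y, tau.o = 10, find.tau = TRUE,
                                delta = 2, tau.max = 80, tau.min = 1)
  print(attr(PV_tau, "tau.opt"))
}
```

With find.tau = TRUE the model is refitted for several values of tau per observation and class, so the call can take noticeably longer than with a fixed tau.o.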

Author(s)

Niki Zumbrunnen niki.zumbrunnen@gmail.com
Lutz Dümbgen lutz.duembgen@stat.unibe.ch
www.imsv.unibe.ch/duembgen/index_ger.html

References

Zumbrunnen N. and Dümbgen L. (2017) pvclass: An R Package for p Values for Classification. Journal of Statistical Software 78(4), 1–19. doi:10.18637/jss.v078.i04

Dümbgen L., Igl B.-W. and Munk A. (2008) P-Values for Classification. Electronic Journal of Statistics 2, 468–493, available at http://dx.doi.org/10.1214/08-EJS245.

Zumbrunnen N. (2014) P-Values for Classification – Computational Aspects and Asymptotics. Ph.D. thesis, University of Bern, available at http://boris.unibe.ch/id/eprint/53585.