cv.PO.EN: Cross-validation function of PO-EN model
In PO.EN: An Elastic-Net Regularized Presence-Only Model


View source: R/cross-validation.R

Performs k-fold cross-validation for PO-EN and returns the pair of lambda and prevalence-parameter values that gives the best fit.


cv.PO.EN(X, Y, alpha = 0.5, o.iter = 5, i.iter = 20,
         epsilon = 1e-4, nfolds = 10, type.measure = 'deviance',
         depth = 100, input.pi = 0.5, a = sqrt(0.5), seed = 1)

`X`	Input design matrix. Should not include the intercept vector.
`Y`	Response variable. Should be a binary vector.
`alpha`	The elastic net mixing parameter, with 0 ≤ `alpha` ≤ 1.
`o.iter`	Number of outer loop iterations.
`i.iter`	Number of inner loop iterations.
`epsilon`	The threshold for stopping the coordinate descent algorithm.
`nfolds`	The number of folds for applying cross validation. The default setting is 10. The number of presence observations must be a multiple of `nfolds`.
`type.measure`	The loss function to use for tuning lambda. The default is `type.measure='deviance'`. Other choices include AUROC (`type.measure='auc'`) and F measure (`type.measure='F.measure'`).
`depth`	The ratio between the largest lambda and the smallest lambda of the candidate sequence of lambda.
`input.pi`	The user-supplied prevalence sequence.
`a`	The parameter of the F measure used for tuning the true prevalence. The default value is `sqrt(0.5)`.
`seed`	A single value used to seed the random number generator.
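
To make the meaning of `depth` concrete, here is a minimal sketch of a candidate lambda sequence whose largest-to-smallest ratio equals `depth`. The helper `make_lambda_grid`, the log spacing, the starting value `lambda_max`, and the grid length `n_lambda` are all assumptions made for illustration; the package may construct its sequence differently.

```r
# Hypothetical helper, not part of PO.EN: build a decreasing, log-spaced
# lambda grid whose ratio max(lambda) / min(lambda) equals `depth`.
make_lambda_grid <- function(lambda_max, depth = 100, n_lambda = 10) {
  exp(seq(log(lambda_max), log(lambda_max / depth), length.out = n_lambda))
}

# Illustrative values only.
grid <- make_lambda_grid(lambda_max = 2, depth = 100, n_lambda = 5)
print(grid)
print(max(grid) / min(grid))  # ratio of largest to smallest lambda
```

A larger `depth` extends the grid toward smaller penalties, so less heavily regularized fits enter the comparison.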

The function runs `nfolds`-fold cross-validation to select an optimal pair of lambda and the prevalence parameter. The default is 10-fold cross-validation. The candidate sequence of lambda is generated automatically by the function based on a warm start. The candidate prevalence values in `input.pi` should be supplied by the user.
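
The fold assignment behind a procedure like this can be sketched as follows. This is an illustration only: the helper `assign_folds` is hypothetical, and splitting presence and background observations separately is an assumption, suggested by the requirement that the number of presence observations be a multiple of `nfolds`, not a description of the package internals.

```r
# Hypothetical helper, not part of PO.EN: give every observation a fold
# label in 1..nfolds, balancing each class (Y == 1 and Y == 0) across folds.
assign_folds <- function(Y, nfolds = 10, seed = 1) {
  set.seed(seed)  # reproducible assignment, mirroring the `seed` argument
  folds <- integer(length(Y))
  for (cls in c(0, 1)) {
    idx <- which(Y == cls)
    # Recycle 1..nfolds over the class, then shuffle the labels.
    folds[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
  }
  folds
}

# Toy response: 20 presences and 35 background points.
Y <- c(rep(1, 20), rep(0, 35))
folds <- assign_folds(Y, nfolds = 5)
print(table(folds, Y))
```

Each fold is then held out once while the model is fitted on the remaining folds for every candidate pair of lambda and prevalence.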


`lambda.min`	value of lambda that returns the minimum (or maximum, depending on `type.measure`) of mean cross-validated error.

`lambda.1se`	largest value of lambda such that error is within 1 standard error of the minimum.

`pi`	value of the prevalence parameter that returns the maximum F measure.
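
The difference between `lambda.min` and `lambda.1se` can be illustrated with a small worked example. The numbers below are made up, and the selection rule is the usual one-standard-error convention for a measure that is minimized (such as deviance); that `cv.PO.EN` computes the values exactly this way is an assumption.

```r
# Made-up cross-validation results, for illustration only.
lambda  <- c(1, 0.5, 0.1, 0.05, 0.01)
cv_mean <- c(1.30, 1.03, 1.00, 1.02, 1.10)  # mean CV deviance per lambda
cv_se   <- c(0.05, 0.05, 0.04, 0.04, 0.06)  # standard error per lambda

# lambda.min: the lambda with the smallest mean CV error.
i_min <- which.min(cv_mean)
lambda_min <- lambda[i_min]

# lambda.1se: the largest lambda whose mean error is within one
# standard error of that minimum.
threshold <- cv_mean[i_min] + cv_se[i_min]
lambda_1se <- max(lambda[cv_mean <= threshold])

print(c(lambda.min = lambda_min, lambda.1se = lambda_1se))
```

Choosing `lambda.1se` trades a small increase in cross-validated error for a more strongly penalized, and therefore sparser, model.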

library(PO.EN)

# Example data, containing a training and a testing dataset
data(example.data)
train_data <- example.data$train.data

# Response vector and design matrix of the training data
y_train <- train_data$response
x_train <- train_data[, -1]

# Cross-validate over four candidate prevalence values
PO.EN.cv <- cv.PO.EN(x_train, y_train, input.pi = seq(0.01, 0.4, length.out = 4))

# Refit with the selected lambda and prevalence
PO.EN.beta <- PO.EN(x_train, y_train, lambda = PO.EN.cv$lambda.min,
                    true.prob = PO.EN.cv$pi, beta_start = rep(0, ncol(x_train) + 1))