InitialStep: Initial step of the panning algorithm
In SMAC-Group/panning: An implementation of the Panning Algorithm


InitialStep computes the initial step of the Panning Algorithm.

InitialStep(y, X, d = 1L, alpha = 0.05, B = NULL, seed = 951L,
  m = 10L, K = 10L, family, type = NULL, divergence, W = NULL,
  proc = 1L, C0 = 0.5, increasing = FALSE, trace = TRUE, ...)

`y, X, m, K, family, type, divergence, C0, W, increasing, trace, ...`	(see function `CVmFold`)
`d`	the dimension of the model of interest (intercept is always included).
`alpha`	the level of the quantile of the prediction errors.
`B`	the number of bootstrap replicates.
`seed`	the seed for the random number generator.
`proc`	number of processor(s) for parallelisation.

This function exhaustively computes the m-fold cross-validation (CV) prediction error for all C(p,d) possible models of size d by calling the CVmFold function, where p is the number of columns of X. If B=NULL (default), B is set equal to C(p,d).

If B is a positive integer smaller than the total number of models C(p,d), the function computes the CV prediction errors for B randomly selected models of size d. In this case, the seed can be set for reproducibility.
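
The choice between exhaustive and random exploration described above can be sketched in base R. This is an illustration only, not the package's internal code: the names `p`, `n_models`, `all_models`, `keep` and `var_mat` are hypothetical, and drawing models without replacement is an assumption.

```r
# Hypothetical sizes, chosen for illustration only
p <- 6L   # number of candidate predictors (columns of X)
d <- 2L   # model size
B <- 5L   # number of models to explore (NULL would mean "all of them")

n_models   <- choose(p, d)     # total number of models of size d, C(p,d)
all_models <- t(combn(p, d))   # one row per model, d column indices each

if (is.null(B) || B >= n_models) {
  var_mat <- all_models        # exhaustive exploration
} else {
  set.seed(951L)               # fixed seed for reproducibility
  # Assumption: models are drawn without replacement
  keep    <- sample.int(n_models, B)
  var_mat <- all_models[keep, , drop = FALSE]
}

dim(var_mat)                   # B rows, d columns
```

Each row of `var_mat` plays the role of one model whose CV prediction error would then be computed.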

At this stage, the algorithm does not allow for interaction terms among variables.

This function is computationally expensive: its running time grows in proportion to B.
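
To get a feel for the cost before running an exhaustive search, the number of models C(p,d) can be computed with base R's `choose`. The value `p = 40` below matches the number of predictors in the simulated example on this page; the variable names are hypothetical.

```r
# Number of models of size d = 1, 2, 3, 4 among p = 40 predictors
p <- 40L
n_models <- choose(p, 1:4)
n_models
```

With the default B=NULL, each of these models requires its own CV run, so setting B below C(p,d) is the main way to bound the running time.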

InitialStep returns a list with the following components:

Ids: the set I_d^* of indices of predictors appearing in models with prediction errors cv.error <= q.alpha.
Sds: the set S_d^* of models of size d with prediction errors cv.error <= q.alpha.
cv.error: a (B x 1) vector of CV prediction errors.
q.alpha: the empirical alpha-quantile computed on cv.error.
var.mat: a (B x d) matrix of indices of the explored models.

The indices returned in Ids are the column numbers of X as supplied, not the column names. They are sorted in increasing order, with duplicates removed. Sds may contain duplicates.
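
The relationship between the returned components can be sketched as follows. This is a minimal illustration on made-up numbers, not the package's implementation: the values of `var.mat`, `cv.error` and `alpha` are hypothetical, and the use of `quantile` with its default type is an assumption.

```r
# Hypothetical output of the cross-validation stage (not real results)
var.mat  <- rbind(c(1L, 2L), c(1L, 3L), c(2L, 4L), c(3L, 4L))  # B = 4 models, d = 2
cv.error <- c(0.10, 0.30, 0.20, 0.40)                           # one CV error per model
alpha    <- 0.5

# Empirical alpha-quantile of the CV errors
# (assumption: default quantile type of stats::quantile)
q.alpha <- unname(quantile(cv.error, probs = alpha))

keep <- cv.error <= q.alpha              # models at or below the quantile
Sds  <- var.mat[keep, , drop = FALSE]    # retained models, one per row
Ids  <- sort(unique(as.vector(Sds)))     # retained column indices, sorted, no duplicates
```

Here the first and third models are retained, so `Ids` contains the column indices 1, 2 and 4.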

Samuel Orso Samuel.Orso@unige.ch

Guerrier, S., Mili, N., Molinari, R., Orso, S., Avella-Medina, M. and Ma, Y. (2015) A Paradigmatic Regression Algorithm for Gene Selection Problems. submitted manuscript. http://arxiv.org/abs/1511.07662.

CVmFold, GeneralStep

## Not run: 
#####
# Simulate a logistic regression
n <- 50
set.seed(123)
beta <- c(1, rpois(40, lambda = 0.5))
p <- length(beta)
X <- matrix(rnorm((p-1)*n), nrow=n, ncol=(p-1))
y <- rbinom(n,1,1/(1+exp(-tcrossprod(beta, cbind(1, X)))))
#####

# (can take several seconds to run)
IStep <- InitialStep(y = y, X = X, family = binomial(link = "logit"), type = "response",
                     divergence = "classification", trace = FALSE)

# Run the parallelised version (2 cores, as set by proc = 2)
IStep <- InitialStep(y = y, X = X, family = binomial(link = "logit"), type = "response",
                     divergence = "classification", proc = 2, trace = FALSE)

## End(Not run)