groupfs: Select a model with forward stepwise.

This function implements forward selection of linear models almost identically to step with direction = "forward". It is a separate function from fs because groups of variables (e.g. dummy variables encoding the levels of a categorical variable) must be handled differently in the selective inference framework.


groupfs(x, y, index, maxsteps, sigma = NULL, k = 2, intercept = TRUE,
  center = TRUE, normalize = TRUE, aicstop = 0, verbose = FALSE)



Matrix of predictors (n by p).


Vector of outcomes (length n).


Group membership indicator of length p. The values must satisfy sort(unique(index)) == 1:G, where G is the number of distinct groups.


Maximum number of steps for forward stepwise.


Estimate of error standard deviation for use in the AIC criterion. This determines the relative scale between RSS and the degrees of freedom penalty. Default is NULL, corresponding to unknown sigma. When NULL, groupfsInf performs truncated F inference instead of truncated χ. See extractAIC for details on the AIC criterion.


Multiplier of the model size penalty. The default k = 2 gives AIC. Use k = log(n) for BIC, or k = 2 * log(p) for RIC (best for high dimensions, when p > n). If G < p then RIC may be too restrictive, and it would be better to choose k with log(G) < k < 2 * log(p).


Should an intercept be included in the model? Default is TRUE. Does not count as a step.


Should the columns of the design matrix be centered? Default is TRUE.


Should the design matrix be normalized? Default is TRUE.


Early stopping if AIC increases. Default is 0, corresponding to no early stopping. Positive integer values specify the number of consecutive steps for which the AIC is allowed to increase, e.g. with aicstop = 2 the algorithm stops once the AIC criterion has increased for 2 steps in a row. The default behavior of step corresponds to aicstop = 1.


Print out progress along the way? Default is FALSE.
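
The early-stopping rule described for aicstop can be illustrated in base R. This is only a sketch of the rule as stated above, not the package's internal code: stop_step is a hypothetical helper, the AIC values are made up, and the way groupfs treats the comparison at the first step is an assumption.

```r
# Hypothetical helper (not part of the package): given the AIC value after
# each step, return the step at which forward selection would stop.
stop_step <- function(aic, aicstop = 0) {
  if (aicstop <= 0) return(length(aic))      # aicstop = 0: no early stopping
  run <- 0                                   # consecutive AIC increases so far
  for (s in seq_along(aic)[-1]) {
    run <- if (aic[s] > aic[s - 1]) run + 1 else 0
    if (run >= aicstop) return(s)            # stop after aicstop increases in a row
  }
  length(aic)                                # rule never triggered
}

aic_path <- c(100, 90, 92, 91, 93, 95, 94)   # made-up AIC values, one per step

stop_step(aic_path, aicstop = 1)   # stops at the first increase
stop_step(aic_path, aicstop = 2)   # needs two increases in a row
stop_step(aic_path)                # default: runs all steps
```

With aicstop = 1 a single increase ends the search, which matches the stated behavior of step. Larger values tolerate temporary increases before stopping.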


An object of class "groupfs" containing information about the sequence of models in the forward stepwise algorithm. Call the function groupfsInf on this object to compute selective p-values.

See Also

groupfsInf, factorDesign.


x = matrix(rnorm(20*40), nrow=20)
index = sort(rep(1:20, 2))
y = rnorm(20) + 2 * x[,1] - x[,4]
fit = groupfs(x, y, index, maxsteps = 5)
out = groupfsInf(fit)

Step 1/5: computing P-value for group 1 
Step 2/5: computing P-value for group 2 
Step 3/5: computing P-value for group 10 
Step 4/5: computing P-value for group 19 
Step 5/5: computing P-value for group 6 
  Group Pvalue     TF df   Size Ints    Min    Max
1     1  0.228 38.087  2 21.622    1 27.772 49.394
2     2  0.141  7.528  2  3.122    1  5.214  8.336
3    10  0.949  1.089  2  6.268    1  1.026  7.294
4    19  0.093  2.385  2  0.874    1  1.624  2.498
5     6  0.264  1.380  2  0.750    1  0.884  1.635

Ints is the number of intervals in the truncated chi selection region and Size is the sum of their lengths. Min and Max are the lowest and highest endpoints of the truncation region. No confidence intervals are reported by groupfsInf.
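
A variant of the example that validates the group index and uses a BIC penalty with early stopping might look as follows. This is a sketch under assumptions: it assumes groupfs and groupfsInf are provided by the selectiveInference package, the seed and the names fit_bic and out_bic are arbitrary, and no output is shown because it depends on the random draw.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
n <- 20
p <- 40
x <- matrix(rnorm(n * p), nrow = n)
index <- sort(rep(1:20, 2))                  # 20 groups of 2 columns each
y <- rnorm(n) + 2 * x[, 1] - x[, 4]

# Check the group index before fitting: one entry per column, groups labelled 1..G
G <- length(unique(index))
stopifnot(length(index) == p, all(sort(unique(index)) == 1:G))

# Penalty multipliers named in the description of k
k_aic <- 2
k_bic <- log(n)
k_ric <- 2 * log(p)

# Fit only if the package is installed (package name is an assumption)
if (requireNamespace("selectiveInference", quietly = TRUE)) {
  fit_bic <- selectiveInference::groupfs(x, y, index, maxsteps = 5,
                                         k = k_bic, aicstop = 1)
  out_bic <- selectiveInference::groupfsInf(fit_bic)
  print(out_bic)
}
```

Because sigma is left at NULL here, the description of sigma implies that groupfsInf would perform truncated F inference.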

