ModelSelection.Phase: Construct sets of well-fitting models as proposed by Cox, D....

View source: R/ModelSelection.Phase.R

ModelSelection.Phase R Documentation

Construct sets of well-fitting models as proposed by Cox, D. R. & Battey, H. S. (2017)

Description

This function tests low-dimensional subsets of the variables retained at the reduction phase, together with any squared or interaction terms suggested at the exploratory phase. Lists of well-fitting models of each dimension are returned.
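To make the procedure concrete, here is a minimal base-R sketch under stated assumptions: each candidate model of size k is compared with the comprehensive model (all retained variables) by a likelihood ratio test, and it is kept when the test does not reject at level `signif`, in the spirit of Cox and Battey (2017). The function `well_fitting_sketch` and its argument `max_size` are hypothetical, squared and interaction terms are omitted for brevity, and this is not the package's own implementation.

```r
# Hypothetical sketch (not HCmodelSets internals): for each size k, fit every
# k-subset of the retained columns and keep it if the likelihood ratio test
# against the comprehensive model is not rejected at level `signif`.
well_fitting_sketch <- function(X, Y, retained, family = gaussian,
                                signif = 0.01, max_size = 2) {
  dat <- data.frame(Y = Y, X[, retained, drop = FALSE])
  vars <- names(dat)[-1]
  # Comprehensive model: all retained variables together
  full <- glm(reformulate(vars, response = "Y"), family = family, data = dat)
  out <- list()
  for (k in seq_len(min(max_size, length(vars) - 1))) {
    subsets <- combn(vars, k, simplify = FALSE)
    keep <- Filter(function(s) {
      sub <- glm(reformulate(s, response = "Y"), family = family, data = dat)
      # anova() on nested glm fits; row 2 holds the comparison with `full`
      p <- anova(sub, full, test = "LRT")[2, "Pr(>Chi)"]
      p > signif
    }, subsets)
    out[[paste("Model size", k)]] <- vapply(keep, paste, "", collapse = " + ")
  }
  out
}

# Simulated data in which only x1 and x3 affect the response
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
Y <- 1 + 2 * X[, "x1"] - 1.5 * X[, "x3"] + rnorm(n)
res <- well_fitting_sketch(X, Y, retained = 1:4, signif = 0.01, max_size = 2)
res
```

Because the response depends on both x1 and x3, every single-variable model should be rejected, and at size 2 only x1 + x3 can plausibly survive.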

Usage

ModelSelection.Phase(X,Y, list.reduction, family=gaussian,
                      signif=0.01, sq.terms=NULL, in.terms=NULL,
                      modelSize=NULL, Cox.Hazard = FALSE)

Arguments

X

Design matrix.

Y

Response vector.

list.reduction

Indices of the variables that were chosen at the reduction phase.

family

A description of the error distribution and link function to be used in the model. As for glm, this can be a character string naming a family function, a family function, or the result of a call to a family function. See family for details.

signif

Significance level of the likelihood ratio test of each candidate model against the comprehensive model. The default is 0.01.

sq.terms

Indices of squared terms suggested at the exploratory phase (see Exploratory.Phase).

in.terms

Indices of pairs of variables whose interaction terms were suggested at the exploratory phase (see Exploratory.Phase).
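Squared and interaction terms enter the candidate models as extra columns. The following hedged sketch appends such columns to a design matrix and labels them in the display format described under Value. The helper `augment_design` is hypothetical, and its input formats (an index vector for squares, a two-column index matrix with one row per pair) are assumptions rather than the exact structure returned by Exploratory.Phase.

```r
# Hypothetical helper: append squared and pairwise-product columns to X.
# Assumed inputs: sq_idx is an integer vector of column indices, and
# in_pairs is a two-column matrix of column indices, one row per pair.
augment_design <- function(X, sq_idx, in_pairs) {
  nm <- colnames(X)
  if (is.null(nm)) nm <- paste0("x_", seq_len(ncol(X)))
  colnames(X) <- nm
  # Squared terms, labelled like "x_2 ^2"
  sq <- X[, sq_idx, drop = FALSE]^2
  colnames(sq) <- sprintf("%s ^2", nm[sq_idx])
  # Elementwise products of the paired columns, labelled like "x_1 * x_3"
  inter <- X[, in_pairs[, 1], drop = FALSE] * X[, in_pairs[, 2], drop = FALSE]
  colnames(inter) <- sprintf("%s * %s", nm[in_pairs[, 1]], nm[in_pairs[, 2]])
  cbind(X, sq, inter)
}

# Usage: square column 2 and multiply columns 1 and 3
X <- matrix(1:15, nrow = 5)
A <- augment_design(X, sq_idx = 2, in_pairs = matrix(c(1, 3), ncol = 2))
colnames(A)
```

`sprintf` is used for the labels because it returns a zero-length result when no terms are supplied, whereas `paste` would recycle empty inputs into a spurious label.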

modelSize

Maximum size of the models to be tested. Currently the maximum is 7. If not provided, a default is used.
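If every subset of each size is fitted, the amount of work grows combinatorially with modelSize. The arithmetic below assumes exhaustive enumeration over p = 20 candidate terms (an illustrative number, not a package default) and counts the fits needed for sizes 1 to 7.

```r
# Number of candidate models of each size k when p terms are available;
# assumes every k-subset is fitted exactly once.
p <- 20
sizes <- 1:7
n_models <- choose(p, sizes)
names(n_models) <- paste("size", sizes)
n_models
sum(n_models)  # total number of fits for modelSize = 7: 137979
```

Going from size 1 (20 fits) to size 7 (77520 fits at that size alone) illustrates why a cap on the model size is practical.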

Cox.Hazard

If TRUE, a proportional hazards regression model is fitted, and the family argument is ignored.
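With Cox.Hazard = TRUE the comparisons are between proportional hazards fits. A minimal sketch of a likelihood ratio test of a candidate Cox model against a comprehensive one, using `coxph` from the `survival` package and the maximised partial log-likelihoods it stores in `loglik`; the simulated data are illustrative, and this is not necessarily how the package computes the test.

```r
library(survival)

# Illustrative data: only x1 affects the hazard, no censoring
set.seed(2)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$time <- rexp(n, rate = exp(0.8 * dat$x1))
dat$status <- 1

# Comprehensive model and one candidate model
full <- coxph(Surv(time, status) ~ x1 + x2 + x3, data = dat)
sub  <- coxph(Surv(time, status) ~ x1, data = dat)

# loglik[2] is the partial log-likelihood at the fitted coefficients
lr_stat <- 2 * (full$loglik[2] - sub$loglik[2])
df <- length(coef(full)) - length(coef(sub))
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
keep <- p_value > 0.01  # candidate retained at signif = 0.01
```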

Value

goodModels

List of the models in the confidence set, one element for each model size from 1 to modelSize. An interaction term between, say, variables x_1 and x_2 is displayed as “x_1 * x_2”, and a squared term in, say, variable x_1 as “x_1 ^2”. If an interaction term appears without the corresponding main effects, those main effects should be added.
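In R's formula notation, `a * b` already expands to both main effects and their product, whereas `a:b` is the product alone, and a square must be written as `I(a^2)` because `a^2` inside a formula means crossing `a` with itself. A hedged sketch that turns displayed terms into formula term labels, so that the main effects of any interaction are included; the helper `expand_displayed_terms` is hypothetical and assumes the display format described above.

```r
# Hypothetical helper: convert displayed terms ("x_1 * x_2", "x_3 ^2")
# into formula term labels; `*` pulls in the main effects automatically.
expand_displayed_terms <- function(displayed) {
  # "x_3 ^2" -> "I(x_3^2)" so the square is not read as self-crossing
  rhs <- gsub("([A-Za-z0-9_.]+) \\^2", "I(\\1^2)", displayed)
  f <- reformulate(rhs, response = "y")
  attr(terms(f), "term.labels")
}

out <- expand_displayed_terms(c("x_1 * x_2", "x_3 ^2"))
out
```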

Acknowledgement

The work was supported by the UK Engineering and Physical Sciences Research Council under grant number EP/P002757/1.

Author(s)

Hoeltgebaum, H. H.

References

Cox, D. R. and Battey, H. S. (2017). Large numbers of explanatory variables, a semi-descriptive analysis. Proceedings of the National Academy of Sciences, 114(32), 8592-8595.

Battey, H. S. and Cox, D. R. (2018). Large numbers of explanatory variables: a probabilistic assessment. Proceedings of the Royal Society A, 474(2215), 20170631.

Hoeltgebaum, H. and Battey, H. S. (2019). HCmodelSets: An R Package for Specifying Sets of Well-fitting Models in High Dimensions. The R Journal, 11(2), 370-379.

See Also

Reduction.Phase, Exploratory.Phase

Examples


library(HCmodelSets)

## Generate a random data-generating process (DGP)
dgp = DGP(s=5, a=3, sigStrength=1, rho=0.9, n=100, intercept=5, noise=1,
          var=1, d=1000, DGP.seed = 2018)

# Reduction Phase using only the first 70 observations
outcome.Reduction.Phase =  Reduction.Phase(X=dgp$X[1:70,],Y=dgp$Y[1:70],
                                           family=gaussian, seed.HC = 1012)

# Exploratory Phase using only the first 70 observations, choosing the variables which
# were selected at least two times in the third dimension reduction

idxs = outcome.Reduction.Phase$List.Selection$`Hypercube with dim 2`$numSelected1
outcome.Exploratory.Phase =  Exploratory.Phase(X=dgp$X[1:70,],Y=dgp$Y[1:70],
                                               list.reduction = idxs,
                                               family=gaussian, signif=0.01)

# Model Selection Phase using only the remaining observations (71 to 100)
sq.terms = outcome.Exploratory.Phase$mat.select.SQ
in.terms = outcome.Exploratory.Phase$mat.select.INTER

MS = ModelSelection.Phase(X=dgp$X[71:100,],Y=dgp$Y[71:100], list.reduction = idxs,
                          sq.terms = sq.terms,in.terms = in.terms, signif=0.01)




HCmodelSets documentation built on March 31, 2023, 7:02 p.m.