gma: general model averaging for low-dimensional inputs

gmaR Documentation

general model averaging for low-dimensional inputs

Description

gma provides model averaging for linear regrssion with low dimensional inputs (no more than 20 covariates). The MA methods included are SAIC, SBIC, SFIC, ARM, L1-ARM, MMA and JMA.

Usage

  gma(x,y,factorID=NULL,method='L1-ARM',candi_models=2,n_train=ceiling(n/2),
      no_rep=50)

Arguments

x

Matrix of predictors.

y

Response variable.

factorID

Indication on whether there are categorical variables among the predictors. If factorID= NULL, the predictors are all continuous or have the identifiable categorical variables; If factorID='colnames' or the location numbers of categorical variables, the name or location of variables provided by the user are treated as categorical variables in the linear model. The default factorID is NULL.

method

The method for calculating weights. The method= 'SAIC' is the Smooth-AIC method; the method= 'SBIC' is the Smooth-BIC method; the method= 'SFIC' is the Smooth-FIC method; the method= 'ARM' is the Adaptive Regression by Mixing method; the method= 'L1-ARM' is the L1 Adaptive Regression by Mixing method; the method= 'MMA' is the Mallow's Model Averaging (MMA); the method= 'JMA' is the Jackknife Model Averaging (JMA). The default is 'L1-ARM'.

candi_models

Set to 1 for nested subset models in the order given in predictors; set to 2 for all combinations of subsets; input an m*p matrix, where m is the number of models to be combined, and each row of which is a 0/1 indicator vector representing whether each variable is included/excluded in the model. The default is 2.

n_train

Size of training set when the weight function is L1-ARM or ARM with prior=TRUE. The default value is n_train=ceiling(n/2).

no_rep

Number of replications when the weight function is L1-ARM and ARM. The default value is no_rep=50.

Details

See the paper provided in Reference section.

Value

A 'gma' object is retured. The components are:

weight

The weight for each candidate model.

weight_se

The standard error of the weights of the candidate models over the data-splittings under the method= 'ARM' or method='L1-ARM'.

wbetahat

The weighted estimation of the coefficients.

betahat

The coefficients matrix estimated by candidate models.

candi_models

The candidate models.

Examples

# generate simulation data
n<-50
p<-8
beta<-c(3,1.5,0,0,2,0,0,0)
b0<-1
x<-matrix(rnorm(n*p,0,1),nrow=n,ncol=p)
e<-rnorm(n,0,3)
y<-x%*%beta+b0+e

# compute weight for candidate models using L1-ARM, JMA and SAIC with nested subsets candidate models
lw<-gma(x,y,factorID=NULL,method='L1-ARM',candi_models=1)$weight
jw<-gma(x,y,factorID=NULL,method='JMA',candi_models=1)$weight
saw<-gma(x,y,factorID=NULL,method='SAIC',candi_models=1)$weight

# output the candidate models used for method L1-ARM
candi_models<-gma(x,y,factorID=NULL,method='L1-ARM',candi_models=1)$candi_models

# simulation with categorical variables
n<-100
x1<-rnorm(n)
x2<-rnorm(n)
x3<-rnorm(n)
x4<-factor(sample(1:5,n,replace=T),levels=c(1:5))
X<-data.frame(x1,x2,x3,x4)
Z<-as.matrix(model.matrix(~.-1,data=as.data.frame(X)))[,-4]
mu<-Z%*%c(0.1,0.3,0.5,1,-2,4,-3)
y<-mu+rnorm(n,0,3)

# compute weight for candidate models using MMA with nested subsets candidate models
mmaw <- gma(X, y, factorID = 'x4', method = 'MMA', candi_models = 1)$weight

# early COVID-19 data in China
data(covid19)
y<-covid19[,1]
x<-covid19[,-1]
n<-length(y)

# the weighted estimation using L1-ARM, MMA and SFIC with all subsets candidate models
Cl<-gma(x,y,factorID=NULL,method='L1-ARM',candi_models=2)$wbetahat
Cm<-gma(x,y,factorID=NULL,method='MMA',candi_models=2)$wbetahat
Csf<-gma(x,y,factorID=NULL,method='SFIC',candi_models=2)$wbetahat

zhzhao07/UMA documentation built on Sept. 1, 2022, 2:49 p.m.