gma_h: general model averaging for high-dimensional inputs

View source: R/gma_h.R

gma_hR Documentation

general model averaging for high-dimensional inputs

Description

gma_h provides model averaging for linear regression with high-dimensional inputs. The Model Averaging methods included are SAICp, SBICp, ARM, L1-ARM, PMA, MCV.

Usage

  gma_h(x, y, factorID=NULL,candidate='H4',method='L1-ARM', psi=1,
        n_train=ceiling(n/2),no_rep=20, lambda=log(n), alpha=0.05, prior = TRUE)

Arguments

x

Matrix of predictors.

y

Response variable.

factorID

Indication on whether there are categorical variables among the predictors.

If factorID= NULL, the predictors are all continuous or have the identifiable categorical variables; If factorID=‘colnames’ or the location numbers of categorical variables, the name or location of variables provided by the user are treated as categorical variables in the linear model. The default factorID is NULL.

candidate

Method for preparing candidate models.

The method of candidate selection differs depending on whether it contains categorical variables. If the predictors are all continuous variables, the candidate= ‘H4’, the candidate models are on solution paths of 4 common methods, which are lasso, adaptive lasso, SCAD and MCP; the candidate= ‘H2’, the candidate models are on solution paths of 2 common methods, which are lasso, adaptive lasso; the candidate=‘H1’, the candidate models are on solution paths of the lasso. Otherwise, the candidate= ‘CH3’, the candidate models are on solution paths of 3 group selection methods in categorical regression, which are group lasso, group MCP and group SCAD, treating the categorical variable as the indivual groups. If candidate= ‘H0’, the candidate model should be input by the user. When the method is MCV, candidate are not required.

candi_models

This component is used by the user to input the candidate model matrix. It is a matrix of candidate models, each row of which is a 0/1 indicator vector representing whether each variable is included/excluded in the model. For details see example section.

method

The method for calculating weights.

The method= ‘SAICp’ is the Smooth-AIC method with the penalty term; the method= ‘SBICp’ is the Smooth-BIC method with the penalty term; the method= ‘ARM’ is the Adaptive Regression by Mixing method; the method= ‘L1-ARM’ is the L1 Adaptive Regression by Mixing method; the method= ‘PMA’ is the Parsimonious Model Averaging (PMA); the method= ‘MCV’ is the Cross-validation for Model Averaging (MCV).

n_train

Size of training set when the weight function is ARM or L1-ARM. The default value is n_train=ceiling(n/2).

no_rep

Number of replications when the weight function is ARM or L1-ARM. The default value is no_rep=20.

lambda

It is the tunning parameter in PMA. The default is log(n).

alpha

Threshold value for marginal correlation test in MCV. The default is 0.05.

psi

A positive number to control the influence of the prior weight on the models. The default value is 1.

prior

Whether to use prior in the weighting function when the method= (‘ARM’,‘L1-ARM’,‘SAIC’,‘SBIC’). The default is TRUE.

Details

See the paper provided in Reference section.

Value

A 'gma_h' object is retured. The components are:

weight

The weight for each candidate model.

weight_se

The standard error of the weights of the candidate models over the data-splittings under the method= ‘ARM’ or method=‘L1-ARM’.

wbetahat

The weighted estimation of the coefficients.

betahat

The coefficients matrix estimated by candidate models.

candi_models

The candidate models. NOTE: The weights and candi_models of MCV method, see the article for details. its outputs are mainly for prediction.

Examples

library(mvtnorm) 
n=100
p=200
alpha=1
b <- rep(0,len=p)
for (j in 1:10){
  b[j]=(j^(-alpha-0.5))*sqrt(2*alpha)
}
b=5*b/sum(b)
###################cov setting
Sig= matrix(0,p,p)
rho = 0.5
for(i in 1:p)
{
  for(j in 1:p)
  {
    Sig[i,j]=rho^abs(i-j)
  }
}
#################train data
X=matrix(rmvnorm(n,matrix(0,ncol=1,nrow=p),Sig),nrow=n)
mu0=X%*%b
y=mu0+rt(n,df=3)##t distribution
#the calculating for different method.
###################################################
g1=gma_h(x=X, y,factorID=NULL, candidate='H4',method='SAICp',psi=1) 
g2=gma_h(x=X, y, factorID=NULL,candidate='H4',method='ARM',n_train=n/2, no_rep=50, psi=1) 
g3=gma_h(x=X, y, factorID=NULL,candidate='H4',method='L1-ARM',n_train=n/2, no_rep=50, psi=1)
g4=gma_h(x=X, y, factorID=NULL,candidate='H4',method='PMA',lambda=log(n))
#####weight
weight=cbind(g1$weight,g2$weight,g3$weight,g4$weight)
weight_se=cbind(g2$weight_se,g3$weight_se)
#####coefficients estimation and prediction
wbetahat=cbind(g1$wbetahat,g2$wbetahat,g3$wbetahat,g4$wbetahat)
Xs=cbind(1,X)
pre=cbind(Xs%*%g1$wbetahat,Xs%*%g2$wbetahat,Xs%*%g3$wbetahat,Xs%*%g4$wbetahat)
se=(pre-matrix(mu0,n,1)%*%rep(1,4))^2
colnames(se)=c('SAICp','ARM','L1-ARM','PMA')
boxplot(se)
apply(se,2,mean)

##############categorical case
library(nnet)
library(CatReg)
library(mvtnorm) 
n=100
p=200
sigma0=1
######
b <- rep(0,len=p) #1*p,beta
##decay
for (j in 1:7){
  b[j]=1/j
}
###################cov setting
Sig= matrix(0,p,p)
rho = 0.5
for(i in 1:p)
{
  for(j in 1:p)
  {
    Sig[i,j]=rho^abs(i-j)
  }
}
x0= UniformDesignMatrix(n, 1, 3) ##categorical variable
X_c=class.ind(as.matrix(x0))
X0=matrix(rmvnorm(n,matrix(0,ncol=1,nrow=p),Sig),nrow=n)
mu0=X_c[,-1]%*%c(2,4)+X0%*%b
y=mu0+rnorm(n,0,sigma0) ##normal distribution
X=cbind(x0,X0)### combine dummy covariable

###categorical regression
isarm=gma_h(x=X,y,factorID=1,candidate='CH3',method='ARM',n_train=n/2,
            no_rep=50,psi=1,prior=TRUE)
pma=gma_h(x=X,y,factorID=1,candidate='CH3',method='PMA',lambda=log(n))

weight=cbind(isarm$weight,pma$weight)
wbeta=cbind(isarm$wbetahat,pma$wbetahat)


zhzhao07/UMA documentation built on Sept. 1, 2022, 2:49 p.m.