uma.predict: Prediction for Universal Adaptive Regression by Mixing
In zhzhao07/UMA: Universal Model Averaging

View source: R/uma.predict.R

uma.predict

R Documentation

Prediction for Universal Adaptive Regression by Mixing

Description

The predictions based on different MA methods, including SAIC (SAICp), SBIC (SBICp), SFIC, ARM, L1-ARM, UARM, L1-UARM, MMA, JMA, PMA, BMA and MCV.

Usage

uma.predict(x,y,factorID=NULL,newdata,candi_models,weight,method,dim)

Arguments

`x`	Matrix of predictors.
`y`	Response variable.
`factorID`	Indication on whether there are categorical variables among the predictors.

If factorID= NULL, the predictors are all continuous or have the identifiable categorical variables; If factorID='colnames' or the location numbers of categorical variables, the name or location of variables provided by the user are treated as categorical variables in the linear model. The default factorID is NULL.

`candi_models`	The candidate models under specific method, you can be calculated by gma, gma_h, uarm, and uarm_h functions, as shown in the examples.
`newdata`	An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.
`weight`	The weights of candidate models under specific methods, you can be calculated by gma, gma_h, uarm, and uarm_h functions, as shown in the examples
`method`	The method= `'UARM'` is the Universal Adaptive Regression by Mixing method; the method= `'L1-UARM'` is the L1 Universal Adaptive Regression by Mixing method; the method= `'SAIC'` is the Smooth-AIC method; the method= `'SBIC'` is the Smooth-BIC method; the method= `'SAICp'` is the Smooth-AIC method with the penalty term; the method= `'SBICp'` is the Smooth-BIC method with the penalty term; the method= `'SFIC'` is the Smooth-FIC method; the method= `'ARM'` is the Adaptive Regression by Mixing method; the method= `'L1-ARM'` is the L1 Adaptive Regression by Mixing method; the method= `'MMA'` is the Mallows Model Averaging (MMA); the method= `'JMA'` is the Jackknife Model Averaging (JMA); the method= `'UARM.rf'` is the Universal Adaptive Regression by Mixing method using the random forest to estimate the standard deviation of random error for candidate models; the method= `'L1-UARM.rf'` is the L1 Universal Adaptive Regression by Mixing method using the random forest to estimate the standard deviation of random error for candidate models; the method= `'PMA'` is the Parsimonious Model Averaging; the method= `'MCV'` is the Cross-validation for Model Averaging (MCV).
`dim`	High-dimensional or low-dimensional methods are used for prediction. If dim =`'H'`, high-dimensional methods are used; otherwise, low-dimensional methods are used.

Details

See the paper provided in Reference section.

Value

A 'uma.predict' object is retured. The components is:

pre_out

The prediction by given method.

Examples

### low dimension

# generate simulation data
n<-50
p<-8
beta<-c(3,1.5,0,0,2,0,0,0)
b0<-1
x<-matrix(rnorm(n*p,0,1),nrow=n,ncol=p)
e<-rnorm(n,0,3)
y<-x%*%beta+b0+e

# user supplied candidate models
candi_models<-rbind(c(0,0,0,0,0,0,0,1),
                    c(0,1,0,0,0,0,0,1),
                    c(0,1,1,1,0,0,0,1),
                    c(0,1,1,0,0,0,0,1),
                    c(1,1,0,1,1,0,0,0),
                    c(1,1,0,0,1,0,0,0))

# compute weight for candidate models using L1-UARM
weightL<-uarm(x,y,factorID=NULL,candi_models=candi_models,n_train=ceiling(n/2),
         no_rep=50,psi=1,method='L1-UARM',prior=TRUE,p0=0.5)$weight

# compute the prediction by method L1-UARM
luma.predict<-uma.predict(x,y,factorID=NULL,newdata=x,candi_models=candi_models,
              weight=weightL,method='L1-UARM',dim='L')$pre_out

# early COVID-19 data in China
data(covid19)
y<-covid19[,1]
x<-covid19[,-1]
n<-length(y)

# compute the predicts for L1-UARM, MMA and SFIC
# user supplied all subsets candidate models
Cl1uarmw<-uarm(x,y,factorID=NULL,candi_models=2,n_train=ceiling(n/2),no_rep=50,
          psi=1,method='L1-UARM',prior=TRUE,p0=0.5)$weight
Cmmaw<-gma(x,y,factorID=NULL,method='MMA',candi_models=2)$weight
Csficw<-gma(x,y,factorID=NULL,method='SFIC',candi_models=2)$weight

# compute the prediction by methods L1-UARM, MMA, SFIC and BMA
cl1uarm.predict<-uma.predict(x,y,factorID=NULL,newdata=x,candi_models=2,
                             weight=Cl1uarmw,method='L1-UARM',dim='L')$pre_out
cmma.predict<-uma.predict(x,y,factorID=NULL,newdata=x,candi_models=2,
                          weight=Cmmaw,method='MMA',dim='L')$pre_out
csfic.predict<-uma.predict(x,y,factorID=NULL,newdata=x,candi_models=2,
                           weight=Cmmaw,method='SFIC',dim='L')$pre_out

#The BMA prediction does not depend on candidate models
cbma.predict<-uma.predict(x,y,factorID=NULL,newdata=x,candi_models=2,
                          method='BMA',dim='L')$pre_out



###high dimension
library(mvtnorm) 
n1=100;n2=1000
p=200
sigma0=1
######
b=rep(0,len=p) #1*p,beta
for(j in 1:12){
  b[j]=2/j
}
# cov setting
Sig = matrix(0,p,p)
rho = 0.5
for(i in 1:p)
{
  for(j in 1:p)
  {
    Sig[i,j] = rho^abs(i-j)
  }
}
# new data
X=matrix(rmvnorm(n1,matrix(0,ncol=1,nrow=p),Sig),nrow=n1)
X_test=matrix(rmvnorm(n2,matrix(0,ncol=1,nrow=p),Sig),nrow=n2)
mu0=X%*%b
mu_test=X_test%*%b
y=mu0+rnorm(n1,0,sigma0)##normal distribution
##########the prediction on each methods
g1=gma_h(x=X,y,factorID=NULL,candidate='H4',method='SBICp',psi=1, prior=TRUE)
pre_out1=uma.predict(x=X,y,factorID=NULL,newdata=X_test,method='SBICp',weight=g1$weight,
                    candi_models=g1$candi_models,dim='H')$pre_out

g2=gma_h(x=X,y,factorID=NULL,candidate='H4',method='PMA',lambda=log(n1))
pre_out2=uma.predict(x=X,y,factorID=NULL,newdata=X_test,method='PMA',weight=g2$weight,
                     candi_models=g2$candi_models,dim='H')$pre_out

g3=gma_h(x=X,y,factorID=NULL,method='MCV',alpha = 0.05)
pre_out3=uma.predict(x=X,y,factorID=NULL,newdata=X_test,method='MCV',weight=g3$weight,
                     candi_models=g3$candi_models,dim='H')$pre_out

g4=gma_h(x=X, y, factorID=NULL,candidate='H4',method='ARM',n_train=n1/2, no_rep=50, psi=1) 
pre_out4=uma.predict(x=X,y,factorID=NULL,newdata=X_test,method='ARM',weight=g4$weight,
                     candi_models=g4$candi_models,dim='H')$pre_out

g5=gma_h(x=X, y, factorID=NULL,candidate='H4',method='L1-ARM',n_train=n1/2, no_rep=50, psi=1)
pre_out5=uma.predict(x=X,y,factorID=NULL,newdata=X_test,method='L1-ARM',weight=g5$weight,
                     candi_models=g5$candi_models,dim='H')$pre_out

g6=uarm_h(x=X,y,factorID=NULL,candidate='H4',n_train=n1/2,method='UARM',
            no_rep=50,p0=0.5,psi=1, prior = TRUE)
pre_out6=uma.predict(x=X,y,factorID=NULL,newdata=X_test,method='UARM',weight=g6$weight,
                candi_models=g6$candi_models,dim='H')$pre_out

g7=uarm_h(x=X,y,factorID=NULL,candidate='H4',n_train=n1/2,method='L1-UARM',
          no_rep=50,p0=0.5,psi=1, prior = TRUE) 
pre_out7=uma.predict(x=X,y,factorID=NULL,newdata=X_test,method='L1-UARM',weight=g7$weight,
                candi_models=g7$candi_models,dim='H')$pre_out

####the performance of different methods
pre_out=cbind(pre_out1,pre_out2,pre_out3,pre_out4,pre_out5,pre_out6,pre_out7)
se=(pre_out-matrix(mu_test,n2,1)%*%rep(1,7))^2
colnames(se)=c('SBICp','PMA','MCV','ARM','L1-ARM','UARM','L1-UARM')
Pre=apply(se,2,mean)
Pre_se=apply(se,2,se)

zhzhao07/UMA documentation built on Sept. 1, 2022, 2:49 p.m.