This gene-environment analysis approach includes three steps to accommodate both missingness
in environmental (E) measurements and long-tailed or contaminated outcomes. In the first step,
a multiple imputation approach based on the sparse boosting method is developed to accommodate
missingness in E measurements, where NA represents the E measurements that are missing. Here a
semiparametric model is assumed to accommodate nonlinear effects: continuous E factors are
modeled nonlinearly and discrete E factors linearly. The nonlinear functions are estimated
with a B-spline expansion. In the second step, for each imputed data set, the RobSBoosting
approach is used to identify important main E and genetic (G) effects and G-E interactions,
where the Huber loss function and the Qn estimator are adopted to accommodate long-tailed
distributions and data contamination (see RobSBoosting). In the third step, the identification
results from Step 2 are combined using the stability selection technique.
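The robust loss in Step 2 and the combination rule in Step 3 can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper names `huber_loss` and `combine_selections`, the constant `delta`, and the ">=" comparison against `tau` are all assumptions made for illustration.

```r
# Huber loss: quadratic for small residuals, linear for large ones.
# `delta` is a hypothetical tuning constant chosen for illustration.
huber_loss <- function(r, delta = 1.345) {
  ifelse(abs(r) <= delta,
         0.5 * r^2,
         delta * (abs(r) - 0.5 * delta))
}

# Combine 0/1 selection indicators across imputations (rows) for each
# candidate effect (columns): keep effects whose selection frequency
# reaches the threshold `tau`. The ">=" rule is an assumption.
combine_selections <- function(sel, tau) {
  freq <- colMeans(sel)
  list(frequency = freq, selected = freq >= tau)
}

# Toy residuals: the outlier (6) is penalised linearly, not quadratically.
res <- c(-0.5, 0.2, 6)
loss <- huber_loss(res, delta = 1)

# Toy selection results from three imputed data sets and four effects.
sel <- rbind(c(1, 0, 1, 0),
             c(1, 0, 0, 0),
             c(1, 1, 0, 0))
combined <- combine_selections(sel, tau = 0.5)

print(loss)
print(combined$selected)
```

In this toy example only the first effect is selected in at least half of the imputed data sets, so it is the only one retained at `tau = 0.5`.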
G 
Input matrix of 
E 
Input matrix of 
Y 
Response variable. A quantitative vector for 
im_time 
Number of imputations for accommodating missingness in E variables. 
loop_time 
Number of iterations of the sparse boosting. 
num.knots 
Numbers of knots for the B-spline basis. 
Boundary.knots 
The boundary knots for the B-spline basis. 
degree 
Degree for the B-spline basis. 
v 
The step size used in the sparse boosting process. Default is 0.1. 
tau 
Threshold used in the stability selection at the third step. 
family 
Response type of 
knots 
List of knots for the B-spline basis. Default is NULL, in which case knots can be generated
with the given 
E_type 
A vector indicating the type of each E factor, with "ED" denoting a discrete E factor and "EC" a continuous E factor. 
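To make the spline arguments concrete, the sketch below builds a B-spline basis for one continuous E factor with `splines::bs`, which ships with R. How the package itself derives knots from `num.knots` is not shown on this page, so the equally spaced interior knots here are an assumption, and `e_cont` is toy data.

```r
library(splines)  # part of the standard R distribution

# Hypothetical settings mirroring the argument names above.
num.knots      <- 2
degree         <- 2
Boundary.knots <- c(0, 1)

# Assumption: interior knots equally spaced strictly inside the boundary.
interior <- seq(Boundary.knots[1], Boundary.knots[2],
                length.out = num.knots + 2)[-c(1, num.knots + 2)]

set.seed(1)
e_cont <- runif(50)  # a toy continuous E factor on [0, 1]

# Basis without an intercept column: one column per interior knot
# plus one per polynomial degree.
B <- bs(e_cont, knots = interior, degree = degree,
        Boundary.knots = Boundary.knots)
dim(B)  # 50 rows, num.knots + degree columns
```

A discrete E factor (type "ED") would enter the model as a single linear column instead of such a basis.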
An object with S3 class "Miss.boosting" is returned, which is a list with the following components:
call 
The call that produced this object. 
alpha0 
A vector with each element indicating whether the corresponding E factor is selected. 
beta0 
A vector with each element indicating whether the corresponding G factor or GE
interaction is selected. The first element is the first G effect and the second to
( 
intercept 
The intercept estimate. 
unique_variable 
A matrix with two columns that represents the variables that are
selected for the model after removing the duplicates, since the 
unique_coef 
Coefficients corresponding to 
unique_knots 
A list of knots corresponding to 
unique_Boundary.knots 
A list of boundary knots corresponding to

unique_vtype 
A vector representing the variable type of 
degree 
Degree for the B-spline basis. 
NorM 
The values of the B-spline basis. 
E_type 
The type of E effects. 
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
data(Rob_data)
G=Rob_data[,1:20];E=Rob_data[,21:24]
Y=Rob_data[,25];Y_s=Rob_data[,26:27]
knots=list();Boundary.knots=matrix(0,(20+4),2)
for (i in 1:4){
knots[[i]]=c(0,1)
Boundary.knots[i,]=c(0,1)
}
E2=E1=E
##continuous
E1[7,1]=NA
fit1<-Miss.boosting(G,E1,Y,im_time=1,loop_time=100,num.knots=c(2),Boundary.knots,
degree=c(2),v=0.1,tau=0.3,family="continuous",knots=knots,E_type=c("EC","EC","ED","ED"))
y1_hat=predict(fit1,matrix(E1[1,],nrow=1),matrix(G[1,],nrow=1))
plot(fit1)
##survival
E2[4,1]=NA
fit2<-Miss.boosting(G,E2,Y_s,im_time=2,loop_time=200,num.knots=c(2),Boundary.knots,
degree=c(2),v=0.1,tau=0.3,family="survival",knots,E_type=c("EC","EC","ED","ED"))
y2_hat=predict(fit2,matrix(E1[1,],nrow=1),matrix(G[1,],nrow=1))
plot(fit2)
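Before fitting, it can help to confirm how many E measurements are missing and that `E_type` matches the columns of E. The following base R sketch uses a toy matrix `E_toy` with hypothetical values rather than `Rob_data`, so it runs without the package; the checks themselves are a suggestion, not part of `Miss.boosting`.

```r
# Toy E matrix (hypothetical values): two continuous, two discrete factors.
set.seed(2)
E_toy <- cbind(runif(10), runif(10),
               rbinom(10, 1, 0.5), rbinom(10, 1, 0.5))
E_type <- c("EC", "EC", "ED", "ED")

# Mark two measurements as missing with NA, as in the examples above.
E_toy[7, 1] <- NA
E_toy[4, 2] <- NA

# Basic sanity checks: one type label per column, only known labels.
stopifnot(length(E_type) == ncol(E_toy),
          all(E_type %in% c("EC", "ED")))

# Number of missing measurements in each E factor.
n_missing <- colSums(is.na(E_toy))
n_missing
```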
