View source: R/Augmented.data.R
Augmented.data (R Documentation)
Description
We consider the scenario in which environmental (E) measurements contain missing values. Our approach
consists of two steps. We first develop a nonparametric kernel-based data augmentation
approach to accommodate the missingness. Then, we adopt the penalization approach BLMCP
for regularized estimation and selection of important interactions and main genetic (G) effects,
respecting the "main effects-interactions" hierarchical structure. Augmented.data carries out the
first step, and its output is passed to BLMCP for the second.
Because E variables are usually preselected and low-dimensional, no selection is conducted on the E
variables. With a well-designed weighting scheme, a useful byproduct is that the proposed
approach enjoys a certain robustness property.
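To make the first step more concrete, here is a minimal sketch of kernel-based data augmentation for a single E factor with missing values. It is not the internals of Augmented.data(): the helper augment_missing() is hypothetical, and we assume that similarity between subjects is measured with a Gaussian kernel on one fully observed covariate z with bandwidth h. Each incomplete subject is replaced by one copy per observed value of the missing factor, with kernel weights that sum to 1; complete subjects are kept once with weight 1.

```r
# Hypothetical sketch of kernel-based data augmentation (not the package code).
# e: one E factor with NAs; z: a fully observed covariate (assumption);
# h: Gaussian kernel bandwidth.
augment_missing <- function(e, z, h) {
  obs <- which(!is.na(e))
  rows <- vector("list", length(e))
  for (i in seq_along(e)) {
    if (!is.na(e[i])) {
      # Complete observation: kept once with weight 1
      rows[[i]] <- data.frame(id = i, e = e[i], w = 1)
    } else {
      # Incomplete observation: one copy per observed value of e,
      # weighted by kernel similarity in z and normalised to sum to 1
      k <- dnorm((z[obs] - z[i]) / h)
      rows[[i]] <- data.frame(id = i, e = e[obs], w = k / sum(k))
    }
  }
  do.call(rbind, rows)
}

# Usage on simulated data with two missing entries
set.seed(1)
z <- rnorm(20)
e <- z + rnorm(20, sd = 0.3)
e[c(3, 7)] <- NA
aug <- augment_missing(e, z, h = 0.5)
# Total weight per original subject (each should be 1)
tapply(aug$w, aug$id, sum)
```

Replicating incomplete subjects, instead of plugging in a single imputed value, lets a weighted estimator see the whole kernel-estimated conditional distribution of the missing factor through the weights.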
Usage
Augmented.data(G, E, Y, h, family = c("continuous", "survival"), E_type)
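Arguments
G: Input matrix of genetic (G) measurements, with one row per observation.
E: Input matrix of environmental (E) measurements, with one row per observation; missing values are coded as NA.
Y: Response variable. A quantitative vector for family = "continuous".
h: The bandwidths of the kernel functions, a vector of length two whose first and second elements correspond to the discrete and continuous E factors, respectively.
family: Response type of Y, either "continuous" or "survival".
E_type: A vector indicating the type of each E factor, with "ED" representing a discrete E factor and "EC" representing a continuous E factor.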
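The arguments h and E_type work together: discrete ("ED") and continuous ("EC") factors get their own kernels and bandwidths. The sketch below shows one common way to combine them into a product kernel between two rows of E. The helper product_kernel() is hypothetical, and the kernel forms are assumptions rather than the package's documented choice: an unordered-categorical kernel that returns 1 when two discrete values agree and h[1] otherwise, and a Gaussian kernel with bandwidth h[2] for continuous factors. Factors missing in either row are skipped.

```r
# Hypothetical helper (not part of the package): product kernel between two
# rows x and y of the E matrix. Assumes E_type uses "ED"/"EC" as in
# Augmented.data(), h[1] is the discrete bandwidth (in [0, 1]) and h[2] the
# continuous bandwidth.
product_kernel <- function(x, y, E_type, h) {
  stopifnot(length(x) == length(y), length(x) == length(E_type))
  k <- 1
  for (m in seq_along(x)) {
    # Skip factors that are missing in either row
    if (is.na(x[m]) || is.na(y[m])) next
    if (E_type[m] == "ED") {
      # Discrete factor: weight 1 if the categories agree, h[1] otherwise
      k <- k * (if (x[m] == y[m]) 1 else h[1])
    } else {
      # Continuous factor: Gaussian kernel with bandwidth h[2]
      k <- k * dnorm((x[m] - y[m]) / h[2]) / h[2]
    }
  }
  k
}

# Usage with the E_type and h values from the example on this page
E_type <- c("EC", "ED", "ED", "EC", "EC")
h <- c(0.5, 1)
a <- c(0.2, 1, 0, -1.0, 0.5)
b <- c(0.2, 1, 1, -1.0, 0.5)
# One discrete mismatch (third factor), so the factor h[1] = 0.5 enters
product_kernel(a, b, E_type, h)
```

Under these assumptions, a smaller h[1] penalizes discrete mismatches more strongly, while a larger h[2] lets more distant continuous values contribute.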
Value
A list with the following components:
E_w: The augmented data corresponding to E.
G_w: The augmented data corresponding to G.
y_w: The augmented data corresponding to the response Y.
weight: The weights of the augmented observations, which accommodate missingness and also
right censoring if family = "survival".
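For family = "survival", the weights also have to account for right censoring. A standard device in weighted least-squares estimation of accelerated failure time models is the Kaplan-Meier (Stute) weight, sketched below. That Augmented.data() uses exactly this form, and how it would combine it with the augmentation weights, are assumptions here; km_weights() is a hypothetical helper. Censored observations get weight 0, and, with distinct times, each event's weight equals the jump of the Kaplan-Meier estimator at that time.

```r
# Hypothetical helper: Kaplan-Meier (Stute) weights for right-censored data.
# time: observed survival times (or their logs; the ordering is the same)
# status: 1 = event observed, 0 = right censored
km_weights <- function(time, status) {
  n <- length(time)
  ord <- order(time, -status)  # ties: events before censored observations
  d <- status[ord]
  w_sorted <- numeric(n)
  surv_factor <- 1  # running product over j < i of ((n - j) / (n - j + 1))^d_j
  for (i in seq_len(n)) {
    w_sorted[i] <- d[i] / (n - i + 1) * surv_factor
    surv_factor <- surv_factor * ((n - i) / (n - i + 1))^d[i]
  }
  w <- numeric(n)
  w[ord] <- w_sorted  # return the weights in the original order
  w
}

# Usage: the second observation is censored
km_weights(c(1, 2, 3, 4), c(1, 0, 1, 1))  # returns 0.250 0.000 0.375 0.375
```

Without censoring every weight reduces to 1/n, so weighted least squares reduces to ordinary least squares.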
References
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma. Accommodating missingness in environmental measurements in gene-environment interaction analysis. Genetic Epidemiology, 41(6):523-554, 2017.

Jin Liu, Jian Huang, Yawei Zhang, Qing Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma. Identification of gene-environment interactions in cancer studies using penalization. Genomics, 102(4):189-194, 2013.
Examples
set.seed(100)
sigmaG=AR(0.3,50)
G=MASS::mvrnorm(100,rep(0,50),sigmaG)
E=matrix(rnorm(100*5),100,5)
E[,2]=E[,2]>0
E[,3]=E[,3]>0
alpha=runif(5,2,3)
beta=matrix(0,5+1,50)
beta[1,1:7]=runif(7,2,3)
beta[2:4,1]=runif(3,2,3)
beta[2:3,2]=runif(2,2,3)
beta[5,3]=runif(1,2,3)

# continuous response with Normal error (mean 0, sd 4)
y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=rnorm(100,0,4),family="continuous")
# survival response with Normal error N(0,1)
y2=simulated_data(G,E,alpha,beta,rnorm(100,0,1),family="survival",0.7,0.9)

# generate E measurements with missingness in the first (continuous) E factor
miss_label1=c(2,6,8,15)
miss_label2=c(4,6,8,16)
E1=E2=E
E1[miss_label1,1]=NA
E2[miss_label2,1]=NA

# continuous
data_new1<-Augmented.data(G,E1,y1,h=c(0.5,1),family="continuous",
                          E_type=c("EC","ED","ED","EC","EC"))
fit1<-BLMCP(data_new1$G_w,data_new1$E_w,data_new1$y_w,data_new1$weight,
            lambda1=0.025,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200)
coef1=coef(fit1)
y1_hat=predict(fit1,E[c(1,2),],G[c(1,2),])
plot(fit1)

# survival
data_new2<-Augmented.data(G,E2,y2,h=c(0.5,1),family="survival",
                          E_type=c("EC","ED","ED","EC","EC"))
fit2<-BLMCP(data_new2$G_w,data_new2$E_w,data_new2$y_w,data_new2$weight,
            lambda1=0.04,lambda2=0.05,gamma1=3,gamma2=3,max_iter=200)
coef2=coef(fit2)
y2_hat=predict(fit2,E[c(1,2),],G[c(1,2),])
plot(fit2)