Examples for Third-Variable Effect Analysis"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  error=TRUE,
  warning=FALSE
)

Package installation

R package mma is used for general third-variable effect analysis [@Yu2017a]. The third-variable effect (TVE) refers to the effect conveyed by intervening variables to an observed relationship between an exposure and a response variable (outcome). In this package, the exposure is also called the predictor, the intervening variables are called mediators or confounders (MC). The TVE include total effect, direct effect, and indirect effects from different third variables. The effects are defined in Yu et al. [-@Yu2014]. The algorithms on how to make inferences on TVEs and how to explain the effects are described in Yu et al. [-@Yu2014].

To use the R package mma, we first install the package in R (install.packages("mma")) and load it.

library(mma)
#source('C:/Users/qyu/Desktop/Research/mma/version current/R/mma.r')

Data Organization and Identify Potential Mediators/Confounders

We use weight_behavior data as an example to show how to do a preliminary data analysis to identify potential MCs and covariates. The function data.org is used for this purpose. The list of arguments for data.org are defined in @Yu2017a. In the following example, the predictor, "pred", is gender, and the outcome, "y", is a binary indicator of overweight (1) or not (0). The reference group of the exposure variable is the male as defined by "predref="M"". There are 12 variables in "x", which includes all potential MCs and covariates. As an argument in the function data.org, "mediator=5:12" indicates that the variables from column 5 to column 12 in x should be tested for potential MCs. Other columns in "x" are to be treated as covariates. The "mediator" argument can also be a character vector with names of the potential MCs in x. Two tests are needed to identify a potential MC: first, the variable is significantly related with the predictor adjusting for other covariates. The other covariates here is defined by "cova". If cova is a data frame by itself, all covariates in cova are for all potential MCs. If cova is a list, the first item will be the covariate data frame and the second item list the names or column numbers of MCs in "x" that should adjust for the covariates. The significance level is set by "alpha2", whose default value is 0.1. Second, the variable has to be significantly related with the outcome adjusting (testtype=1, by default) or not adjusting (testtype=2) for the predictor and other variables. The significance level is set by "alpha". In the following example, p-value 1 shows the results for the second test. and p-value 2 are the results for the first test. A variable that pass the second test but not the first test is considered as covariates. Variables do not pass the second test are not adopted in further analysis. Variables in "x" but not in "mediator" are forced in further analysis as covariates.

"jointm" is a list with the first item identify the number of groups of MCs whose joint effects are of interesting, where each group is defined in an item of "jointm". In the above example, the joint effect of Variables 5, 7, 9 is of interesting. These variables are forced into analysis as potential MCs. A variable can be shown in "jointm" multiple times. The joint effect of each group of variables will be reported. A trick to force in variables as potential MCs is to assign them in a separate group of "jointm".

One binary/continuous predictor and one binary/continuous outcome

data("weight_behavior")
#binary predictor x
#binary y
 x=weight_behavior[,c(2,4:14)]
 pred=weight_behavior[,"sex"]
 y=weight_behavior[,"overweigh"]
 data.b.b.1<-data.org(x,y,mediator=5:12,jointm=list(n=1,j1=7:9),
                        pred=pred,predref="M", alpha=0.4,alpha2=0.4)
 summary(data.b.b.1)

The results from "data.org" function is summarized as above.

A potential MC defined in "mediator" argument is identified as categorical or binary if it has only two unique values or if the variable is a vector factors or characters. By default, the reference group is the first level of the variable. If the reference group needs to be changed, the categorical variable should be defined in binmed or catmed, and then the corresponding reference group in binref and catref. Continuous MCs can be defined by contmed. The following example did the same analysis as above, only that it defines that potential continuous mediators are columns 7,8,9,11, and 12 of x, binary mediators are columns 6 and 10, and categorical mediator is column 5 of x with 1 to be the reference group for all categorical and binary mediators.

 data.b.b.2<-data.org(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10), jointm=list(n=1,j1=7:9),
   binref=c(1,1),catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4) 
 summary(data.b.b.2)

Survival outcome

The functions in mma can deal with binary, categorical, continuous, or time-to-event outcomes. If the outcome is time-to-event, it should be defined by the "Surv" function in the survival package. An example of application on breast cancer survival can be found in @Yu2019.

 cgd1<-cgd0
 status<-ifelse(is.na(cgd1$etime1),0,1)
 y=Surv(cgd1$futime,status) 
 x=cgd1[,c(5:7,9:12)]
 pred=cgd1[,4]
 data.b.surv<-data.org(x,y,pred=pred,mediator=1:ncol(x),   
                    alpha=0.4,alpha2=0.4)
 summary(data.b.surv)

Multivariate predictors

In addition, the package can handle multivariate and multicategorical predictors. If the predictor is multicategorical of k levels, "data.org" first transform the predictor to k-1 binary predictors. If a variable significantly relates with any of the k-1 predictors, the variable passes the first test described above. P-value 2 is shown for each predictor. In the following example, the predictor is the race with six levels.

 #multivariate predictor
 x=weight_behavior[,c(2:3,5:14)]
 pred=weight_behavior[,4]
 y=weight_behavior[,15]
 data.mb.b <-data.org(x,y,mediator=5:12,jointm=list(n=1,j1=c(5,7,9)),
                      pred=pred,predref="OTHER", alpha=0.4,alpha2=0.4)
 summary(data.mb.b)

Multivariate outcomes

Similarly, the package can deal with multivariate outcomes. The following code deals with multivariate predictors and multivariate responses. If the variable is significantly related with any one of the outcomes, it passes the second test described above. The results from "data.org" are summarized for each combination of the exposure-outcome relationship.

 #multivariate responses
 x=weight_behavior[,c(2:3,5:14)]
 pred=weight_behavior[,4]
 y=weight_behavior[,c(1,15)]
 data.mb.mb<-data.org(x,y,mediator=5:12,jointm=list(n=1,j1=c(5,7,9)),
                      pred=pred,predref="OTHER", alpha=0.4,alpha2=0.4)
 summary(data.mb.mb)

Third-Variable Effect Analysis

Next, we use med function to estimate the TVE using the results from data.org function. The default approach is using a generalized linear model to fit the final full model. If "nonlinear=TRUE"", multiple Additive Regression Trees (MART) will be used to fit the relationship between outcome(s) and other variables, and smoothing splines used to fit the relationship between the potential MC and exposure variables(s).

Binary predictor and binary outcome

 med.b.b.2<-med(data=data.b.b.2,n=2,nonlinear=TRUE)
 med.b.b.2<-med(data=data.b.b.2,n=2,nonlinear=TRUE)

Using the linear method and to show the results:

 med.b.b.1<-med(data=data.b.b.2,n=2)
 med.b.b.1

We can plot the marginal effect of the MC on the outcome, and the marginal effect of the predictor on the MC. If the predictor is binary, plot.med draws a histogram or boxplot of the marginal density of the MC at each different value of the predictor.

plot(med.b.b.1, data.b.b.2,vari="exercises",xlim=c(0,50))
plot(med.b.b.2, data.b.b.2,vari="sports")

Survival outcome

For survival outcome, the default option is to fit the final full model using Cox proportional hazard model. Note that when use type="lp", the linear predictor is the type of predicted value as in predict.coxph, the results is similar to those from MART. Readers are referred to @Yu2019 for more details.

 med.b.surv.1<-med(data=data.b.surv,n=2,type="lp") 
  #close to mart results when use type="lp"
 med.b.surv.2<-med(data=data.b.surv,n=2,nonlinear=TRUE)  
 #results in the linear part unit
 med.b.surv.1<-med(data=data.b.surv,n=2,type="lp") 
  #close to mart results when use type="lp"
 med.b.surv.2<-med(data=data.b.surv,n=2,nonlinear=TRUE)  
 #results in the linear part unit

Statistical Inference on TVE Analysis

boot.med function is used to make inferences on the TVE. Results return from data.org function is input as the first argument.

Binary predictor and binary outcome

 bootmed.b.b.1<-boot.med(data=data.b.b.2,n=2,n2=50)
 summary(bootmed.b.b.1)

Similarly, the default approach is using a generalized linear model to fit the final full model. If the argument nonlinear=TRUE, multiple Additive Regression Trees (MART) will be used to fit the model in estimating the outcome. Applications for nonlinear Mediation Analysis are described in Yu et al. [-@Yu2017b;-@Yu2018].

 bootmed.b.b.2<-boot.med(data=data.b.b.2,n=2,n2=40,nu=0.05,nonlinear=TRUE)
 bootmed.b.b.2<-boot.med(data=data.b.b.2,n=2,n2=4,nu=0.05,nonlinear=TRUE)
 summary(bootmed.b.b.2)

Plots of the fitted mma object from boot.med

plot.mma() plots the marginal effect of the selected variable on the outcome, and the marginal effect of the predictor on the selected variable. Interpretations for similar plots can be found in @Yu2017b.

 plot(bootmed.b.b.1,vari="exercises",xlim=c(0,50))
 plot(bootmed.b.b.1,vari="sports")

Combined function for multiple TVE analysis

The mma function is a combined function that automatically identify potential MCs, based on which to make statistical inference on the mediation effects.

 x=weight_behavior[,c(2,4:14)]
 pred=weight_behavior[,3]
 y=weight_behavior[,15]
 mma.b.b.glm<-mma(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10),binref=c(1,1), catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4,n=2,n2=2)
 summary(mma.b.b.glm)

When the argument nonlinear=TRUE, multiple Additive Regression Trees (MART) will be used to fit the final full model in estimating the outcome.

 mma.b.b.mart<-mma(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10),binref=c(1,1), catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4,nonlinear=TRUE,n=2,n2=5)
 summary(mma.b.b.mart)
 mma.b.b.mart<-mma(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10),binref=c(1,1),
                    catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4,nonlinear=TRUE,n=2,n2=5)
 summary(mma.b.b.mart)
plot(mma.b.b.mart,vari="exercises")
plot(mma.b.b.glm,vari="sweat")

References



Try the mma package in your browser

Any scripts or data that you put into this service are public.

mma documentation built on Jan. 13, 2021, 12:16 p.m.