knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

MRcML for Independent Samples and Overlapping Samples

R package for Mendelian randomization with constraind maximum likelihood (MRcML) methods. Here is the reference for the original MRcML method with two independent samples: Constrained maximum likelihood-based Mendelian randomization robust to both correlated and uncorrelated pleiotropic effects. Here is the reference for the extension of MRcML method with overlapping samples: Combining Mendelian randomization and network deconvolution for inference of causal networks with GWAS summary data.

Installation

Install the package from GitHub with:

# install.packages("devtools")
devtools::install_github("xue-hr/MRcML")

Example

Here is an example which shows how to apply MRcML methods to make inference about the causal effect from Fast Glucose (FG) to Type-2 Diabetes (T2D).

library(MRcML)
summary(T2D_FG)

Example data T2D_FG is a list which contains estimated effects sizes and standard errors of 17 SNPs on T2D and FG. Now we perfrom the main function with sample size of FG which is 46186, and using 100 random start points. We set the random seed random_seed = 1 to make sure results are replicable. First we use the function mr_cML(), which is for the two independent sample case. Then we apply the function mr_cML_Overlap(), which is for the overlapping sample case; for illustration purpose, here we assume the correlations between GWAS summary data being 0.1, i.e. setting rho = 0.1.

### mr_cML() for two independent samples
cML_result = mr_cML(T2D_FG$b_exp,
                    T2D_FG$b_out,
                    T2D_FG$se_exp,
                    T2D_FG$se_out,
                    n = 46186,
                    random_start = 100,
                    random_seed = 1)

### mr_cML_Overlap() for two overlapping samples
cML_result_Overlap = mr_cML_Overlap(T2D_FG$b_exp,
                                    T2D_FG$b_out,
                                    T2D_FG$se_exp,
                                    T2D_FG$se_out,
                                    n = 46186,
                                    random_start = 100,
                                    random_seed = 1,
                                    rho = 0.1)

We get a warning message from the function cML_estimate_random(). The reason is: here we use 100 random starting points to minimize the non-convex loss function, some of them may not converge to a local minimum and result in Fisher Information matrices that are not positive definite. It is not likely affecting the optimization result, since in the end we only use the start point gives the minimum loss and discard all other start points including those do not converge. Now lets take a look at the results:

cML_result
cML_result_Overlap

BIC selected model gives us indices of invalid IVs: 8, 12, 13, 15, 17. Now lets draw the scatter plot, invalid IVs are marked with blue:

library(ggplot2)
plot_data = data.frame(b_exp = T2D_FG$b_exp,b_out = T2D_FG$b_out,
                           se_exp = T2D_FG$se_exp,se_out = T2D_FG$se_out)
    invalid_ind = as.factor(as.numeric(is.element(1:17,c(8, 12, 13, 15, 17))))
    plot_data = cbind(plot_data,invalid_ind = invalid_ind)

    fp2 = 
      ggplot(data = plot_data,aes(x = b_exp,y = b_out)) +
      geom_point(aes(col = invalid_ind)) + 
      geom_errorbar(aes(ymin = b_out-se_out,ymax = b_out+se_out,
                        col = invalid_ind),width=0) + 
      geom_errorbarh(aes(xmin = b_exp-se_exp, xmax = b_exp+se_exp,
                         col = invalid_ind),height=0) + 
      scale_color_manual(values = c("grey","blue")) + 
      theme(legend.position = "none") + 
      #geom_abline(intercept = 0,
      #            slope = c(TLP_result[i,c(1,BIC_min_ind)]),
      #            col = c("black","red"),
      #            linetype=c("dashed","solid"),
      #            size = c(1,1)
      #) + 
      geom_hline(yintercept = 0, linetype="solid", 
                 color = "black", size=0.5) +
      geom_vline(xintercept = 0, linetype="solid", 
                 color = "black", size=0.5) +
      labs(x = "Effect Size of FG",
           y = "Effect Size of T2D")

    fp2

Now let us perform cML with data perturbation. The default number of perturbations is 200, and we use 10 random start points. In real application, we recommend use more random start points to get reliable results even it takes more time, like 10 or even 100; in simulations the number of random start points could be set to 0 (i.e. do not use random start) to speed up. First we use the function mr_cML_DP(), which is for the two independent sample case. Then we apply the function mr_cML_DP_Overlap(), which is for the overlapping sample case; again for illustration purpose, here we assume the correlations between GWAS summary data being 0.1, i.e. setting rho = 0.1.

### mr_cML_DP() for two independent samples
cML_result_DP = mr_cML_DP(T2D_FG$b_exp,
                          T2D_FG$b_out,
                          T2D_FG$se_exp,
                          T2D_FG$se_out,
                          n = 46186,
                          random_start = 10,
                          random_start_pert = 10,
                          random_seed = 1,
                          num_pert = 200)

### mr_cML_DP_Overlap() for two overlapping samples
cML_result_DP_Overlap = mr_cML_DP_Overlap(b_exp = T2D_FG$b_exp,
                                          b_out = T2D_FG$b_out,
                                          se_exp = T2D_FG$se_exp,
                                          se_out = T2D_FG$se_out,
                                          n = 46186,
                                          random_start = 10,
                                          random_start_pert = 10,
                                          random_seed = 1,
                                          num_pert = 200,
                                          rho = 0.1)

Results with data perturbation:

cML_result_DP
cML_result_DP_Overlap


xue-hr/MRcML documentation built on Aug. 25, 2024, 4:24 p.m.