knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

SUMnlmr

The goal of SUMnlmr is to allow investigations of potentially non-linear relationships between an exposure and an outcome via a Mendelian randomization framework, without requiring full access to individual level genetic data.

It is based on the existing package for individual data by James Staley: nlmr (available from https://github.com/jrs95/nlmr ).

The core concept is to split the process into two distinct halfs: one requiring individual level data, which is converted into a semi-summarized form (create_nlmr_summary) by dividing the population into strata based on the IV-free exposure. Associations with the exposure and the outcome are estimated in each stratum. In the second half, this semi-summarized form can then be shared, without compromising patient privacy, and investigated seperately using two IV methods: a fractional polynomial method (frac_poly_summ_mr) and a piecewise linear method (piecewise_summ_mr). Both methods calculate a localised causal effect (LACE). The piecewise method fits a continuous piecewise linear function to these estimates, while the fractional polynomial method fits the best 1 or 2 term fractional polynomial.

Functions

create_nlmr_summary - prepares individual level data into semi-summarised form, ready to fit nlmr models. fracpoly_summ_mr - this method performs IV analysis using fractional polynomials piecewise_summ_mr - this method performs IV analysis using piecewise linear function

Installation

You can install the released version of SUMnlmr from GitHub with:

# install.packages("devtools")
devtools::install_github("amymariemason/SUMnlmr")

Example 1: Summarizing data

This is a basic example which shows you how to create the semi-summarized data form. First we create some practise data:

library(SUMnlmr)
## create some data to practise on
test_data<-create_ind_data(N=10000, beta2=2, beta1=1)
# this creates quadratic.Y  = x + 2x^2 + errorY 
head(test_data)

Then we use create_nlmr_summary to summarise it.

## create the summarized form
## 
summ_data<-create_nlmr_summary(y = test_data$quadratic.Y,
                                x = test_data$X,
                                g = test_data$g,
                                covar = NULL,
                                family = "gaussian",
                                strata_method = "residual", 
                                controlsonly = FALSE,
                                q = 10)

head(summ_data$summary)

If we have co-variants we want to adjust for in our analysis, we need to include them at this stage.

## create the summarized form
summ_covar<-create_nlmr_summary(y = test_data$quadratic.Y,
                                x = test_data$X,
                                g = test_data$g,
                                covar = matrix(data=c(test_data$linear.Y,
                                                      test_data$sqrt.Y),ncol=2),
                                family = "gaussian",
                                strata_method = "residual", 
                                q = 10)

head(summ_covar$summary)

Note: Because the covariants are included as a matrix, lm cannot detect factor variables and create automatic dummy variables for them. The easiest way to include factor variables is to make these dummy variables by hand instead using the model.matrix command. e.g.

## create a factor
test_data$centre<- as.factor(rbinom(nrow(test_data),4, 0.5))

#turn factor into binary contrasts against first factor
dummies<- model.matrix(~centre,data=test_data)[,2:5]

summ_covar2<-create_nlmr_summary(y = test_data$quadratic.Y,
                                 x = test_data$X,
                                 g = test_data$g,
                                 covar = dummies,
                                 family = "gaussian",
                                 q = 10)

head(summ_covar2$summary)

These have used a single genetic variant count, but the method works identically with an genetic score function for g instead. Logistic or cox models in the G-Y relationship can be used by changing the family option - see details in the create_nlmr_summary function description.

It is also possible to implement the doubly-ranked method described in Haodong's paper https://www.biorxiv.org/content/10.1101/2022.06.28.497930v1

## create the summarized form with the doubly ranked method
summ_ranked<-create_nlmr_summary(y = test_data$quadratic.Y,
                                x = test_data$X,
                                g = test_data$g,
                                covar = matrix(data=c(test_data$linear.Y,
                                                      test_data$sqrt.Y),ncol=2),
                                family = "gaussian",
                                strata_method = "ranked", 
                                q = 10)

head(summ_ranked$summary)

Once your data is in this format, the output data frame is all you need to share to fit the fractional polynomial or piecewise linear models onto the data.

Example 2: Fitting a fractional polynomial model

Your data needs to be in the semi-summarised form as shown above. We can then fit a fractional polynomial model:

model<- with(summ_data$summary, frac_poly_summ_mr(bx=bx,
                  by=by, 
                  bxse=bxse, 
                  byse=byse, 
                  xmean=xmean,
                  family="gaussian",
                  fig=TRUE)
)


summary(model)

This also produces a graph of the fit with 95% confidence intervals. This is a ggplot object and can be adjusted with ggplot commands

library(ggplot2)
f <- function(x) (x + 2*x^2 - mean(summ_data$summary$xmean) -
                    2*mean(summ_data$summary$xmean)^2 )

plot1 <- model$figure+ 
  stat_function(fun = f, colour = "green") +
  ggtitle("fractional polynomial fit from semi-summarized data")

plot1

There is also p-values provided in p_test and p_het. This is identical to the testing provided by the nlmr package: fp_d1_d2 : test between the fractional polynomial degrees fp : fractional polynomial non-linearity test quad: quadratic test Q : Cochran Q test and Q: Cochran Q heterogeneity test trend: trend test

``` {r example6} model$p_tests

model$p_heterogeneity

## Example 3: Piecewise linear model

We can instead fit a piecewise linear model to the same summarised data

```r

model2 <-with(summ_data$summary, piecewise_summ_mr(by, bx, byse, bxse, xmean, xmin,xmax, 
                  ci="bootstrap_se",
                  nboot=1000, 
                  fig=TRUE,
                  family="gaussian",
                  ci_fig="ribbon")
)

summary(model2)

Again the figure is a ggplot object and can be adjusted similarly.

plot2 <- model2$figure+ 
  stat_function(fun = f, colour = "green") +
  ggtitle("piecewise linear fit from semi-summarized data")

plot2

Example 4: Binary outcome

The functions above can also fit binary outcome data, via a generalised linear model.

``` {r bin}

test_data$y.bin<-stats::rbinom(size=1, p=0.5, n=10000)

create summ data

summ_bin<-create_nlmr_summary(y = test_data$y.bin, x = test_data$X, g = test_data$g, covar = NULL, family = "binomial", q = 10)

fit fractional poly model

model3<- with(summ_bin$summary,frac_poly_summ_mr(bx=bx, by=by, bxse=bxse, byse=byse, xmean=xmean, family="binomial", fig=TRUE) )

summary(model3)

Not unsurprisingly, we find no evidence of an effect, causal or otherwise, 
as the binary outcome was randomly distributed. 

If we look instead at the semi-summarised UK Biobank datasets on LDL-cholesterol 
and CAD, one with and one without covariates. Here we can see a potentially 
non-linear trend in the univariate data, which becomes a clear linear trend once
covariates are included.

```r

# fit piecewise linear model
model4 <-with(LDL_CAD, piecewise_summ_mr(by, bx, byse, bxse, xmean, xmin,xmax, 
                  ci="bootstrap_se",
                  nboot=1000, 
                  fig=TRUE,
                  family="gaussian",
                  ci_fig="ribbon")
)


summary(model4)


# fit piecewise linear model
model5 <-with(LDL_CAD_covar,piecewise_summ_mr(by, bx, byse, bxse, xmean, xmin,xmax, 
                  ci="bootstrap_se",
                  nboot=1000, 
                  fig=TRUE,
                  family="gaussian",
                  ci_fig="ribbon")
)

summary(model5)


amymariemason/SUMnlmr documentation built on July 22, 2024, 10:03 a.m.