polymr: Mendelian randomization-based approximation of non-linear...
In JonSulc/PolyMR: Model Non-Linear Causal Effects Through Polynomial Regression

View source: R/polymr.R

polymr

R Documentation

Mendelian randomization-based approximation of non-linear causal effects

Description

This function approximates a non-linear causal effect through a polynomial regression of observational data, correcting for confounding using an instrumental variable-based approach.

Usage

polymr(
  exposure,
  outcome,
  genotypes,
  return_phenotypes_summary = TRUE,
  return_observational_function = TRUE,
  return_binned_observations = TRUE,
  bins = 100,
  starting_exposure_powers = 1:10,
  max_exposure_power = max(starting_exposure_powers),
  max_control_function_power = NULL,
  power_step = 2,
  reverse_t_thr = NULL,
  p_thr_add = 0,
  p_thr_drop = 1,
  drop_higher_control_function_powers = TRUE
)

Arguments

`exposure`	A vector containing the exposure values for each individual.
`outcome`	A vector containing the outcome values for each individual.
`genotypes`	The NxM genetic matrix, with a column for each variant and a row for each individual.
`return_phenotypes_summary`	Whether to return a data.table containing the median, mean, and standard deviation of both exposure and outcome (default is TRUE).
`return_observational_function`	Whether to return a polynomial approximation of the observed association between exposure and outcome (default is TRUE).
`return_binned_observations`	Whether to return a data.table containing per-bin summary information, including the median exposure and the median, mean, and standard deviation of the outcome, binned on exposure (default is TRUE).
`bins`	Number of bins for which to return mean and median values (default is 100).
`starting_exposure_powers`	A vector containing the exponents for the exposure terms in the initial model. Default is `1:10`, corresponding to a 10th-degree polynomial with all lower-degree terms present.
`max_exposure_power`	The maximum exponent to use in modeling the exposure (default is `max(starting_exposure_powers)`). If this is greater than `max(starting_exposure_powers)`, `polymr` will iteratively increase (by `power_step`) the degree of the causal polynomial function as long as new terms are significant (p < `p_thr_add`).
`max_control_function_power`	The maximum exponent to use in modeling the control function component. Default is NULL, in which case the control function polynomial will include all terms from 1 to the highest degree of the exposure component.
`power_step`	The number by which to increment the degree of the exposure polynomial each iteration until `max_exposure_power` is reached or the new terms are no longer significant (p > `p_thr_add`). Default is 2, as even and odd degree terms have different properties.
`p_thr_add`	The p-value threshold determining if newly added exposure terms should be considered significant enough to further increase the degree of the polynomial (by `power_step`, up to `max_exposure_power`). Default is 0, which will prevent new terms from being added. A value of NULL will be equivalent to a per-step Bonferroni correction, i.e. 0.05 / `power_step`.
`p_thr_drop`	The p-value threshold determining which, if any, exposure terms should be dropped from the final function. This is done iteratively and the significance of each term is assessed in the new context before proceeding again, if necessary, until all remaining terms reach the defined significance threshold. Default is 1, which will retain all terms. A value of NULL will use a Bonferroni-corrected threshold at each step.
`drop_higher_control_function_powers`	Logical indicating whether control function terms with a higher degree than the highest exposure term should be dropped. Default is TRUE. Only relevant if `p_thr_drop` < 1 or is NULL.
`reverse_t_thr`	Threshold to use for reverse causality filtering (T statistic), NULL for no filtering (default). A value of 0 simply filters out IVs explaining more variance in the outcome than in the exposure, whereas a value of 1.645 (`qnorm(.95)`) removes only those where that difference is significant (p < 0.05).
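
The iterative degree selection described for `max_exposure_power`, `power_step`, and `p_thr_add` can be illustrated with a small base R sketch. This is not PolyMR's internal code: the simulated data, the helper `make_formula()`, and the use of a joint F-test for the newly added terms are all assumptions made for illustration, and the control function terms are left out for brevity.

```r
# Illustrative forward selection of the polynomial degree (not PolyMR's internal code).
set.seed(42)
n <- 2000
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x - 0.2 * dat$x^3 + rnorm(n)  # simulated cubic effect

power_step <- 2
max_exposure_power <- 7
p_thr_add <- 0.05 / power_step  # per-step Bonferroni threshold, as with p_thr_add = NULL
powers <- 1                     # starting exposure powers

# Hypothetical helper: build y ~ I(x^1) + I(x^2) + ... for a set of exponents.
make_formula <- function(powers) {
  reformulate(sprintf("I(x^%d)", powers), response = "y")
}

fit <- lm(make_formula(powers), data = dat)
while (max(powers) + power_step <= max_exposure_power) {
  candidate_powers <- c(powers, max(powers) + seq_len(power_step))
  candidate_fit <- lm(make_formula(candidate_powers), data = dat)
  # Joint F-test of the newly added terms (an assumption made for this sketch).
  p_new <- anova(fit, candidate_fit)[["Pr(>F)"]][2]
  if (p_new >= p_thr_add) break
  powers <- candidate_powers
  fit <- candidate_fit
}
powers  # exponents retained in the final model
```

Incrementing by two exponents at a time mirrors the default `power_step = 2`, so each step adds one even and one odd term together.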

Details

The polymr() function estimates the causal effect of the exposure on the outcome through polynomial regression, correcting for confounding by including a polynomial of the control function. Full details of the method can be found in the article (see `citation("PolyMR")`).
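
The control function idea can be sketched in base R as a two-stage regression. The following is a simplified illustration under stated assumptions, not the package's implementation: the simulated data, the effect sizes, the quadratic degree of both polynomials, and the fact that only the exposure is standardized are all choices made for this example.

```r
# Simplified control function sketch with simulated data (not PolyMR's internal code).
set.seed(1)
n <- 20000
m <- 20
genotypes <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
confounder <- rnorm(n)  # unobserved in practice

raw_exposure <- as.vector(genotypes %*% rep(0.2, m)) + confounder + rnorm(n)
exposure <- as.vector(scale(raw_exposure))
# Simulated causal effect: 0.3 * exposure + 0.1 * exposure^2, plus confounding.
outcome <- 0.3 * exposure + 0.1 * exposure^2 - 0.5 * confounder + rnorm(n)

# Stage 1: the residual of the exposure given the genotypes is the control function.
first_stage <- lm(exposure ~ genotypes)
control_function <- residuals(first_stage)

# Naive observational fit, biased by the confounder.
naive_fit <- lm(outcome ~ exposure + I(exposure^2))

# Stage 2: add a polynomial of the control function to absorb the confounding.
cf_fit <- lm(outcome ~ exposure + I(exposure^2) +
               control_function + I(control_function^2))

exposure_terms <- c("exposure", "I(exposure^2)")
round(rbind(naive = coef(naive_fit)[exposure_terms],
            control_function = coef(cf_fit)[exposure_terms]), 3)
```

In this simulation the naive linear coefficient is pulled away from the simulated value of 0.3 by the confounder, whereas the control function fit should recover both exposure coefficients up to sampling error.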

Value

Returns a named list of results for PolyMR itself and the other selected values:

phenotypes_summary is a data.table with the median, mean, and standard deviation of both exposure and outcome.
binned_observations is a data.table with per-bin summary information, including the median exposure and the median, mean, and standard deviation of the outcome (binned on the exposure).
binned_observations_scaled is a data.table with per-bin summary information for the scaled exposure and outcome (which will be used for modeling), including the median exposure and the median, mean, and standard deviation of the outcome (binned on the exposure).
observational is a list-like object of class EOModel containing:
- outcome_model, an object of class lm containing the full model. Use summary() for more details.
- vcov, the variance-covariance matrix of the coefficient estimates, which can be used to create the 95% confidence interval.
- pval_null_model, the p-value for the full model (F-statistic-based).
- pval_linear_model, the LRT p-value comparing the full model to the linear model.
- r_squared, the variance explained by the model.
polymr is a list-like object of class PolyMRModel, the contents of which are similar to those of observational:
- outcome_model, an object of class lm containing the full model. Use summary() for more details.
- vcov, the variance-covariance matrix of the coefficient estimates, which can be used to create the 95% confidence interval for plotting.
- pval_null_model, the LRT p-value comparing the full model to the model with (all) the control function terms but no exposure terms.
- pval_linear_model, the LRT p-value comparing the full model to the linear model containing all control function terms but only the degree 1 (linear) exposure term.
- r_squared, the variance of the outcome attributable to the causal effect of the exposure. This is obtained by comparing the R-squared of the full model to that of the null model (containing only the control function terms).
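
As a rough illustration of how a coefficient vector and its variance-covariance matrix yield a pointwise 95% confidence band, the sketch below uses a plain `lm` fit on simulated data. Which terms the returned `vcov` covers in a PolyMRModel object is not stated here, so the design matrix used below (intercept, linear, and quadratic columns) is an assumption tied to this toy model only.

```r
# Pointwise 95% confidence band for a fitted polynomial, from coef() and vcov().
set.seed(7)
x <- rnorm(500)
y <- 0.4 * x + 0.15 * x^2 + rnorm(500)
fit <- lm(y ~ x + I(x^2))

grid <- seq(-2, 2, by = 0.5)
design <- cbind(1, grid, grid^2)  # columns in the same order as coef(fit)

estimate <- as.vector(design %*% coef(fit))
# Standard error at each grid point: sqrt(diag(design %*% vcov %*% t(design))).
std_error <- sqrt(rowSums((design %*% vcov(fit)) * design))

band <- data.frame(
  x = grid,
  estimate = estimate,
  lower = estimate - qnorm(0.975) * std_error,
  upper = estimate + qnorm(0.975) * std_error
)
band
```

The `lower` and `upper` columns can then be drawn around `estimate`, for example with `lines()` or a ribbon in a plotting package.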

Note

Both the exposure and outcome will be standardized (centered to have mean 0 and scaled to have standard deviation 1) prior to modeling. The returned coefficients correspond to these transformed phenotypes. New data can be transformed to this scale using the values saved in phenotypes_summary.
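
The transformation itself is ordinary standardization, sketched below in base R. The values are made up, and `exposure_mean` and `exposure_sd` are hypothetical stand-ins for the mean and standard deviation stored in phenotypes_summary, whose exact column names are not shown here.

```r
# Standardize an exposure, then place new observations on the same scale.
exposure <- c(21.5, 24.0, 27.3, 30.1, 35.8)  # made-up values

exposure_mean <- mean(exposure)  # stand-in for the saved mean
exposure_sd <- sd(exposure)      # stand-in for the saved standard deviation

exposure_scaled <- (exposure - exposure_mean) / exposure_sd

# New data must use the original mean and SD, not their own.
new_exposure <- c(22, 28, 33)
new_exposure_scaled <- (new_exposure - exposure_mean) / exposure_sd

# Back-transform from the modeling scale to the original units.
new_exposure_back <- new_exposure_scaled * exposure_sd + exposure_mean
```

The same steps apply to the outcome, so predictions made on the standardized scale can be returned to the original outcome units by reversing its transformation.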

Examples

simulated_data <- PolyMR:::new_PolyMRDataSim()
polymr(exposure  = simulated_data$exposure,
       outcome   = simulated_data$outcome,
       genotypes = simulated_data$genotypes,
       reverse_t_thr = 0,
       p_thr_drop = NULL)

JonSulc/PolyMR documentation built on April 26, 2023, 10:42 a.m.