Demonstration of package heritEWAS

DNA methylation can be thought of as a type of "mark" on DNA that can affect gene expression. Most methylation marks are erased soon after conception, but in rare cases methylation is known to be effectively passed from parents to their offspring. The heritEWAS package provides functions to identify these heritable DNA methylation marks using the method of Joo et al. (2018). This approach looks for Mendelian patterns of inheritance within families. It relies on the relationship structure of each family and on methylation data for a subset of the family members, and it requires no genotype data. The method can handle large, multi-generational families and is optimised for methylation data at thousands or millions of methylation sites, such as the data generated by epigenome-wide association studies (EWAS) of related individuals.

You can install the released version of heritEWAS from CRAN with:

install.packages("heritEWAS")
library(heritEWAS)

To use this package, you will need two sets of data (described in more detail in the help page for the function genotype_combinations):

  1. A data frame containing the pedigree data, i.e. the relationship structure of the families. Each row of the data frame corresponds to a person, and the columns correspond to each person's individual identifier (indiv) and the identifiers of his or her mother (mother) and father (father), as well as a family identifier (family) and a binary flag (typed) which is 1 for people who have methylation data available. No family should contain a pedigree loop, such as one caused by inbreeding.

  2. A matrix of the M-values, with rows corresponding to methylation sites (i.e. CpG probes) and columns corresponding to people. The column names should match the individual identifiers of people in the pedigree data with typed = 1.
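As a minimal sketch of the two inputs described above, the following base R code builds a toy pedigree and a matching M-values matrix. The family, the individual and site identifiers, the simulated values, and the use of NA for the parents of founders are all assumptions made for illustration; the help page for genotype_combinations describes the coding the package actually expects.

```r
# Toy pedigree: one hypothetical family ("fam1") with two parents and two children.
# Founders (people whose parents are not in the data) are coded with NA here;
# this coding is an assumption, so check ?genotype_combinations before relying on it.
ped <- data.frame(
  family = "fam1",
  indiv  = c("p1", "p2", "p3", "p4"),
  mother = c(NA, NA, "p2", "p2"),
  father = c(NA, NA, "p1", "p1"),
  typed  = c(0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Simulated M-values for three hypothetical sites, one column per typed person.
set.seed(1)
typed_ids <- ped$indiv[ped$typed == 1]
M_values <- matrix(
  rnorm(3 * length(typed_ids)),
  nrow = 3,
  dimnames = list(c("cg_site1", "cg_site2", "cg_site3"), typed_ids)
)
```

Here the father p1 has typed = 0, so he appears in the pedigree but has no column in the matrix.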

The pedigree data should look something like the following (extra variables such as aff and age can be included, but they will be ignored):

head(ped)
unique(ped$family)

And the M-values matrix should look something like:

# Colnames are the individual IDs of the pedigree data
M_values[1:5, 1:5]
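Because the column names of the M-values matrix must match the typed individuals in the pedigree data, it may be worth checking this before running the analysis. The helper below, check_inputs(), is hypothetical and not part of heritEWAS; it is a sketch that reports identifiers present in one input but not the other, using small stand-in data.

```r
# Hypothetical helper (not part of heritEWAS): compares the typed individuals in
# the pedigree data with the column names of the M-values matrix.
check_inputs <- function(ped, M_values) {
  required <- c("indiv", "mother", "father", "family", "typed")
  missing_cols <- setdiff(required, names(ped))
  if (length(missing_cols) > 0) {
    stop("Pedigree data lacks columns: ", paste(missing_cols, collapse = ", "))
  }
  typed_ids <- as.character(ped$indiv[ped$typed == 1])
  list(
    typed_without_data = setdiff(typed_ids, colnames(M_values)),
    data_without_typed = setdiff(colnames(M_values), typed_ids)
  )
}

# Stand-in inputs with a deliberate mismatch: "c" is typed but has no column,
# and column "x" does not correspond to any typed person.
ped_demo <- data.frame(
  family = "fam1", indiv = c("a", "b", "c"),
  mother = c(NA, NA, "b"), father = c(NA, NA, "a"),
  typed = c(1, 0, 1), stringsAsFactors = FALSE
)
M_demo <- matrix(0, nrow = 2, ncol = 2,
                 dimnames = list(c("s1", "s2"), c("a", "x")))
res <- check_inputs(ped_demo, M_demo)
```

Both elements of the returned list are empty when the two inputs agree.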

The main goal of the package heritEWAS is to calculate a statistic $\Delta l$ for each methylation site. This statistic measures the strength of evidence that the site's M-values follow a Mendelian pattern of inheritance within the families, with larger values of $\Delta l$ corresponding to more heritable methylation sites. The statistic is the difference in the maximised log-likelihoods of two statistical models, and it can be interpreted as a difference in the Bayesian information criteria of the two models; see Joo et al. (2018).
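One way to read the link with the Bayesian information criterion (BIC) is sketched below. It assumes, which the text above does not state, that the two models have the same number of free parameters $k$ and are fitted to the same $n$ observations. The labels $A$ and $B$ are placeholders, with $A$ denoting the model favoured by large values of $\Delta l$, so that $\Delta l = \hat{l}_A - \hat{l}_B$, where $\hat{l}$ is a maximised log-likelihood. Since $\mathrm{BIC} = k \ln n - 2\hat{l}$, the penalty terms cancel under this assumption:

$$
\mathrm{BIC}_B - \mathrm{BIC}_A = (k \ln n - 2\hat{l}_B) - (k \ln n - 2\hat{l}_A) = 2(\hat{l}_A - \hat{l}_B) = 2\,\Delta l
$$

If the two models had different numbers of parameters, an extra term proportional to $\ln n$ would remain.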

The most time-consuming part of the calculation of $\Delta l$ is the same for all methylation sites, so the heritEWAS package calculates this part once, stores the output, then re-uses this calculation for each methylation site.
This part of the calculation is performed by the function genotype_combinations():

typed_genos <- genotype_combinations(ped)

The results are stored in a named list of data frames, with one data frame per family. Each data frame gives the probability of each possible combination of genotypes for those family members with methylation data (i.e. those with typed = 1). The possible genotypes for each person are 0 and 1, corresponding to non-carriers and carriers (respectively) of a hypothetical genetic variant that controls methylation at a given methylation site under one of the two statistical models used to define $\Delta l$. Impossible genotype combinations (those with a probability of 0) are excluded from the output of genotype_combinations().

str(typed_genos)
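To give a feel for what a table of genotype combinations and their probabilities looks like, the sketch below enumerates the combinations for a single father-mother-child trio by brute force. It is not the algorithm used by genotype_combinations(), and it rests on illustrative assumptions: all three people are typed, founders carry the variant independently with a hypothetical probability q, carriers are heterozygous, and there are no new mutations.

```r
# Illustrative only: NOT the algorithm used by genotype_combinations().
# q is a hypothetical carrier probability for founders.
q <- 0.01

# All 2^3 combinations of carrier (1) and non-carrier (0) status
combos <- expand.grid(father = 0:1, mother = 0:1, child = 0:1)

# P(child is a carrier | parents): 0, 1/2 or 3/4 for 0, 1 or 2 carrier parents,
# assuming that every carrier is heterozygous
p_child_carrier <- c(0, 0.5, 0.75)[combos$father + combos$mother + 1]

# Probability of a founder's status, then the joint probability of each combination
founder_prob <- function(g) ifelse(g == 1, q, 1 - q)
combos$prob <- founder_prob(combos$father) * founder_prob(combos$mother) *
  ifelse(combos$child == 1, p_child_carrier, 1 - p_child_carrier)

# Drop impossible combinations (probability 0), mirroring the package's output
combos <- combos[combos$prob > 0, ]
rownames(combos) <- NULL
combos
```

The only combination removed is the one in which the child is a carrier but neither parent is, which has probability 0 when new mutations are ruled out. Brute-force enumeration like this grows exponentially with family size, so it is suitable only as an illustration.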

Given the genotype probabilities, we can use the package's main function ML_estimates() to compute $\Delta l$ for each site. The output is a data frame with rows corresponding to methylation sites and columns giving details about certain fitted models (see the help page of the function ML_estimates for more details). In particular, the column delta.l gives the statistic $\Delta l$, which measures how heritable each methylation site is.

MLEs <- ML_estimates(typed_genos, M_values, ncores = 2)
head(MLEs)
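A natural next step is to rank the sites by $\Delta l$. The sketch below uses a mock data frame in place of the real output: only the column name delta.l comes from the description above, while the values, the site names, and the assumption that sites are identified by row names are invented for illustration.

```r
# Mock stand-in for the output of ML_estimates(); values and site names are invented.
MLEs_demo <- data.frame(
  delta.l = c(0.4, 12.7, 3.1, 25.9),
  row.names = c("cg_site1", "cg_site2", "cg_site3", "cg_site4")
)

# Sort sites from most to least heritable according to delta.l
ranked <- MLEs_demo[order(MLEs_demo$delta.l, decreasing = TRUE), , drop = FALSE]

# Names of the two highest-ranked sites
top_sites <- rownames(ranked)[1:2]
```

The drop = FALSE argument keeps the result as a data frame even though the mock object has a single column.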

References

Joo JE, Dowty JG, Milne RL, Wong EM, Dugue PA, English D, Hopper JL, Goldgar DE, Giles GG, Southey MC, kConFab. Heritable DNA methylation marks associated with susceptibility to breast cancer. Nat Commun. 2018 Feb 28;9(1):867. https://doi.org/10.1038/s41467-018-03058-6


heritEWAS documentation built on July 1, 2020, 6:02 p.m.