R/ML_estimates.R

Defines functions ML_estimates

Documented in ML_estimates

#' Compute maximum likelihood estimates and \eqn{\Delta}l
#'
#' For each methylation site, this function computes certain maximum likelihood
#' estimates and a measure of heritability called \eqn{\Delta}l
#' (with higher values corresponding to more highly heritable methylation
#' sites), as described briefly below and fully in (Joo et al., 2018).
#'
#' @param typed_genos A named list, usually generated by
#' \code{\link{genotype_combinations}}.
#' Each element of the list is a data frame specifying all possible joint
#' genotypes of selected family members within each family, and the joint
#' probability of each genotype combination.
#' @param M_values A matrix of M-values,with rows corresponding to the
#' methylation sites and columns corresponding to people.
#' The column names should correspond to the column names appearing in
#' `typed_genos`.
#' @param ncores The number of cores to be used, with `ncores = 1` (the default)
#' corresponding to non-parallel computing.
#' When `ncores > 1`, the `parallel` package is used to
#' parallelize the calculation, by dividing the methylation sites between
#' the cores.
#' @param sort Re-order the methylation sites to have decreasing values of `delta.l`
#' if \code{TRUE} (the default), or leave the sites in the original order if \code{FALSE}
#' @param na_omit Remove any methylation sites with missing values (`NA`) of `delta.l`
#' if \code{TRUE} (the default), or return the results for all sites if \code{FALSE}.
#' Usually, missing values of `delta.l` are due to well-known singularities
#' in the likelihood of the Gaussian mixtures model.
#'
#' @return A data frame with 15 columns. In the column names, the suffixes
#' `.mendel` and `.mix` refer the Mendelian and mixture models of
#' (Joo et al., 2018).
#' Briefly, the mixture model is the standard Gaussian mixture model with two
#' groups (group `0` and group `1`),
#' so group memberships are independent and the M-values of each group are
#' normally distributed. The Mendelian model is the same except that group
#' memberships are dependent within families, and are modelled as the carrier
#' status of a rare, autosomal genetic variant.
#' In the column names, the prefixes `mu` and `sd` refer to the maximum
#' likelihood estimates of the mean and standard deviation of each group's
#' normal distribution, and the suffix `ll` refers to each model's maximised
#' log-likelihood (i.e., the log-likelihood function evaluated at the maximum
#' likelihood estimates).  The suffix `.null` refers to the null model that is
#' nested inside both the Mendelian and mixture models, in which the means and
#' standard deviations for the two groups are equal
#' (i.e., `mu0 = mu1` and `sd0 = sd1`).
#' The column `delta.l` gives the difference between `ll.mendel` and `ll.mix`,
#' and is the measure of heritability (\eqn{\Delta}l) that was introduced in
#' (Joo et al., 2018).
#'
#' @importFrom stats na.omit
#' @export
#'
#' @references
#' Joo JE, Dowty JG, Milne RL, Wong EM, Dugué PA, English D, Hopper JL,
#' Goldgar DE, Giles GG, Southey MC, kConFab.  Heritable DNA methylation marks
#' associated with susceptibility to breast cancer.  Nat Commun. 2018
#' Feb 28;9(1):867. \url{https://doi.org/10.1038/s41467-018-03058-6}
#'
#' @examples
#' # Example data
#' str(ped)
#' str(M_values)
#'
#' # Calculate genotype probabilities
#' typed_genos <- genotype_combinations(ped)
#' str(typed_genos)
#'
#'\donttest{
#' # Compute Delta l
#' MLEs <- ML_estimates(typed_genos, M_values, ncores = 4)
#' str(MLEs)
#'}
#'
ML_estimates <- function(typed_genos, M_values, sort = TRUE,
                         na_omit = TRUE, ncores = 1) {

  message("Loading data")
  M <- filter_M(typed_genos, M_values)

  message("Computing log-likelihood for the standard mixture model")
  loglike_mix <- em_mix(typed_genos, M, ncores = ncores)

  message("Computing log-likelihood for the Mendelian model")
  loglike_mendel <- em_mendelian(typed_genos, M, ncores = ncores)

  deltal <- data.frame(
    methylation.site = as.character(loglike_mendel$probe),
    delta.l = loglike_mendel$ll.mendel - loglike_mix$ll.mix,
    ll.null = loglike_mendel$ll.null,
    ll.mix = loglike_mix$ll.mix,
    ll.mendel = loglike_mendel$ll.mendel,
    mu.null = loglike_mendel$mu.null,
    sd.null = loglike_mendel$sd.null,
    mu0.mendel = loglike_mendel$mu0,
    sd0.mendel = loglike_mendel$sd0,
    mu1.mendel = loglike_mendel$mu1,
    sd1.mendel = loglike_mendel$sd1,
    mu0.mix = loglike_mix$mu0,
    sd0.mix = loglike_mix$sd0,
    mu1.mix = loglike_mix$mu1,
    sd1.mix = loglike_mix$sd1,
    stringsAsFactors = FALSE
  )

  if (sort) {
    deltal <- deltal[order(deltal$delta.l, decreasing = TRUE), ]
  }

  if (na_omit) {
    deltal <- na.omit(deltal)
  }

  deltal
}

Try the heritEWAS package in your browser

Any scripts or data that you put into this service are public.

heritEWAS documentation built on July 1, 2020, 6:02 p.m.