penetrance: A Package for Penetrance Estimation

View source: R/penetranceMain.R

Description

A comprehensive package for penetrance estimation in family-based studies. It implements Bayesian methods that use the Metropolis-Hastings algorithm to estimate the age-specific penetrance of genetic variants. It supports both sex-specific and non-sex-specific analyses and provides visualization tools for examining MCMC results.

The penetrance() function implements the Independent Metropolis-Hastings algorithm for Bayesian penetrance estimation of cancer risk. It uses parallel computing to run multiple chains and provides options for analyzing and visualizing the results.
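For readers unfamiliar with independence-type Metropolis-Hastings samplers, the following is a minimal generic sketch in base R. It is not the package's sampler: the target (a binomial likelihood with a flat prior), the uniform proposal, and the names log_post and independent_mh are all assumptions made for illustration. The defining feature is that each proposal is drawn without reference to the current state.

```r
set.seed(1)

# Toy log-posterior (assumption): 7 successes in 20 trials, flat prior on (0, 1)
log_post <- function(p, k = 7, n = 20) dbinom(k, n, p, log = TRUE)

# Hypothetical helper: independence Metropolis-Hastings with a Uniform(0, 1) proposal
independent_mh <- function(n_iter, log_target, init = 0.5) {
  draws <- numeric(n_iter)
  accepted <- 0
  current <- init
  for (i in seq_len(n_iter)) {
    # The proposal does not depend on the current state
    proposal <- runif(1)
    # The Uniform(0, 1) proposal density equals 1 everywhere, so it cancels
    # from the acceptance ratio and only the target ratio remains
    log_ratio <- log_target(proposal) - log_target(current)
    if (log(runif(1)) < log_ratio) {
      current <- proposal
      accepted <- accepted + 1
    }
    draws[i] <- current
  }
  list(draws = draws, rejection_rate = 1 - accepted / n_iter)
}

fit <- independent_mh(5000, log_post)
mean(fit$draws)       # compare with the exact posterior mean 8 / 22
fit$rejection_rate    # share of proposals that were rejected
```

With a flat prior the exact posterior here is Beta(8, 14), which gives a simple check on the sampled mean.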

Usage

penetrance(
  pedigree,
  twins = NULL,
  n_chains = 1,
  n_iter_per_chain = 10000,
  ncores = 6,
  max_age = 94,
  baseline_data = baseline_data_default,
  remove_proband = FALSE,
  age_imputation = FALSE,
  median_max = TRUE,
  BaselineNC = TRUE,
  var = c(0.1, 0.1, 2, 2, 5, 5, 5, 5),
  burn_in = 0,
  thinning_factor = 1,
  imp_interval = 100,
  distribution_data = distribution_data_default,
  prev = 1e-04,
  sample_size = NULL,
  ratio = NULL,
  prior_params = prior_params_default,
  risk_proportion = risk_proportion_default,
  summary_stats = TRUE,
  rejection_rates = TRUE,
  density_plots = TRUE,
  plot_trace = TRUE,
  penetrance_plot = TRUE,
  penetrance_plot_pdf = TRUE,
  plot_loglikelihood = TRUE,
  plot_acf = TRUE,
  probCI = 0.95,
  sex_specific = TRUE
)

Arguments

pedigree

A list of data frames, where each data frame represents a single pedigree and contains the following columns:

  • PedigreeID: A numeric or character identifier for the family/pedigree. Must be consistent for all members of the same family within a data frame.

  • ID: A unique numeric or character identifier for each individual within their respective pedigree data frame.

  • Sex: An integer representing biological sex: 0 for female, 1 for male. Use NA for unknown sex.

  • MotherID: The ID of the individual's mother. Should correspond to an ID within the same pedigree data frame or be NA if the mother is not in the pedigree (founder).

  • FatherID: The ID of the individual's father. Should correspond to an ID within the same pedigree data frame or be NA if the father is not in the pedigree (founder).

  • isProband: An integer indicating if the individual is a proband: 1 for proband, 0 otherwise.

  • CurAge: An integer representing the age of censoring. This is the current age if the individual is alive, or the age at death if deceased. Must be between 1 and max_age. Use NA for unknown ages (but note this may affect analysis or require imputation).

  • isAff: An integer indicating the affection status for the cancer of interest: 1 if diagnosed, 0 if unaffected. Use NA for unknown status.

  • Age: An integer representing the age at cancer diagnosis. Should be NA if isAff is 0 or NA. Must be between 1 and max_age, and less than or equal to CurAge. Use NA for unknown diagnosis age (but note this may affect analysis or require imputation).

  • Geno: An integer representing the germline genetic test result: 1 for carrier (positive), 0 for non-carrier (negative). Use NA for unknown or untested individuals.
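The column rules above can be checked before calling penetrance(). The sketch below uses only base R; check_pedigree is a hypothetical helper, not a function exported by the package, and it covers only the constraints stated in this list.

```r
# Hypothetical helper (not part of the penetrance package): returns a character
# vector of problems found in one pedigree data frame, empty if none are found.
check_pedigree <- function(ped, max_age = 94) {
  required <- c("PedigreeID", "ID", "Sex", "MotherID", "FatherID",
                "isProband", "CurAge", "isAff", "Age", "Geno")
  missing_cols <- setdiff(required, names(ped))
  if (length(missing_cols) > 0) {
    return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  problems <- character(0)

  # Binary-coded columns may contain only 0, 1, or NA
  for (col in c("Sex", "isAff", "Geno")) {
    if (!all(ped[[col]] %in% c(0, 1, NA))) {
      problems <- c(problems, paste(col, "must be 0, 1, or NA"))
    }
  }
  if (!all(ped$isProband %in% c(0, 1))) {
    problems <- c(problems, "isProband must be 0 or 1")
  }

  # IDs must be unique; parents must be NA (founder) or present in the pedigree
  if (anyDuplicated(ped$ID) > 0) {
    problems <- c(problems, "ID values must be unique")
  }
  for (col in c("MotherID", "FatherID")) {
    known <- ped[[col]][!is.na(ped[[col]])]
    if (!all(known %in% ped$ID)) {
      problems <- c(problems, paste(col, "refers to an ID not in the pedigree"))
    }
  }

  # Ages must lie in 1..max_age, and diagnosis cannot come after censoring
  in_range <- function(x) all(is.na(x) | (x >= 1 & x <= max_age))
  if (!in_range(ped$CurAge)) problems <- c(problems, "CurAge out of range")
  if (!in_range(ped$Age)) problems <- c(problems, "Age out of range")
  late <- !is.na(ped$Age) & !is.na(ped$CurAge) & ped$Age > ped$CurAge
  if (any(late)) problems <- c(problems, "Age exceeds CurAge")

  # Age should be NA unless the individual is affected
  stray <- !is.na(ped$Age) & (is.na(ped$isAff) | ped$isAff == 0)
  if (any(stray)) problems <- c(problems, "Age set although isAff is 0 or NA")

  problems
}

# Small illustrative family: two founders and two children
ped <- data.frame(
  PedigreeID = rep(1, 4), ID = 1:4, Sex = c(1, 0, 1, 0),
  MotherID = c(NA, NA, 2, 2), FatherID = c(NA, NA, 1, 1),
  isProband = c(0, 0, 1, 0), CurAge = c(70, 68, 45, 42),
  isAff = c(0, 0, 1, 0), Age = c(NA, NA, 40, NA), Geno = c(NA, NA, 1, NA)
)
check_pedigree(ped)   # character(0) when no rule is violated
```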

twins

A list specifying identical twins or triplets in the family. Each element of the list should be a vector containing the IDs of the identical siblings within a pedigree. For example: list(c("ID1", "ID2"), c("ID3", "ID4", "ID5")). Default is NULL.

n_chains

Integer, the number of chains for parallel computation. Default is 1.

n_iter_per_chain

Integer, the number of iterations for each chain. Default is 10000.

ncores

Integer, the number of cores for parallel computation. Default is 6.
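To illustrate how n_chains, n_iter_per_chain, and ncores relate, here is a generic sketch of running independent chains in parallel with the base parallel package. It does not reproduce the package's internals: run_chain is a hypothetical placeholder that returns random numbers instead of MCMC draws, and the choice of mclapply is an assumption.

```r
library(parallel)

# Hypothetical stand-in for one chain; a real chain would return posterior draws
run_chain <- function(chain_id, n_iter) {
  set.seed(100 + chain_id)   # a distinct seed per chain
  rnorm(n_iter)
}

n_chains <- 2
n_iter_per_chain <- 1000
ncores <- 2

# Forking is unavailable on Windows, so fall back to a single worker there;
# using more workers than chains brings no benefit
workers <- if (.Platform$OS.type == "windows") 1L else min(ncores, n_chains)

chains <- mclapply(seq_len(n_chains), run_chain,
                   n_iter = n_iter_per_chain, mc.cores = workers)

# Pool the draws from all chains
combined <- unlist(chains)
length(combined)   # n_chains * n_iter_per_chain
```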

max_age

Integer, the maximum age considered for analysis. Default is 94.

baseline_data

Data providing the absolute age-specific baseline risk (probability) of developing the cancer in the general population (e.g., from the SEER database). All probability values must be between 0 and 1.

  • If sex_specific = TRUE (default): A data frame with columns 'Male' and 'Female', where each column contains the age-specific probabilities for that sex. The number of rows should ideally correspond to max_age.

  • If sex_specific = FALSE: A numeric vector or a single-column data frame containing the age-specific probabilities for the combined population. The length (or number of rows) should ideally correspond to max_age.

Default data is provided for colorectal cancer from SEER (up to age 94). If the number of rows/length does not match max_age, the data will be truncated or extended with the last value.
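The "truncated or extended with the last value" behaviour can be pictured with the sketch below. fit_baseline_length is a hypothetical helper written for illustration; the package's own handling may differ in detail.

```r
# Hypothetical helper: force a risk vector to have exactly max_age entries
fit_baseline_length <- function(x, max_age) {
  n <- length(x)
  stopifnot(n > 0)
  if (n >= max_age) {
    x[seq_len(max_age)]                 # too long: keep the first max_age values
  } else {
    c(x, rep(x[n], max_age - n))        # too short: repeat the last value
  }
}

short <- c(0.001, 0.002, 0.003)
fit_baseline_length(short, 5)   # 0.001 0.002 0.003 0.003 0.003
fit_baseline_length(short, 2)   # 0.001 0.002

# Sex-specific case: apply the same rule to each column
baseline <- data.frame(Male = short, Female = short / 2)
adjusted <- as.data.frame(lapply(baseline, fit_baseline_length, max_age = 5))
nrow(adjusted)   # 5
```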

remove_proband

Logical, indicating whether to remove probands from the analysis. Default is FALSE.

age_imputation

Logical, indicating whether to perform age imputation. Default is FALSE.

median_max

Logical, indicating whether to use the baseline median age or max_age as an upper bound for the median proposal. Default is TRUE.

BaselineNC

Logical, indicating that the non-carrier penetrance is assumed to be the baseline penetrance. Default is TRUE.

var

Numeric vector, variances for the proposal distribution in the Metropolis-Hastings algorithm. Default is c(0.1, 0.1, 2, 2, 5, 5, 5, 5).

burn_in

Numeric, the fraction of results to discard as burn-in (0 to 1). Default is 0 (no burn-in).

thinning_factor

Integer, the factor by which to thin the results. Default is 1 (no thinning).
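As a rough picture of what burn_in and thinning_factor do to a chain, the sketch below discards a leading fraction of draws and then keeps every k-th remaining draw. post_process is a hypothetical helper; the rounding rule and the order of the two steps are assumptions, not the package's documented behaviour.

```r
# Hypothetical helper: apply burn-in (as a fraction) and then thinning
post_process <- function(samples, burn_in = 0, thinning_factor = 1) {
  stopifnot(length(samples) > 0, burn_in >= 0, burn_in < 1, thinning_factor >= 1)
  n_drop <- floor(burn_in * length(samples))
  kept <- if (n_drop > 0) samples[-seq_len(n_drop)] else samples
  kept[seq(1, length(kept), by = thinning_factor)]
}

draws <- 1:100
post_process(draws, burn_in = 0.2, thinning_factor = 10)
# 21 31 41 51 61 71 81 91
```

With the defaults (burn_in = 0, thinning_factor = 1) the chain is returned unchanged.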

imp_interval

Integer, the interval at which age imputation should be performed when age_imputation = TRUE. Default is 100.

distribution_data

Data for generating prior distributions.

prev

Numeric, prevalence of the carrier status. Default is 0.0001.

sample_size

Optional numeric, sample size for distribution generation. Default is NULL.

ratio

Optional numeric, ratio parameter for distribution generation. Default is NULL.

prior_params

List, parameters for prior distributions.

risk_proportion

Numeric, proportion of risk for distribution generation.

summary_stats

Logical, indicating whether to include summary statistics in the output. Default is TRUE.

rejection_rates

Logical, indicating whether to include rejection rates in the output. Default is TRUE.

density_plots

Logical, indicating whether to include density plots in the output. Default is TRUE.

plot_trace

Logical, indicating whether to include trace plots in the output. Default is TRUE.

penetrance_plot

Logical, indicating whether to include penetrance plots in the output. Default is TRUE.

penetrance_plot_pdf

Logical, indicating whether to include PDF plots in the output. Default is TRUE.

plot_loglikelihood

Logical, indicating whether to include log-likelihood plots in the output. Default is TRUE.

plot_acf

Logical, indicating whether to include autocorrelation function (ACF) plots for posterior samples. Default is TRUE.

probCI

Numeric, probability level for credible intervals in penetrance plots. Must be between 0 and 1. Default is 0.95.
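One common way to turn posterior draws into an interval at level probCI is the equal-tailed quantile interval sketched below. Whether the package uses equal-tailed intervals is an assumption; credible_interval is a hypothetical helper, and the Beta draws stand in for real posterior samples.

```r
# Hypothetical helper: equal-tailed credible interval from posterior draws
credible_interval <- function(draws, probCI = 0.95) {
  stopifnot(probCI > 0, probCI < 1)
  tail_prob <- (1 - probCI) / 2
  quantile(draws, probs = c(tail_prob, 1 - tail_prob), names = FALSE)
}

set.seed(42)
draws <- rbeta(10000, 8, 14)       # stand-in for posterior samples
ci <- credible_interval(draws, probCI = 0.95)
ci                                  # lower and upper bounds
qbeta(c(0.025, 0.975), 8, 14)       # exact bounds for comparison
```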

sex_specific

Logical, indicating whether to use sex-specific parameters in the analysis. Default is TRUE.

Details

Key features:

  • Bayesian estimation of penetrance using family-based data

  • Support for sex-specific and non-sex-specific analyses

  • Age imputation for missing data

  • Visualization tools for MCMC diagnostics

  • Integration with the clipp package for likelihood calculations

Value

A list containing combined results from all chains, including optional statistics and plots.

Author(s)

Maintainer: Nicolas Kubista <bmendel@jimmy.harvard.edu>

Authors:

  • BayesMendel Lab

Examples

# Create example baseline data (simplified for demonstration)
baseline_data_default <- data.frame(
  Age = 1:94,
  Female = rep(0.01, 94),
  Male = rep(0.01, 94)
)

# Create example distribution data
distribution_data_default <- data.frame(
  Age = 1:94,
  Risk = rep(0.01, 94)
)

# Create example prior parameters
prior_params_default <- list(
  shape = 2,
  scale = 50
)

# Create example risk proportion
risk_proportion_default <- 0.5

# Create a simple example pedigree
example_pedigree <- data.frame(
  PedigreeID = rep(1, 4),
  ID = 1:4,
  Sex = c(1, 0, 1, 0),  # 1 for male, 0 for female
  MotherID = c(NA, NA, 2, 2),
  FatherID = c(NA, NA, 1, 1),
  isProband = c(0, 0, 1, 0),
  CurAge = c(70, 68, 45, 42),
  isAff = c(0, 0, 1, 0),
  Age = c(NA, NA, 40, NA),
  Geno = c(NA, NA, 1, NA)
)

# Basic usage with minimal iterations
result <- penetrance(
  pedigree = list(example_pedigree),
  n_chains = 1,
  n_iter_per_chain = 10,  # Very small number for example
  ncores = 1,             # Single core for example
  summary_stats = TRUE,
  plot_trace = FALSE,     # Disable plots for quick example
  density_plots = FALSE,
  penetrance_plot = FALSE,
  penetrance_plot_pdf = FALSE,
  plot_loglikelihood = FALSE,
  plot_acf = FALSE
)

# View basic results
head(result$summary_stats)


penetrance documentation built on April 4, 2025, 12:29 a.m.