find_informative_sites: Discover informative CpG sites

View source: R/find_informative_sites.R

find_informative_sites    R Documentation

Discover informative CpG sites

Description

This function selects a set of informative CpG sites used to estimate the purity (tumor content) of a set of tumor samples.

Usage

find_informative_sites(
  tumor_table,
  control_table,
  auc,
  ref_table,
  cores = 1,
  max_sites = 20,
  min_distance = 1000000,
  percentiles = c(0, 100),
  hyper_range = c(min = 0.4, max = 0.9),
  hypo_range = c(min = 0.1, max = 0.6),
  control_costraints = c(0.3, 0.7),
  method = c("even", "top", "hyper", "hypo"),
  full_info = FALSE
)

Arguments

tumor_table

A matrix of beta-values of tumor samples.

control_table

A matrix of beta-values of control/normal samples.

auc

A data.frame with AUC scores generated by get_AUC.

ref_table

A data.frame with first two columns reporting genomic location (chromosome, genomic_coordinates).

cores

Number of parallel processes.

max_sites

Maximum number of sites to retrieve (half hyper-, half hypo-methylated) (default=20).

min_distance

Exclude sites located less than 'min_distance' from a higher-ranking site (default = 1e6 bp).

percentiles

A vector of length 2. Min and max percentiles to select sites with beta values outside hypo- and hyper-ranges (default = c(0,100); i.e. only min and max beta should be outside of ranges).

hyper_range

A vector of length 2 with minimum lower and upper values required to select hyper-methylated informative sites.

hypo_range

A vector of length 2 with minimum lower and upper values required to select hypo-methylated informative sites.

control_costraints

To select a site, "first quartile"/"third quartile" of control data must be above/below these beta-values.

method

How to select sites: "even" (half hyper-, half hypo-methylated sites), "top" (highest AUC regardless of hyper- or hypo-methylation), "hyper" (hyper-methylated sites only), "hypo" (hypo-methylated sites only).

full_info

Return all informative sites with a column reporting whether to use a site or not (for debugging purposes).
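
The interplay of percentiles, hyper_range and hypo_range can be sketched in base R. This is only one reading of the argument descriptions above, not the PAMES implementation: the toy matrix beta, the helper outside_range() and the exact comparison operators are assumptions made for illustration.

```r
# Toy beta-value matrix: 3 sites (rows) x 5 tumor samples (columns).
# All values are made up for illustration.
beta <- rbind(
  site_a = c(0.05, 0.20, 0.50, 0.80, 0.95),
  site_b = c(0.45, 0.50, 0.55, 0.60, 0.65),
  site_c = c(0.02, 0.05, 0.30, 0.55, 0.70)
)

percentiles <- c(0, 100)
hyper_range <- c(min = 0.4, max = 0.9)
hypo_range  <- c(min = 0.1, max = 0.6)

# Per-site lower and upper percentile of the tumor beta-values
# (with c(0, 100) these are simply the row minimum and maximum).
low  <- apply(beta, 1, quantile, probs = percentiles[1] / 100, na.rm = TRUE)
high <- apply(beta, 1, quantile, probs = percentiles[2] / 100, na.rm = TRUE)

# Hypothetical helper (not part of PAMES): TRUE when the two percentiles
# fall outside the range, i.e. the tumor betas span the whole interval.
outside_range <- function(low, high, range) {
  low < range[["min"]] & high > range[["max"]]
}

hyper_candidate <- outside_range(low, high, hyper_range)
hypo_candidate  <- outside_range(low, high, hypo_range)
```

Under this reading, a site can pass both range checks at once; the AUC score and the comparison with control samples would then decide how it is labelled.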

Details

A new parameter, control_costraints, forces the selection of sites whose upper/lower quartiles of control beta-values are below/above the beta-values given by control_costraints. Sites are divided into hyper and hypo depending on their level of methylation with respect to the average beta-value of normal samples.
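
A minimal base-R sketch of the two ideas in this section follows. It is not the PAMES code. The toy matrices are invented, and two points are assumptions: that a site is labelled by comparing its mean tumor beta-value with its mean control beta-value, and that hyper sites are checked against the first value of control_costraints while hypo sites are checked against the second.

```r
# Toy data: 3 sites (rows) x 4 samples; values are made up for illustration.
tumor <- rbind(
  site_a = c(0.60, 0.75, 0.90, 0.85),
  site_b = c(0.10, 0.20, 0.15, 0.30),
  site_c = c(0.70, 0.80, 0.65, 0.90)
)
control <- rbind(
  site_a = c(0.05, 0.10, 0.15, 0.20),
  site_b = c(0.80, 0.85, 0.90, 0.95),
  site_c = c(0.40, 0.50, 0.60, 0.55)
)

control_costraints <- c(0.3, 0.7)

# Label each site by comparing tumor methylation with the control average
# (assumed rule).
control_mean <- rowMeans(control, na.rm = TRUE)
tumor_mean   <- rowMeans(tumor, na.rm = TRUE)
type <- ifelse(tumor_mean > control_mean, "hyper", "hypo")

# First and third quartiles of the control beta-values, per site.
q1 <- apply(control, 1, quantile, probs = 0.25, na.rm = TRUE)
q3 <- apply(control, 1, quantile, probs = 0.75, na.rm = TRUE)

# Assumed pairing: hyper sites need consistently low controls,
# hypo sites need consistently high controls.
keep <- ifelse(type == "hyper",
               q3 < control_costraints[1],
               q1 > control_costraints[2])

data.frame(site = rownames(tumor), type = type, keep = keep, row.names = NULL)
```

In this toy case site_c is labelled hyper but dropped, because its control samples are themselves moderately methylated.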

Value

A data.frame reporting the probe name and type ("hyper" or "hypo") of each informative site.

Examples

## WARNING: the following code does not retrieve any informative sites;
## it only shows how to call the function.
library(PAMES)
auc_data <- get_AUC(tumor_toy_data, control_toy_data)
info_sites <- find_informative_sites(tumor_toy_data, control_toy_data, auc_data, illumina27k_hg19)
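
Since the toy data return no sites, the effect of max_sites and min_distance may be easier to see on a hand-made ranking. The sketch below is independent of PAMES: the candidates table, its column names and the helper pick_distant_sites() are hypothetical, and it assumes that distances are only compared between sites on the same chromosome.

```r
# Hypothetical candidates, already sorted from best to worst ranking.
candidates <- data.frame(
  probe = c("cg01", "cg02", "cg03", "cg04", "cg05"),
  chr   = c("1", "1", "1", "2", "2"),
  pos   = c(1500000, 1900000, 3200000, 500000, 1200000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk down the ranking and keep a site only if no
# already-kept site on the same chromosome lies closer than min_distance.
pick_distant_sites <- function(sites, max_sites = 20, min_distance = 1e6) {
  kept <- integer(0)
  for (i in seq_len(nrow(sites))) {
    if (length(kept) >= max_sites) break
    same_chr  <- kept[sites$chr[kept] == sites$chr[i]]
    too_close <- any(abs(sites$pos[same_chr] - sites$pos[i]) < min_distance)
    if (!too_close) kept <- c(kept, i)
  }
  sites[kept, , drop = FALSE]
}

selected <- pick_distant_sites(candidates, max_sites = 3, min_distance = 1e6)
selected$probe
```

Here cg02 is skipped because it lies 400 kb from the higher-ranking cg01, and the loop stops once three sites have been kept.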

romagnolid/PAMES documentation built on Dec. 7, 2022, 10:37 a.m.