find_informative_regions: Select informative regions (extended)

View source: R/find_informative_regions.R

find_informative_regions    R Documentation

Select informative regions (extended)

Description

This function generates a list of informative regions for estimating the purity (tumor content) of a set of tumor samples.

Usage

find_informative_regions(
  tumor_table,
  control_table,
  auc,
  cores = 1,
  max_regions = 20,
  percentiles = c(0, 100),
  hyper_range = c(min = 0.4, max = 0.9),
  hypo_range = c(min = 0.1, max = 0.6),
  control_costraints = c(0.3, 0.7),
  method = c("even", "top", "hyper", "hypo"),
  full_info = FALSE
)

Arguments

tumor_table

A matrix of beta-values of tumor samples.

control_table

A matrix of beta-values of control/normal samples.

auc

A data.frame with AUC scores generated by get_AUC.

cores

Number of parallel processes.

max_regions

Maximum number of regions to retrieve (half hyper-, half hypo-methylated) (default=20).

percentiles

A vector of length 2 giving the minimum and maximum percentiles of tumor beta-values that must fall outside the hypo- and hyper-ranges for a site to be selected (default = c(0, 100), i.e. only the minimum and maximum beta-values need to lie outside the ranges).

hyper_range

A vector of length 2 giving the lower and upper beta-value bounds required to select hyper-methylated informative sites (default = c(min = 0.4, max = 0.9)).

hypo_range

A vector of length 2 giving the lower and upper beta-value bounds required to select hypo-methylated informative sites (default = c(min = 0.1, max = 0.6)).
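
As a rough illustration of how percentiles and a range such as hyper_range can interact, the base R sketch below computes per-site percentiles of tumor beta-values and checks them against the range. The toy matrix, the variable names and the strict inequalities are assumptions made for this example; the package's internal implementation may differ.

```r
# Toy beta-value matrix: 3 sites (rows) x 5 tumor samples (columns).
# All values are made up for illustration.
tumor_toy <- matrix(
  c(0.05, 0.20, 0.50, 0.80, 0.95,
    0.45, 0.50, 0.55, 0.60, 0.65,
    0.10, 0.10, 0.92, 0.92, 0.92),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("site1", "site2", "site3"), NULL)
)

percentiles <- c(0, 100)               # default: plain minimum and maximum
hyper_range <- c(min = 0.4, max = 0.9) # default taken from the Usage section

# Per-site lower and upper percentile of the tumor beta-values.
low_beta  <- apply(tumor_toy, 1, quantile,
                   probs = percentiles[1] / 100, na.rm = TRUE)
high_beta <- apply(tumor_toy, 1, quantile,
                   probs = percentiles[2] / 100, na.rm = TRUE)

# Assumed reading of the documentation: a site is a candidate when its
# lower percentile is below the range minimum and its upper percentile
# is above the range maximum.
spans_range <- low_beta < hyper_range["min"] & high_beta > hyper_range["max"]
print(spans_range)
```

Narrowing percentiles (for example to c(5, 95)) would make the check less sensitive to single outlier samples, since the extremes are no longer the plain minimum and maximum.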

control_costraints

A vector of length 2. To select a site, the first/third quartile of the control data must be above/below these beta-values (default = c(0.3, 0.7)).
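
The quartile check can be sketched in base R as follows. Which quartile is compared with which of the two values is an assumption here (third quartile below the first value for hyper-methylated candidates, first quartile above the second value for hypo-methylated candidates); the toy matrix and variable names are hypothetical.

```r
# Toy control beta-values: 3 sites (rows) x 5 control samples (columns).
# All values are made up for illustration.
control_toy <- matrix(
  c(0.05, 0.10, 0.15, 0.20, 0.25,
    0.75, 0.80, 0.85, 0.90, 0.95,
    0.20, 0.40, 0.50, 0.60, 0.80),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("site1", "site2", "site3"), NULL)
)

control_costraints <- c(0.3, 0.7)  # default taken from the Usage section

# First and third quartile of the control beta-values at each site.
first_quartile <- apply(control_toy, 1, quantile, probs = 0.25, na.rm = TRUE)
third_quartile <- apply(control_toy, 1, quantile, probs = 0.75, na.rm = TRUE)

# Assumption: hyper-methylated candidates need controls that are mostly
# unmethylated; hypo-methylated candidates need mostly methylated controls.
ok_for_hyper <- third_quartile < control_costraints[1]
ok_for_hypo  <- first_quartile > control_costraints[2]
print(data.frame(first_quartile, third_quartile, ok_for_hyper, ok_for_hypo))
```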

method

How to select sites: "even" (half hyper-, half hypo-methylated sites), "top" (highest AUC regardless of hyper- or hypo-methylation), "hyper" (hyper-methylated sites only), "hypo" (hypo-methylated sites only).

full_info

Return all informative sites (for debugging purposes).

Details

A new parameter, named control_costraints, is required to force selection of sites where the first/third quartiles of control beta-values are above/below the beta-values given by control_costraints. Regions are divided into hyper and hypo depending on their level of methylation with respect to the average beta-score of normal samples.
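
The hyper/hypo split described above can be pictured with a minimal base R sketch. It assumes a region is labelled "hyper" when its mean tumor beta-value exceeds the mean control beta-value and "hypo" otherwise; the toy matrices, region names and this exact rule are illustrative assumptions, not the package's code.

```r
# Toy beta-values for 2 regions (rows); all numbers are made up.
region_names <- c("chr1_1000", "chr2_5000")
tumor_toy <- matrix(c(0.70, 0.80, 0.90,
                      0.20, 0.10, 0.30),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(region_names, NULL))
control_toy <- matrix(c(0.10, 0.20, 0.15,
                        0.85, 0.90, 0.80),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(region_names, NULL))

# Average beta-value per region in controls and in tumors.
control_mean <- rowMeans(control_toy, na.rm = TRUE)
tumor_mean   <- rowMeans(tumor_toy, na.rm = TRUE)

# Assumed rule: more methylated than the control average means "hyper".
region_type <- ifelse(tumor_mean > control_mean, "hyper", "hypo")
print(region_type)
```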

Value

A data.frame reporting region names (chr_position) and type ("hyper" or "hypo") of informative regions.

Examples

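# Reduce the toy CpG-site matrix to the first 1000 CpG islands
reduced_data <- reduce_to_regions(bs_seq_toy_matrix, bs_seq_toy_sites, cpg_islands[1:1000,])
# Compute AUC scores (columns 1-10 passed as tumor samples, 11-20 as controls)
auc_data <- get_AUC(reduced_data[,1:10], reduced_data[,11:20])
# Select informative regions with the default settings
info_regions <- find_informative_regions(reduced_data[,1:10], reduced_data[,11:20], auc_data)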

romagnolid/PAMES documentation built on Dec. 7, 2022, 10:37 a.m.