View source: R/find_informative_sites.R
find_informative_sites | R Documentation |
This function generates a set of informative CpG sites to estimate the purity or the tumor content of a set of tumor samples.
find_informative_sites(
  tumor_table,
  control_table,
  auc,
  ref_table,
  cores = 1,
  max_sites = 20,
  min_distance = 1000000,
  percentiles = c(0, 100),
  hyper_range = c(min = 0.4, max = 0.9),
  hypo_range = c(min = 0.1, max = 0.6),
  control_costraints = c(0.3, 0.7),
  method = c("even", "top", "hyper", "hypo"),
  full_info = FALSE
)
tumor_table |
A matrix of beta-values of tumor samples. |
control_table |
A matrix of beta-values of control/normal samples. |
auc |
A data.frame with AUC scores generated by |
ref_table |
A data.frame with first two columns reporting genomic location (chromosome, genomic_coordinates). |
cores |
Number of parallel processes. |
max_sites |
Maximum number of sites to retrieve (half hyper-, half hypo-methylated) (default=20). |
min_distance |
Exclude sites located less than 'min_distance' from a higher-ranking site (default = 1e6 bp). |
percentiles |
A vector of length 2. Min and max percentiles to select sites with beta values outside hypo- and hyper-ranges (default = c(0,100); i.e. only min and max beta should be outside of ranges). |
hyper_range |
A vector of length 2 with minimum lower and upper values required to select hyper-methylated informative sites. |
hypo_range |
A vector of length 2 with minimum lower and upper values required to select hypo-methylated informative sites. |
control_costraints |
To select a site, "first quartile"/"third quartile" of control data must be above/below these beta-values. |
method |
How to select sites: "even" (half hyper-, half hypo-methylated sites), "top" (highest AUC regardless of hyper- or hypo-methylation), "hyper" (hyper-methylated sites only), "hypo" (hypo-methylated sites only). |
full_info |
Return all informative sites with a column reporting whether to use a site or not (for debugging purposes). |
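The effect of min_distance can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper filter_by_distance and the column names chr, pos and auc are hypothetical, and it assumes that distance is checked only against higher-ranking sites that were themselves kept, on the same chromosome.

```r
# Hypothetical helper, not part of the package: greedy distance filter.
# `sites` is a data.frame with columns chr, pos and auc (assumed names).
filter_by_distance <- function(sites, min_distance = 1e6) {
  # Rank sites from highest to lowest AUC
  sites <- sites[order(sites$auc, decreasing = TRUE), ]
  keep <- logical(nrow(sites))
  for (i in seq_len(nrow(sites))) {
    # Positions of already-kept sites on the same chromosome
    kept_pos <- sites$pos[keep & sites$chr == sites$chr[i]]
    # Keep the site only if no kept higher-ranking site is too close
    keep[i] <- all(abs(kept_pos - sites$pos[i]) >= min_distance)
  }
  sites[keep, ]
}

# Toy data, invented for illustration only
toy_sites <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr2"),
  pos = c(1000000, 1500000, 3000000, 1200000),
  auc = c(0.95, 0.90, 0.85, 0.80),
  stringsAsFactors = FALSE
)
selected <- filter_by_distance(toy_sites)
print(selected)
```

In this toy input the second chr1 site lies 500 kb from the top-ranking site and is dropped, while the sites on other positions or chromosomes are retained.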
A new parameter, named control_costraints, is required to force the selection of sites whose upper/lower quartiles of control scores are below the beta-values given by control_costraints. Sites are divided into hyper and hypo depending on their level of methylation with respect to the average beta-score of normal samples.
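The division into hyper and hypo sites and the quartile check on controls can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: the toy matrices are invented, the column name use is hypothetical, and the mapping of the two constraint values (third quartile of controls below the first value for hyper sites, first quartile above the second value for hypo sites) is one reading of the argument description.

```r
# Toy beta-values, invented for illustration (rows = sites, columns = samples)
site_names <- c("site_a", "site_b", "site_c")
control <- matrix(
  c(0.80, 0.85, 0.90, 0.82,   # site_a: highly methylated in controls
    0.10, 0.15, 0.05, 0.12,   # site_b: unmethylated in controls
    0.40, 0.60, 0.50, 0.55),  # site_c: intermediate in controls
  nrow = 3, byrow = TRUE, dimnames = list(site_names, NULL)
)
tumor <- matrix(
  c(0.30, 0.40, 0.35, 0.45,
    0.60, 0.70, 0.65, 0.75,
    0.50, 0.55, 0.45, 0.60),
  nrow = 3, byrow = TRUE, dimnames = list(site_names, NULL)
)

# Same values as the default of control_costraints
constraints <- c(0.3, 0.7)

# Classify each site relative to the average beta-value of controls
control_mean <- rowMeans(control)
tumor_mean <- rowMeans(tumor)
type <- ifelse(tumor_mean > control_mean, "hyper", "hypo")

# First and third quartiles of control beta-values per site
q1 <- apply(control, 1, quantile, probs = 0.25)
q3 <- apply(control, 1, quantile, probs = 0.75)

# Assumed rule: hyper sites need q3 below constraints[1],
# hypo sites need q1 above constraints[2]
passes <- ifelse(type == "hyper", q3 < constraints[1], q1 > constraints[2])

result <- data.frame(probe = site_names, type = unname(type),
                     use = unname(passes), stringsAsFactors = FALSE)
print(result)
```

Here site_c is classified as hyper but fails the check, because its control beta-values are intermediate rather than clearly unmethylated.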
A data.frame reporting probe names and type ("hyper" and "hypo") of informative sites.
## WARNING: The following code doesn't retrieve any informative site
## It just shows how to use the tool
auc_data <- get_AUC(tumor_toy_data, control_toy_data)
info_sites <- find_informative_sites(tumor_toy_data, control_toy_data, auc_data, illumina27k_hg19)