gpatterns.get_avg_meth | R Documentation |
Extracts average methylation data from tracks.
gpatterns.get_avg_meth(
tracks,
intervals,
iterator = NULL,
min_cov = NULL,
mask_by_cov = FALSE,
use_cpgs = FALSE,
min_samples = NULL,
min_cpgs = NULL,
min_var = NULL,
var_quantile = NULL,
min_range = NULL,
names = NULL,
tidy = TRUE,
pre_screen = FALSE,
use_disk = FALSE,
file = NULL,
intervals.set.out = NULL,
sum_tracks = FALSE
)
tracks |
methylation tracks |
intervals |
genomic scope for which the function is applied |
iterator |
see iterator in gextract. If NULL, the iterator is set to CpGs. |
min_cov |
minimal coverage for iterator interval |
mask_by_cov |
change loci with coverage < min_cov to NA. Not relevant when intervals.set.out or file is not NULL. |
use_cpgs |
use CpGs as iterator |
min_samples |
minimal number of samples with cov >= min_cov. If min_cov is NULL, it is set to 1. |
min_cpgs |
minimal number of CpGs per iterator interval. Note that the intervalID column may be incorrect. |
min_var |
minimal variance (across samples) per iterator interval |
var_quantile |
minimal quantile of variance per iterator interval |
min_range |
take only iterator intervals with max(avg_meth) - min(avg_meth) >= |
names |
alternative names for the tracks, similar to colnames in gextract if tidy == FALSE. Note that names should be shorter than the maximal length of an R data frame column name. |
tidy |
if TRUE, returns a tidy data frame with the following fields: chrom, start, end, intervalID, samp, meth, unmeth, avg, cov. If FALSE, returns a data frame with average methylation, similar to gextract. Note that for a large number of intervals tidy == FALSE may be the only memory-feasible option. |
file |
save output to file (only in non tidy mode, would not filter by variance) |
intervals.set.out |
save output to a big intervals set (only in tidy mode, would not filter by variance) |
sum_tracks |
get average methylation from all the tracks summed |
pre_screen |
pre-screen for min_samples and min_cov (for a large number of tracks / a large number of intervals). Note that the intervalID column may be incorrect and, if use_cpgs is TRUE, the intervals set would become the CpGs. |
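As a rough illustration of what the coverage and range filters describe, the following base-R sketch applies min_cov, min_samples, min_range and mask_by_cov to a toy table. It does not call gpatterns or misha. The data, the helper name filter_loci and the order in which the filters are applied are assumptions made for illustration, and the package's actual implementation may differ.

```r
# Toy coverage and average-methylation matrices: rows are iterator intervals,
# columns are samples. All values are made up for illustration.
cov <- matrix(c(12,  3, 15,
                 2,  1,  4,
                20, 18, 25),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("locus", 1:3), c("s1", "s2", "s3")))
avg <- matrix(c(0.10, 0.50, 0.90,
                0.20, 0.30, 0.25,
                0.40, 0.45, 0.42),
              nrow = 3, byrow = TRUE, dimnames = dimnames(cov))

# max - min of the non-missing values, or NA if everything is missing
row_range <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x) - min(x)
}

# Hypothetical helper (not part of gpatterns) mimicking the documented filters.
filter_loci <- function(avg, cov, min_cov = NULL, min_samples = NULL,
                        min_range = NULL, mask_by_cov = FALSE) {
  keep <- rep(TRUE, nrow(avg))
  if (!is.null(min_samples)) {
    # As documented: if min_cov is NULL it is set to 1
    if (is.null(min_cov)) min_cov <- 1
    keep <- keep & rowSums(cov >= min_cov) >= min_samples
  }
  if (!is.null(min_cov) && mask_by_cov) {
    # Loci with coverage < min_cov become NA
    avg[cov < min_cov] <- NA
  }
  if (!is.null(min_range)) {
    rng <- apply(avg, 1, row_range)
    keep <- keep & !is.na(rng) & rng >= min_range
  }
  avg[keep, , drop = FALSE]
}

res <- filter_loci(avg, cov, min_cov = 10, min_samples = 2,
                   min_range = 0.3, mask_by_cov = TRUE)
print(res)
```

In this toy case only locus1 survives: locus2 has no sample with coverage of at least 10, and locus3 varies by only 0.05 across samples.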
There are two main modes:
not tidy: returns a data frame with intervals (chrom,start,end) and a column with average methylation for each sample.
tidy: returns a tidy data frame with 'meth','unmeth','avg','cov' for each iterator interval for each sample.
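To make the difference between the two output shapes concrete, here is a small base-R sketch that builds a toy tidy table and reshapes it into the one-column-per-sample layout. It does not call gpatterns; the values are made up, and computing avg as meth / (meth + unmeth) with cov = meth + unmeth is an assumption about how these fields relate.

```r
# Toy "tidy" table: one row per iterator interval per sample (made-up values).
tidy <- data.frame(
  chrom      = "chr1",
  start      = c(100, 100, 200, 200),
  end        = c(101, 101, 201, 201),
  intervalID = c(1, 1, 2, 2),
  samp       = c("s1", "s2", "s1", "s2"),
  meth       = c(8, 3, 0, 6),
  unmeth     = c(2, 7, 10, 6)
)
tidy$cov <- tidy$meth + tidy$unmeth   # assumed definition of coverage
tidy$avg <- tidy$meth / tidy$cov      # assumed definition of average methylation

# "Not tidy" shape: intervals plus one average-methylation column per sample.
wide <- reshape(tidy[, c("chrom", "start", "end", "samp", "avg")],
                idvar = c("chrom", "start", "end"),
                timevar = "samp", direction = "wide")
names(wide) <- sub("^avg\\.", "", names(wide))
print(wide)
```

The wide form drops the raw meth and unmeth counts, which is why it is lighter in memory but carries less information.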
The 'tidy' option is very convenient for further analysis, but note that for large amounts of data it may be too slow. The 'not tidy' version, on the other hand, returns only average methylation and not the raw 'meth' and 'unmeth' calls. In general, choose the mode according to the following guidelines:
For extremely large datasets use the 'not tidy' version with use_disk == TRUE. Note that in general working with a huge number of genomic regions is not useful, both in terms of performance (memory consumption, slow algorithms) and analysis (more 'noise'). A good practice is to select the genomic regions carefully, for example by requiring minimal coverage (min_cov) in a minimal number of samples (min_samples), a minimal number of CpGs (min_cpgs), taking only the most variable regions (min_var, var_quantile) or by taking sets of annotated regions (e.g. promoters, enhancers).
For large datasets use the 'not tidy' version.
For intermediate size datasets use the 'tidy' version with pre_screen = TRUE. This first filters the CpGs and only then extracts the methylation to memory.
For small datasets use the 'vanilla' 'tidy' version.
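The guidelines above might translate into calls like the ones in this sketch. The wrapper extract_by_size and its size labels are hypothetical, the thresholds min_cov = 10 and min_samples = 2 are arbitrary placeholders, and the caller is assumed to have the gpatterns package installed and a misha database with the relevant tracks already initialised. Only arguments listed in the usage section are used.

```r
# Hypothetical wrapper (not part of gpatterns): picks an extraction mode
# following the dataset-size guidelines. `tracks` and `intervals` are supplied
# by the caller; nothing is evaluated until the function is called.
extract_by_size <- function(tracks, intervals,
                            size = c("small", "intermediate", "large", "huge")) {
  size <- match.arg(size)
  switch(size,
    # small: the 'vanilla' tidy version
    small = gpatterns::gpatterns.get_avg_meth(tracks, intervals),
    # intermediate: tidy, but pre-screen by coverage before extracting
    # (10 and 2 are placeholder thresholds, not recommendations)
    intermediate = gpatterns::gpatterns.get_avg_meth(
      tracks, intervals,
      min_cov = 10, min_samples = 2, pre_screen = TRUE
    ),
    # large: average methylation only, one column per sample
    large = gpatterns::gpatterns.get_avg_meth(tracks, intervals, tidy = FALSE),
    # huge: 'not tidy' with use_disk = TRUE
    huge = gpatterns::gpatterns.get_avg_meth(
      tracks, intervals,
      tidy = FALSE, use_disk = TRUE
    )
  )
}
```

A call such as extract_by_size(my_tracks, my_intervals, "large") would then return the 'not tidy' data frame, where my_tracks and my_intervals stand for track names and an intervals set from your own database.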
To understand the concept of iterators and intervals, see gextract, and the misha package in general.
The function works in the following way: for every interval in intervals the function extracts the methylation calls in each iterator interval and calculates the average.
Beware the difference between intervals and iterator: the intervals parameter sets the global genomic scope of the function (what part of the genome to look at to begin with). The iterator parameter sets the iterator intervals, which are the chunks of the genome from which we will extract the methylation calls.
For example, setting the iterator to gintervals.all() would calculate the average methylation of every chromosome, whereas setting the intervals to gintervals.all() would just mean that the calculations of the iterator intervals would not be limited to a specific part of the genome, and, for example, if iterator=NULL, methylation would be extracted from all the genomic CpGs.
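The distinction between scope and chunks can be sketched in base R on a toy CpG table. This is not how gpatterns or misha implement extraction; the data, the helper avg_per_bin and the pooling rule (sum of meth over sum of meth + unmeth within each chunk) are assumptions made for illustration.

```r
# Toy per-CpG methylation calls (made-up values).
cpgs <- data.frame(
  chrom  = "chr1",
  start  = c(100, 150, 420, 480, 900),
  meth   = c(8, 2, 5, 5, 1),
  unmeth = c(2, 8, 5, 15, 9)
)

# Plays the role of `intervals`: the global scope to look at.
scope <- data.frame(chrom = "chr1", start = 0, end = 500)
# Plays the role of `iterator`: the chunks over which calls are averaged.
bins <- data.frame(chrom = "chr1", start = c(0, 400), end = c(400, 800))

# Step 1: restrict to the scope (the CpG at position 900 falls outside it).
in_scope <- cpgs[cpgs$chrom == scope$chrom &
                 cpgs$start >= scope$start & cpgs$start < scope$end, ]

# Step 2: pool the calls inside each chunk and compute the average.
# Hypothetical helper, not part of gpatterns.
avg_per_bin <- function(bins, cpgs) {
  do.call(rbind, lapply(seq_len(nrow(bins)), function(i) {
    hit <- cpgs$chrom == bins$chrom[i] &
      cpgs$start >= bins$start[i] & cpgs$start < bins$end[i]
    meth   <- sum(cpgs$meth[hit])
    unmeth <- sum(cpgs$unmeth[hit])
    cov    <- meth + unmeth
    data.frame(bins[i, ], meth = meth, unmeth = unmeth, cov = cov,
               avg = if (cov > 0) meth / cov else NA_real_)
  }))
}

res <- avg_per_bin(bins, in_scope)
print(res)
```

Widening scope changes which CpGs are considered at all, while changing bins changes how the considered CpGs are grouped before averaging.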