qMeth: Quantify DNA methylation

View source: R/qMeth.R

qMethR Documentation

Quantify DNA methylation

Description

Quantify methylation of cytosines from bisulfite sequencing data.

Usage

qMeth(
  proj,
  query = NULL,
  reportLevel = c("C", "alignment"),
  mode = c("CpGcomb", "CpG", "allC", "var"),
  collapseBySample = TRUE,
  collapseByQueryRegion = FALSE,
  asGRanges = TRUE,
  mask = NULL,
  reference = "genome",
  keepZero = TRUE,
  mapqMin = 0L,
  mapqMax = 255L,
  clObj = NULL
)

Arguments

proj

A qProject object from a bisulfite sequencing experiment

query

A GRanges object with the regions to be quantified. If NULL, all available target sequences (e.g. the whole genome) will be analyzed. Available target sequences are extracted from the header of the first bam file.

reportLevel

Report results combined for C's (reportLevel=“C”), the default) or individually for single alignments (reportLevel=“alignment”). The latter imposes further restrictions on some arguments (see ‘Details’).

mode

Cytosine quantification mode, one of:

CpGcomb

: only C's in CpG context (strands combined)

CpG

: only C's in CpG context (strands separate)

allC

: all C's (strands separate)

var

: variant detection (all C's, strands separate)

CpGcomb is the default.

collapseBySample

If TRUE, combine (sum) counts from bamfiles with the same sample name.

collapseByQueryRegion

If TRUE, combine (sum) counts for all cytosines contained in the same query region.

asGRanges

If TRUE, return results as a GRanges object; if FALSE, the results are returned as a data.frame.

mask

An optional GRanges object with genomic regions to be masked, i.e. excluded from the analysis (e.g. unmappable regions).

reference

Source of bam files; can be either “genome” (then the alignments against the genome are used) or the name of an auxiliary target sequence (then alignments against this target sequence will be used). The auxiliary name must correspond to the name contained in the auxiliary file refered by the auxiliaryFile argument of qAlign.

keepZero

If FALSE, only cytosines covered by at least one alignment will be returned; keepZero must be TRUE if multiple samples have the same sample name and collapseBySample is TRUE.

mapqMin

Minimal mapping quality of alignments to be included when counting (mapping quality must be greater than or equal to mapqMin). Valid values are between 0 and 255. The default (0) will include all alignments.

mapqMax

Maximal mapping quality of alignments to be included when counting (mapping quality must be less than or equal to mapqMax). Valid values are between 0 and 255. The default (255) will include all alignments.

clObj

A cluster object to be used for parallel processing of multiple files (see ‘Details’).

Details

qMeth can be used on a qProject object from a bisulfite sequencing experiment (sequencing of bisulfite-converted DNA), such as the one returned by qAlign when its parameter bisulfite is set to a different value than “no”.

qMeth quantifies DNA methylation by counting total and methylated events for individual cytosines, using the alignments that have been generated in converted (three-letter) sequence space for example by qAlign. A methylated event corresponds to a C/C match in the alignment, an unmethylated event to a T/C mismatch (or G/G matches and A/G mismatches on the opposite strand). For paired-end samples, the part of the left fragment alignment that overlaps with the right fragment alignment is ignored, preventing the use of redundant information coming from the same molecule.

Both directed (bisulfite=“dir”) and undirected (bisulfite=“undir”) experimental protocols are supported by qAlign and qMeth.

By default, results are returned per C nucleotide. If reportLevel=“alignment”, results are reported separately for individual alignments. In that case, query has to be a GRanges object with exactly one region, mode has to be either “CpG” or “allC”, the arguments collapseByQueryRegion, asGRanges, mask and keepZero have no effect and allele-specific projects are treated in the same way as normal (non-allele specific) projects.

Using the parameter mode, quantification can be limited to cytosines in CpG context, and counts obtained for the two cytosines on opposite strands within a single CpG can be combined (summed).

The quantification of methylation for all cytosines in the query region(s) (mode=“allC”) should be done with care, especially for large query regions, as the return value may require a large amount of memory.

If mode is set to “var”, qMeth only counts reads from the strand opposite of the cytosine and reports total and matching alignments. For a position identical to the reference sequence, only matches (and very few sequencing errors) are expected, independent on the methylation state of the cytosine. A reduced fraction of alignments matching the reference are indicative of sequence variations in the sequenced sample.

mapqMin and mapqMax allow to select alignments based on their mapping qualities. mapqMin and mapqMax can take integer values between 0 and 255 and equal to -10 log_{10} Pr(\textnormal{mapping position is wrong}), rounded to the nearest integer. A value 255 indicates that the mapping quality is not available.

If an object that inherits from class cluster is provided to the clObj argument, for example an object returned by makeCluster from package parallel, the quantification task is split into multiple chunks and processed in parallel using clusterApplyLB from package parallel. Not all tasks will be efficiently parallelized: For example, a single query region and a single (group of) bam files will not be split into multiple chunks.

Value

For reportLevel=“C”, a GRanges object if asGRanges=TRUE, otherwise a data.frame.

Each row contains the coordinates of individual cytosines for collapseByQueryRegion=FALSE or query regions for collapseByQueryRegion=TRUE.

In addition to the coordinates columns (or seqnames, ranges and strand slots for GRanges objects), each row contains per bam file:

Two values (total and methylated events, with suffixes _T and _M), or if the qProject object was created including a SNP table, six values (total and methylated events for Reference, Unknown and Alternative genotypes, with suffixed _TR, _TU, _TA, _MR, _MU and _MA). In the latter case, C's or CpG's that overlap with SNPs in the table are removed.

If collapseBySample=TRUE, groups of bam files with identical sample name are combined (summed) and will be represented by a single set of total and methylated count columns.

If mode=“var”, the _T and _M columns correspond to total and matching alignments overlapping the guanine paired to the cytosine.

For reportLevel=“alignment”, a list with one element per bam file or sample (depending on collapseBySample). Each list element is another list with the elements:

aid

: character vector with unique alignment identifiers

Cid

: integer vector with genomic coordinate of C base

strand

: character vector with the strand of the C base

meth

: integer vector with methylation state for alignment and C defined by aid and Cid. The values are 1 for methylated or 0 for unmethylated states.

Author(s)

Anita Lerch, Dimos Gaidatzis and Michael Stadler

See Also

qAlign, makeCluster from package parallel

Examples

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)

# create alignments
sampleFile <- "extdata/samples_bis_single.txt"
genomeFile <- "extdata/hg19sub.fa"
proj <- qAlign(sampleFile, genomeFile, bisulfite="dir")
proj

# calculate methylation states
meth <- qMeth(proj, mode="CpGcomb")
meth


fmicompbio/QuasR documentation built on Dec. 11, 2024, 11:22 p.m.