read.bismark: Parsing output from the Bismark alignment suite.

Description Usage Arguments Value Note Author(s) See Also Examples

View source: R/read.bismark.R

Description

Parsing output from the Bismark alignment suite.

Usage

1
2
3
4
5
6
7
8
read.bismark(files,
             sampleNames,
             rmZeroCov = FALSE,
             strandCollapse = TRUE,
             fileType = c("cov", "oldBedGraph", "cytosineReport"),
             mc.cores = 1,
             verbose = TRUE,
             BACKEND = NULL)

Arguments

files

Input files. Each sample is in a different file. Input files are created by running Bismark's methylation extractor; see Note for details.

sampleNames

sample names, based on the order of files.

rmZeroCov

Should methylation loci that have zero coverage in all samples be removed. This will result in a much smaller object if the data originates from (targeted) capture bisulfite sequencing.

strandCollapse

Should strand-symmetric methylation loci, e.g., CpGs, be collapsed across strands. This option is only available if fileType = "cytosineReport" since the other file types do not contain the necessary strand information.

fileType

The format of the input file; see Note for details.

mc.cores

The number of cores used. Note that setting mc.cores to a value greater than 1 is not supported on MS Windows, see the help page for mclapply.

verbose

Make the function verbose.

BACKEND

The backend used for the 'M' and 'Cov' matrices. The default, NULL, corresponds to using matrix objects. See ?DelayedArray::setRealizationBackend for alternative backends.

Value

An object of class BSseq.

Note

Input files can either be gzipped or not.

The user must specify the relevant file format via the fileType argument. The format of the output of the Bismark alignment suite will depend on the version of Bismark and on various user-specified options. Please consult the Bismark documentation and the Bismark RELEASE NOTES (http://www.bioinformatics.bbsrc.ac.uk/projects/bismark/RELEASE_NOTES.txt) for the definitive list of changes between versions. When possible, it is strongly recommended that you use the most recent version of Bismark.

The "cov" and "oldBedGraph" formats both have six columns ("chromosome", "position", "strand", "methylation percentage", "count methylated", "count unmethylated"). If you are using a recent version of Bismark (v>=0.10.0) then the standard file extension for this file is ".cov". If, however, you are using an older version of Bismark (v<0.10.0) then the file extension will be ".bedGraph". Please note that the ".bedGraph" file created in recent versions of Bismark (v>=0.10.0) is not suitable for analysis with bsseq because it only contains the "methylation percentage" and not "count methylated" nor "count unmethylated".

The "cytosineReport" format has seven columns ("chromosome", "position", "strand", "count methylated", "count unmethylated", "C-context", "trinucleotide context"). There is no standard file extension for this file. The "C-context" and "trinucleotide context" columns are not currently used by bsseq.

The following is a list of some issues to be aware of when using output from Bismark's methylation extractor:

Author(s)

Peter Hickey [email protected]

See Also

read.bsmooth for parsing output from the BSmooth alignment suite. read.umtab for parsing legacy (old) formats from the BSmooth alignment suite. collapseBSseq for collapse (merging or summing) the data in two different directories.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
  infile <- system.file("extdata/test_data.fastq_bismark.bismark.cov.gz",
                        package = 'bsseq')
  bismarkBSseq <- read.bismark(files = infile,
                               sampleNames = "test_data",
                               rmZeroCov = FALSE,
                               strandCollapse = FALSE,
                               fileType = "cov",
                               verbose = TRUE)
  bismarkBSseq

  #-----------------------------------------------------------------------------
  # An example constructing a HDF5Array-backed BSseq object
  #
  library(HDF5Array)
  # See ?DelayedArray::setRealizationBackend for details
  hdf5_bismarkBSseq <- read.bismark(files = infile,
                                    sampleNames = "test_data",
                                    rmZeroCov = FALSE,
                                    strandCollapse = FALSE,
                                    fileType = "cov",
                                    verbose = TRUE,
                                    BACKEND = "HDF5Array")

Bioconductor-mirror/bsseq documentation built on July 28, 2017, 5:20 a.m.