Parsing output from the Bismark alignment suite.
Input files. Each sample is in a different file. Input files are created by running Bismark's methylation extractor; see Note for details.
Sample names, given in the same order as the input files.
Should methylation loci that have zero coverage in all samples be removed? Removing them results in a much smaller object if the data originates from (targeted) capture bisulfite sequencing.
Should strand-symmetric methylation loci, e.g., CpGs,
be collapsed across strands. This option is only available if
The format of the input file; see Note for details.
The number of cores used. Note that setting
Make the function verbose.
The backend used for the 'M' and 'Cov' matrices. The default,
An object of class BSseq.
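To illustrate what removing zero-coverage loci means, here is a minimal base R sketch. The matrix `Cov`, its row names, and its sample names are made up for illustration; this is not how bsseq implements the option internally.

```r
# Hypothetical coverage matrix: rows are loci, columns are samples.
Cov <- matrix(c(5L, 0L, 3L, 0L,
                2L, 0L, 0L, 0L),
              ncol = 2,
              dimnames = list(paste0("locus", 1:4),
                              c("sampleA", "sampleB")))

# A locus is dropped only if it has zero coverage in every sample.
keep <- rowSums(Cov) > 0
Cov_nonzero <- Cov[keep, , drop = FALSE]
```

In targeted capture data most loci fall outside the captured regions, so a filter of this kind can shrink the object considerably.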
Input files can either be gzipped or not.
The user must specify the relevant file format via the fileType
argument. The format of the output of the Bismark alignment suite will depend
on the version of Bismark and on various user-specified options. Please
consult the Bismark documentation and the Bismark RELEASE NOTES
for the definitive list of changes between versions. When possible, it is
strongly recommended that you use the most recent version of Bismark.
The "cov" and "oldBedGraph" formats both have six columns
("chromosome", "start", "end", "methylation percentage",
"count methylated", "count unmethylated"). If you are using a recent version of Bismark
(v>=0.10.0) then the standard file extension for this file is
".cov". If, however, you are using an older version of Bismark
(v<0.10.0) then the file extension will be ".bedGraph". Please
note that the ".bedGraph" file created in recent versions of Bismark
(v>=0.10.0) is not suitable for analysis with bsseq because
it contains only the "methylation percentage" and neither
"count methylated" nor "count unmethylated".
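As a rough illustration of why the count columns matter, the following base R sketch reads a small made-up six-column coverage file. The column labels (`chr`, `start`, `end`, `meth_pct`, `count_meth`, `count_unmeth`) are assumptions chosen for this example, since the file itself has no header row, and the data values are invented.

```r
# Write a tiny, made-up coverage file so the sketch is self-contained.
cov_lines <- c("chr1\t100\t100\t80\t8\t2",
               "chr1\t150\t150\t0\t0\t5",
               "chr2\t30\t30\t50\t3\t3")
cov_file <- tempfile(fileext = ".cov")
writeLines(cov_lines, cov_file)

# Illustrative column labels; the file has no header row.
cov_cols <- c("chr", "start", "end", "meth_pct", "count_meth", "count_unmeth")
cov <- read.table(cov_file, sep = "\t", header = FALSE,
                  col.names = cov_cols, stringsAsFactors = FALSE)

# Coverage is the sum of the two count columns.
cov$coverage <- cov$count_meth + cov$count_unmeth

# The percentage can be recomputed from the counts, but not the reverse.
cov$pct_from_counts <- 100 * cov$count_meth / cov$coverage
```

A percentage alone cannot distinguish 1 of 2 reads methylated from 50 of 100, which is why a file carrying only the percentage is not enough for count-based analysis.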
The "cytosineReport" format has seven columns
("chromosome", "position", "strand", "count methylated",
"count unmethylated", "C-context", "trinucleotide context").
There is no standard file extension for this file. The "C-context" and
"trinucleotide context" columns are not currently used by bsseq.
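Because a cytosine report records the strand of each cytosine, the two strands of a CpG can in principle be merged into one locus. The base R sketch below shows one plausible way to do this on made-up data: shift each minus-strand position back by one and sum the counts. The column labels and the helper column `cpg_pos` are assumptions for illustration, and this is not necessarily how bsseq performs strand collapsing.

```r
# Made-up seven-column report: two CpGs, each seen on both strands.
report_lines <- c("chr1\t100\t+\t4\t1\tCG\tCGA",
                  "chr1\t101\t-\t3\t2\tCG\tCGT",
                  "chr1\t200\t+\t0\t6\tCG\tCGG",
                  "chr1\t201\t-\t1\t5\tCG\tCGC")
report_file <- tempfile()
writeLines(report_lines, report_file)

# Illustrative column labels; the file has no header row.
report_cols <- c("chr", "pos", "strand", "count_meth", "count_unmeth",
                 "c_context", "tri_context")
report <- read.table(report_file, sep = "\t", header = FALSE,
                     col.names = report_cols,
                     colClasses = c("character", "integer", "character",
                                    "integer", "integer",
                                    "character", "character"))

# In a CpG, the minus-strand C sits one base after the plus-strand C,
# so shift minus-strand positions back by one to share a coordinate.
report$cpg_pos <- ifelse(report$strand == "-", report$pos - 1L, report$pos)

# Sum the counts of the two strands for each CpG.
collapsed <- aggregate(cbind(count_meth, count_unmeth) ~ chr + cpg_pos,
                       data = report, FUN = sum)
```

Each CpG then appears once, with counts pooled across strands, which roughly doubles per-locus coverage for strand-symmetric methylation.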
The following is a list of some issues to be aware of when using output from Bismark's methylation extractor:
The program to extract methylation counts was named
methylation_extractor in older versions of Bismark (v<0.8.0) and
bismark_methylation_extractor in recent versions of Bismark
(v>=0.8.0). Furthermore, very old versions of Bismark
(v<0.7.7) required that the user run a separate script (called
genome_methylation_bismark2bedGraph) to create the bedGraph file.
Arguments including --bedGraph must be supplied to the methylation extractor in
order to use the output with read.bismark.
The genomic co-ordinates of the Bismark output file may be zero-based
or one-based depending on whether the
--zero_based argument is used.
Furthermore, the default co-ordinate system varies by version of Bismark.
bsseq makes no assumptions about the basis of the genomic co-ordinates and
it is left to the user to ensure that the appropriate basis is used in the
analysis of their data. Since Bioconductor packages and
GRanges use one-based co-ordinates, it
is recommended that your Bismark files are also one-based.
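One way to sanity-check the basis of a set of positions is to look up the reference base at each position: plus-strand cytosine loci should land on a "C" when read as one-based coordinates. The sketch below uses a toy sequence and made-up positions; `toy_seq` and the variable names are illustrative only.

```r
# Toy reference sequence; cytosines are at one-based positions 3 and 7.
toy_seq <- "ATCGGACGT"

# Hypothetical plus-strand cytosine positions reported as zero-based.
pos_zero_based <- c(2L, 6L)

# Converting zero-based positions to one-based adds one.
pos_one_based <- pos_zero_based + 1L

# substring() indexes from one, so correct positions hit a "C" ...
bases_if_one_based <- substring(toy_seq, pos_one_based, pos_one_based)

# ... while zero-based positions misread as one-based are off by one.
bases_if_misread <- substring(toy_seq, pos_zero_based, pos_zero_based)
```

If most looked-up bases are not "C" (or "G" for minus-strand loci), the file is probably in a different basis than assumed.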
Peter Hickey
read.bsmooth for parsing output from the BSmooth alignment suite.
read.umtab for parsing legacy (old) formats from the BSmooth alignment suite.
collapseBSseq for collapsing (merging or summing) the data in two different directories.
library(bsseq)  # provides read.bismark()

# Path to the example coverage file shipped with bsseq
infile <- system.file("extdata/test_data.fastq_bismark.bismark.cov.gz",
                      package = 'bsseq')
bismarkBSseq <- read.bismark(files = infile,
                             sampleNames = "test_data",
                             rmZeroCov = FALSE,
                             strandCollapse = FALSE,
                             fileType = "cov",
                             verbose = TRUE)
bismarkBSseq

#-----------------------------------------------------------------------------
# An example constructing a HDF5Array-backed BSseq object
#
library(HDF5Array)
# See ?DelayedArray::setRealizationBackend for details
hdf5_bismarkBSseq <- read.bismark(files = infile,
                                  sampleNames = "test_data",
                                  rmZeroCov = FALSE,
                                  strandCollapse = FALSE,
                                  fileType = "cov",
                                  verbose = TRUE,
                                  BACKEND = "HDF5Array")