This implements the BSmooth algorithm for estimating methylation levels from bisulfite sequencing data.
BSseq | An object of class BSseq.
ns | The minimum number of methylation loci in a smoothing window.
h | The minimum smoothing window, in bases.
maxGap | The maximum gap between two methylation loci before the smoothing is broken across the gap. The default smooths each chromosome separately.
keep.se | Should the estimated standard errors from the smoothing algorithm be kept? This makes the returned object roughly 30 percent bigger, and the standard errors are currently not used for anything in bsseq.
BPPARAM | An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation. Currently supported are SerialParam (Unix, Mac, Windows), MulticoreParam (Unix and Mac), SnowParam (Unix, Mac, and Windows, limited to single-machine clusters), and BatchJobsParam (Unix, Mac, Windows, only with the in-memory realization backend). See sections 'Parallelization and progress monitoring' and 'Realization backends' for further details.
chunkdim | Only applicable if
level | Only applicable if
verbose | A
ns and h are passed to the locfit function. The bandwidth used is the maximum (in genomic distance) of h and a width big enough to contain ns methylation loci.
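The bandwidth rule described above can be illustrated with a small base-R sketch. This is not the code used by bsseq or locfit: `local_bandwidth()` and the example positions are hypothetical, and treating both quantities as half-widths around the target position is an assumption made only for illustration.

```r
# Hypothetical helper (not part of bsseq): bandwidth used at position x,
# taken as the larger of the fixed minimum h and the distance needed to
# reach the ns-th nearest methylation locus.
local_bandwidth <- function(x, pos, ns, h) {
    stopifnot(ns <= length(pos))
    # Smallest distance around x that covers ns loci.
    nn_width <- sort(abs(pos - x))[ns]
    max(h, nn_width)
}

# Made-up locus positions, in bases.
pos <- c(100, 150, 900, 1000, 1100, 5000)

# Dense region, small h: the nearest-neighbour width (100) wins.
bw_nn <- local_bandwidth(1000, pos, ns = 3, h = 50)

# Same region, large h: the fixed minimum (500) wins.
bw_h <- local_bandwidth(1000, pos, ns = 3, h = 500)
```

In sparse regions the nearest-neighbour term grows, so the window widens until it holds ns loci; in dense regions h keeps the window from becoming too narrow.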
An object of class BSseq, containing coefficients used to fit smoothed methylation values and, optionally, standard errors for these.
The BSmooth() function creates a new assay to store the coefficients used to construct the smoothed methylation estimates (coef). An additional assay is also created if keep.se == TRUE (se.coef).
The choice of realization backend controls whether these assay(s) are stored in-memory as an ordinary matrix or on-disk as an HDF5Array, for example.
The choice of realization backend is controlled by the BACKEND argument, which defaults to the current value of DelayedArray::getAutoRealizationBackend().
BSmooth supports the following realization backends:
NULL (in-memory): This stores each new assay in-memory using an ordinary matrix.
HDF5Array (on-disk): This stores each new assay on-disk in an HDF5 file using an HDF5Matrix from HDF5Array.
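Because BACKEND defaults to the session-wide automatic realization backend, it can help to inspect or change that value before calling BSmooth(). The following is a minimal sketch using the DelayedArray getter and setter; it assumes the DelayedArray and HDF5Array packages are installed, and the variable name `old_backend` is only illustrative.

```r
library(DelayedArray)

# Remember the current session-wide backend (NULL means in-memory).
old_backend <- getAutoRealizationBackend()

# Switch to the on-disk HDF5Array backend; a later BSmooth() call that
# does not set BACKEND explicitly would then pick this up as its default.
setAutoRealizationBackend("HDF5Array")
current_backend <- getAutoRealizationBackend()

# Restore the previous setting so other code is unaffected.
setAutoRealizationBackend(old_backend)
```

Passing BACKEND explicitly in the BSmooth() call avoids relying on session state and is usually easier to reason about in scripts.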
Please note that certain combinations of realization backend and parallelization backend are currently not supported. For example, the HDF5Array realization backend is currently only compatible with a single-machine parallelization backend (i.e. it is not compatible with a SnowParam that specifies an ad hoc cluster of multiple machines). BSmooth() will issue an error when given such incompatible realization and parallelization backends. Furthermore, to avoid memory usage blow-ups, BSmooth() will issue an error if an in-memory realization backend is used when smoothing a disk-backed BSseq object.
Additional arguments related to the realization backend can be passed via the ... argument. These arguments must be named and are passed to the relevant RealizationSink constructor. For example, the ... argument can be used to specify the path to the HDF5 file to be used by BSmooth(). Please see the examples at the bottom of the page.
BSmooth() now uses the BiocParallel package to implement parallelization. This brings some notable improvements:
Smoothed results can now be written directly to an on-disk realization backend by the worker. This dramatically reduces memory usage compared to previous versions of bsseq, which required all results to be retained in-memory.
Parallelization is now supported on Windows through the use of a SnowParam object as the value of BPPARAM.
Detailed and extensive job logging facilities are available.
All parallelization options are controlled via the BPPARAM argument.
In general, we recommend that users combine multicore (single-machine) parallelization with an on-disk realization backend (see section 'Realization backends'). For Unix and Mac users, this means using a MulticoreParam. For Windows users, this means using a single-machine SnowParam. Please consult the BiocParallel documentation to take full advantage of the more advanced features.
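The platform-specific recommendation above could be expressed as follows. This is a minimal sketch rather than code from bsseq: the worker count of 2 and the variable name `BPPARAM` are arbitrary choices for illustration.

```r
library(BiocParallel)

# Pick a single-machine parallel back-end appropriate for the platform.
BPPARAM <- if (.Platform$OS.type == "windows") {
    # Forked processes are unavailable on Windows, so use a local
    # socket cluster instead.
    SnowParam(workers = 2, progressbar = TRUE)
} else {
    # Unix and Mac can use forked workers.
    MulticoreParam(workers = 2, progressbar = TRUE)
}
```

The resulting object can then be supplied as the BPPARAM argument of BSmooth(), together with an on-disk BACKEND.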
parallelBy, mc.cores, and mc.preschedule are deprecated and will be removed in subsequent releases of bsseq. These arguments were necessary when BSmooth() used the parallel package to implement parallelization, but this functionality is superseded by the aforementioned use of BiocParallel. We recommend that users who previously relied on these arguments switch to BPPARAM = MulticoreParam(workers = mc.cores, progressbar = TRUE).
A useful feature of BiocParallel is its progress bars, which monitor the status of long-running jobs such as BSmooth(). Progress bars are controlled via the progressbar argument in the BiocParallelParam constructor. Progress bars replace the use of the deprecated verbose argument to print out information on the status of BSmooth().
BiocParallel also supports extensive and detailed logging facilities. Please consult the BiocParallel documentation to take full advantage of these advanced features.
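As a rough illustration of the progress bar and logging options, the sketch below builds a BiocParallelParam with both enabled and reads the settings back. It uses SerialParam only to keep the example lightweight; the "INFO" threshold is an arbitrary choice, and the same constructor arguments are assumed to apply to MulticoreParam and SnowParam.

```r
library(BiocParallel)

# Enable a progress bar and logging at the "INFO" level.
param <- SerialParam(progressbar = TRUE, log = TRUE, threshold = "INFO")

# Read the settings back from the param object.
has_progressbar <- bpprogressbar(param)
has_log <- bplog(param)
log_threshold <- bpthreshold(param)
```

A param configured this way can be passed to BSmooth() via BPPARAM in place of the deprecated verbose argument.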
Method and original implementation by Kasper Daniel Hansen khansen@jhsph.edu. Updated implementation to support disk-backed BSseq objects and more general parallelization by Peter Francis Hickey.
KD Hansen, B Langmead, and RA Irizarry. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biology (2012) 13:R83. doi:10.1186/gb-2012-13-10-r83.
locfit in the locfit package, as well as BSseq.
## Not run:
# Packages used by the examples below.
library(bsseq)
library(BiocParallel)
library(HDF5Array)
# Run BSmooth() on a matrix-backed BSseq object using an in-memory realization
# backend with serial evaluation.
data(BS.chr22)
# This is a matrix-backed BSseq object.
sapply(assays(BS.chr22, withDimnames = FALSE), class)
BS.fit <- BSmooth(BS.chr22, BPPARAM = SerialParam(progressbar = TRUE))
# The new 'coef' assay is an ordinary matrix.
sapply(assays(BS.fit, withDimnames = FALSE), class)
BS.fit
# Run BSmooth() on a disk-backed BSseq object using the HDF5Array realization
# backend (with data written to the file 'BSmooth_example.h5') with
# multi-core parallel evaluation.
BS.chr22 <- realize(BS.chr22, "HDF5Array")
# This is a disk-backed BSseq object.
sapply(assays(BS.chr22, withDimnames = FALSE), class)
BS.fit <- BSmooth(BS.chr22,
                  BPPARAM = MulticoreParam(workers = 2, progressbar = TRUE),
                  BACKEND = "HDF5Array",
                  filepath = "BSmooth_example.h5")
# The new 'coef' assay is an HDF5Matrix.
sapply(assays(BS.fit, withDimnames = FALSE), class)
BS.fit
# The new 'coef' assay is in the HDF5 file 'BSmooth_example.h5' (in the
# current working directory).
sapply(assays(BS.fit, withDimnames = FALSE), path)
## End(Not run)