read.bedMethyl: Parsing bedMethyl output from modkit pileup.

read.bedMethyl R Documentation

Parsing bedMethyl output from modkit pileup.

Description

Parse bedMethyl files produced by modkit pileup and construct a BSseq object from them.

Usage

read.bedMethyl(files,
               loci = NULL,
               colData = NULL,
               rmZeroCov = TRUE,
               strandCollapse = TRUE,
               BPPARAM = bpparam(),
               BACKEND = NULL,
               dir = tempfile("BSseq"),
               replace = FALSE,
               chunkdim = NULL,
               level = NULL,
               nThread = 1L,
               verbose = getOption("verbose"))

Arguments

files

The paths to the bedMethyl files created by running modkit pileup, one sample per file. See the methods section of [link to preprint] for validated output.

loci

NULL (default) or a GenomicRanges instance containing methylation loci (all with width equal to 1). If loci = NULL, then read.bedMethyl() will perform a first pass over the bedMethyl files to identify candidate loci. If loci is a GenomicRanges instance, then these form the candidate loci. The candidate loci will be collapsed if strandCollapse = TRUE.
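
As an illustration of supplying candidate loci up front, the following minimal sketch builds a GRanges object in which every range has width 1. The chromosome names and positions are hypothetical, and the positions are assumed to be one-based, as is conventional for GRanges objects.

```r
library(GenomicRanges)

# Hypothetical candidate loci: one-based positions, each of width 1
loci <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
                ranges = IRanges(start = c(10469L, 10471L, 5000L),
                                 width = 1L))

# read.bedMethyl() expects all loci to have width equal to 1
stopifnot(all(width(loci) == 1L))

# `files` is a placeholder for your own bedMethyl paths
# bs <- read.bedMethyl(files = files, loci = loci)
```

With loci supplied this way, these ranges form the candidate loci, so read.bedMethyl() does not need to identify them from the files.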

colData

An optional DataFrame describing the samples. Row names, if present, become the column names of the BSseq object. If NULL, then a DataFrame will be created with files used as the row names.

rmZeroCov

A logical(1) indicating whether methylation loci that have zero coverage in all samples should be removed. The default is rmZeroCov = TRUE.

strandCollapse

A logical(1) indicating whether strand-symmetric methylation loci (i.e. CpGs) should be collapsed across strands.
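
The idea behind strand collapsing can be sketched in base R. This is a conceptual illustration with hypothetical counts, not the code used internally by read.bedMethyl(): it assumes that the minus-strand cytosine of a CpG lies one base after the plus-strand cytosine, so minus-strand loci are shifted back by one base and the counts for each CpG are summed.

```r
# Hypothetical per-strand counts for two CpGs (one-based positions)
counts <- data.frame(pos    = c(100L, 101L, 200L, 201L),
                     strand = c("+", "-", "+", "-"),
                     M      = c(3L, 2L, 0L, 4L),  # modified calls
                     Cov    = c(5L, 4L, 2L, 6L))  # valid coverage

# Shift minus-strand loci back by one base so that both strands of a CpG
# share the position of the plus-strand cytosine
counts$pos_collapsed <- ifelse(counts$strand == "-",
                               counts$pos - 1L,
                               counts$pos)

# Sum the counts for each collapsed position
collapsed <- aggregate(cbind(M, Cov) ~ pos_collapsed,
                       data = counts, FUN = sum)
collapsed
```

After collapsing, each CpG is represented by a single locus whose counts combine the evidence from both strands.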

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation.
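
For example, a back-end can be constructed with the BiocParallel package and passed explicitly. The sketch below uses SerialParam(), which evaluates sequentially; `files` is a placeholder for your own bedMethyl paths, and the choice of back-end is an assumption that depends on your platform and resources.

```r
library(BiocParallel)

# Sequential evaluation; on Unix-alikes, MulticoreParam(workers = 2)
# is one alternative that uses two worker processes
param <- SerialParam()
bpnworkers(param)

# `files` is a placeholder for your own bedMethyl paths
# bs <- read.bedMethyl(files = files, BPPARAM = param)
```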

BACKEND

NULL or a single string specifying the name of the realization backend. Currently, the backend is not supported for downstream applications.

dir

Only applicable if BACKEND == "HDF5Array". The path (as a single string) to the directory in which to save the HDF5-based BSseq object.

replace

Only applicable if BACKEND == "HDF5Array". If the directory dir already exists, should it be replaced with a new one?

chunkdim

Only applicable if BACKEND == "HDF5Array". The dimensions of the chunks to use for writing the data to disk.

level

The compression level to use for writing the data to disk.

nThread

The number of threads used by fread when reading the files.

verbose

A logical(1) indicating whether progress messages should be printed. Defaults to getOption("verbose"), which is FALSE unless the option has been changed.

File formats

The format of each file should be similar to the examples in [link to preprint]. Files ending in .gz, .bz2, .xz, or .zip will be automatically decompressed to tempdir().

Supported file formats

bedMethyl files produced by modkit pileup. For downstream likelihood functions, we recommend running modkit pileup on BAM files whose bases and modifications were called with a CG-context model, and running the pileup without a reference genome.

Unsupported file formats

Any output other than bedMethyl files from modkit pileup.

One-based vs. zero-based genomic co-ordinates

The genomic co-ordinates of bedMethyl files are zero-based. Since Bioconductor packages typically use one-based co-ordinates, the co-ordinates from the bedMethyl files are converted to one-based in the BSseq object.
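
The conversion can be illustrated in base R. The records below are hypothetical; the sketch only assumes the usual BED convention of zero-based, half-open intervals, under which a single-base locus has end = start + 1 and its one-based position is start + 1.

```r
# Hypothetical bedMethyl-style records: zero-based, half-open [start, end)
bed <- data.frame(chrom = c("chr1", "chr1"),
                  start = c(10468L, 10470L),
                  end   = c(10469L, 10471L))

# One-based position of each single-base locus
bed$pos <- bed$start + 1L
bed
```

For single-base loci the one-based position coincides with the BED end co-ordinate.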

Author(s)

Søren Blikdal Hansen (soren.blikdal.hansen@sund.ku.dk)

Examples

# Example: reading the bedMethyl files included in the bsseq package
library(bsseq)

# Paths to the example bedMethyl files in the package's extdata directory
infiles <- c(system.file("extdata/HG002_nanopore_test.bedMethyl.gz",
                         package = "bsseq"),
             system.file("extdata/HG002_pacbio_test.bedMethyl.gz",
                         package = "bsseq"))

# Run the function to import data
bsseq <- read.bedMethyl(files = infiles,
                        colData = DataFrame(row.names = c("test_nanopore", 
                                                          "test_pacbio")),
                        rmZeroCov = FALSE,
                        strandCollapse = TRUE,
                        verbose = TRUE)

# View the resulting BSseq object
bsseq

hansenlab/bsseq documentation built on June 12, 2025, 7:42 p.m.