preprocess_bs_seq: Pre-process BS-Seq data in any given format
In andreaskapou/BPRMeth-devel: Model higher-order methylation profiles


preprocess_bs_seq is a general function for reading and preprocessing BS-Seq data. If a vector of files is given, the files are treated as replicates and pooled together. Finally, CpG sites whose read coverage falls below min_bs_cov or above max_bs_cov are considered noise and are discarded.
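The pooling step can be pictured with plain data frames. The following is only a minimal sketch, assuming that pooling means summing read counts for CpGs that share a chromosome and position; the replicate tables `rep1` and `rep2` and their values are made up for illustration, and this is not the package's own implementation.

```r
# Two hypothetical replicates with per-CpG read counts (made-up values).
rep1 <- data.frame(chr = c("chr1", "chr1", "chr2"),
                   pos = c(100L, 200L, 50L),
                   total_reads = c(10L, 3L, 8L),
                   meth_reads  = c(7L, 1L, 2L))
rep2 <- data.frame(chr = c("chr1", "chr2", "chr2"),
                   pos = c(100L, 50L, 90L),
                   total_reads = c(5L, 4L, 6L),
                   meth_reads  = c(4L, 1L, 6L))

# Assumed pooling rule: sum the counts of CpGs that share chromosome and position.
pooled <- aggregate(cbind(total_reads, meth_reads) ~ chr + pos,
                    data = rbind(rep1, rep2), FUN = sum)

# Sort by genomic coordinate for readability.
pooled <- pooled[order(pooled$chr, pooled$pos), ]
print(pooled)
```

CpGs seen in only one replicate keep their original counts, while CpGs seen in both contribute the sum of their reads.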

preprocess_bs_seq(files, file_format = "encode_rrbs", chr_discarded = NULL, min_bs_cov = 4, max_bs_cov = 1000)

`files`	A vector of filenames containing replicate experiments. This can also be just a single replicate.
`file_format`	A string denoting the file format in which the BS-Seq data are stored. The current version supports the "`encode_rrbs`" and "`bismark_cov`" formats.
`chr_discarded`	A vector with chromosome names to be discarded.
`min_bs_cov`	The minimum number of reads mapping to each CpG site. CpGs with fewer reads will be considered as noise and will be discarded.
`max_bs_cov`	The maximum number of reads mapping to each CpG site. CpGs with more reads will be considered as noise and will be discarded.
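To make the effect of `chr_discarded`, `min_bs_cov` and `max_bs_cov` concrete, here is a rough sketch on a plain data frame. The helper `filter_cpgs` is hypothetical and not part of the package; it assumes both coverage thresholds are inclusive, which matches the wording "fewer reads" and "more reads" in the argument descriptions but has not been checked against the source code.

```r
# Hypothetical helper (not a BPRMeth function): applies the three filters
# described by the arguments to a data frame of per-CpG counts.
filter_cpgs <- function(cpgs, chr_discarded = NULL,
                        min_bs_cov = 4, max_bs_cov = 1000) {
  # Keep CpGs whose coverage lies within [min_bs_cov, max_bs_cov] (assumed inclusive).
  keep <- cpgs$total_reads >= min_bs_cov & cpgs$total_reads <= max_bs_cov
  # Optionally drop whole chromosomes.
  if (!is.null(chr_discarded)) {
    keep <- keep & !(cpgs$chr %in% chr_discarded)
  }
  cpgs[keep, , drop = FALSE]
}

# Made-up CpG counts for illustration.
cpgs <- data.frame(chr = c("chr1", "chr1", "chrM", "chr2"),
                   pos = c(100L, 200L, 30L, 50L),
                   total_reads = c(15L, 3L, 40L, 2000L),
                   meth_reads  = c(11L, 1L, 20L, 900L))

filtered <- filter_cpgs(cpgs, chr_discarded = "chrM")
print(filtered)
```

With the default thresholds, the site with 3 reads is dropped for low coverage, the site with 2000 reads for high coverage, and the chrM site because its chromosome is excluded.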

A GRanges object. The GRanges object contains two additional metadata columns:

total_reads: total reads mapped to each genomic location.
meth_reads: methylated reads mapped to each genomic location.

These columns can be accessed as follows: granges_object$total_reads

Information about the file formats can be found in the following links:

Encode RRBS format: http://rohsdb.cmb.usc.edu/GBshape/cgi-bin/hgTables?db=hg19&hgta_group=regulation&hgta_track=wgEncodeHaibMethylRrbs&hgta_table=wgEncodeHaibMethylRrbsBcbreast0203015BiochainSitesRep2&hgta_doSchema=describe+table+schema

Bismark Cov format: http://rnbeads.mpi-inf.mpg.de/data/RnBeads.pdf
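As a quick orientation for the Bismark coverage layout, the sketch below parses a few made-up lines with base R. It assumes the usual six tab-separated columns of a Bismark coverage file (chromosome, start, end, methylation percentage, methylated count, unmethylated count); the column names are chosen here for illustration, and this is not how the package itself reads the file.

```r
# Three made-up lines in the assumed Bismark coverage layout:
# chromosome, start, end, methylation percentage, methylated count, unmethylated count.
cov_lines <- c("chr1\t100\t100\t70\t7\t3",
               "chr1\t200\t200\t0\t0\t5",
               "chr2\t50\t50\t100\t4\t0")

# Column names are illustrative; the file itself has no header.
cov <- read.table(text = cov_lines, sep = "\t", header = FALSE,
                  col.names = c("chr", "start", "end", "meth_pct",
                                "meth_reads", "unmeth_reads"),
                  stringsAsFactors = FALSE)

# Total coverage per CpG is the sum of methylated and unmethylated reads.
cov$total_reads <- cov$meth_reads + cov$unmeth_reads
print(cov[, c("chr", "start", "total_reads", "meth_reads")])
```

The derived `total_reads` and the `meth_reads` column correspond to the two metadata columns described for the returned GRanges object.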

C.A.Kapourani C.A.Kapourani@ed.ac.uk

read_bs_bismark_cov, read_bs_encode_haib, pool_bs_seq_rep

# Obtain the path to the files
bs_file <- system.file("extdata", "rrbs.bed", package = "BPRMeth")
bs_data <- preprocess_bs_seq(bs_file, file_format = "encode_rrbs")

# Extract the total reads and methylated reads
total_reads <- bs_data$total_reads
meth_reads <- bs_data$meth_reads
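A common next step is to turn the two count vectors into a per-CpG methylation level. The snippet below is a small sketch with made-up stand-in vectors (`total` and `meth`) so that it runs without the package; with real data, the vectors extracted above would take their place.

```r
# Stand-in values (made up) playing the role of total_reads and meth_reads.
total <- c(15, 12, 6)
meth  <- c(11, 3, 6)

# Per-CpG methylation level: fraction of reads that are methylated.
meth_level <- meth / total
print(round(meth_level, 3))
```

Because sites with low coverage are removed during preprocessing when `min_bs_cov` is at least 1, the division does not hit zero-coverage sites.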