epigraHMMDataSetFromBam: Create a epigraHMMDataSet from a set of BAM files
In plbaldoni/epigrahmm: Epigenomic R-based analysis with hidden Markov models

View source: R/epigraHMMDataSetFromBam.R

epigraHMMDataSetFromBam

R Documentation

Create a epigraHMMDataSet from a set of BAM files

Description

This function creates a RangedSummarizedExperiment object from of a set of BAM files. It is used to store the input data, the model offsets, and the results from the peak calling algorithms.

Usage

epigraHMMDataSetFromBam(
  bamFiles,
  colData,
  genome,
  windowSize,
  gapTrack = TRUE,
  blackList = TRUE
)

Arguments

`bamFiles`	a string vector (or a list of string vectors) with the path for BAM files. If bamFiles is a list of string vectors, vectors must be named, have the same dimension, and, at least, a vector with name 'counts' must exist (see details).
`colData`	a `data.frame` with the experimental data. It must contain the columns `condition` and `replicate`. `condition` refers to the experimental condition identifier (e.g. cell line name). `replicate` refers to the replicate identification number (unique for each condition).
`genome`	either a single string with the name of the reference genome (e.g. 'hg19') or a GRanges object with ranges to be tilled into a set of non-overlapping windows.
`windowSize`	an integer specifying the size of genomic windows where read counts will be computed.
`gapTrack`	either a logical (`TRUE`, the default, or `FALSE`) or a GRanges object with gap regions of the genome to be excluded. If `TRUE`, the function will discard genomic coordinates overlapping regions present in the UCSC gap table of the respective reference genome (if available). See Details section below.
`blackList`	either a logical (`TRUE`, the default, or `FALSE`) or a GRanges object with blacklisted regions of the genome to be excluded. If `TRUE`, the function will discard ENCODE blacklisted regions from selected reference genomes (if available). See Details section below.

Details

The index ".bai" files must be stored in the same directory of their respective BAM files. The index files must be named after their respective BAM files with the additional ".bai" suffix.

‘epigraHMMDataSetFromBam' will store experimental data (e.g. ChIP-seq counts) from bamFiles (or bamFiles[[’counts']], if a list is provided). Additional data (e.g. input control counts) will be stored similarly with their respective list names.

By default, the function computes read counts using csaw's estimated fragment length via cross correlation analysis. For experimental counts (e.g. ChIP-seq), sequencing reads are shifted downstream half of the estimated fragment length. For additional counts (e.g. input control), sequencing reads are not shifted prior to counting.

Additional columns included in the colData input will be passed to the resulting epigraHMMDataSet assay and can be acessed via colData() function.

The genome argument will call GenomeInfoDb::Seqinfo() to fetch the chromosome lengths of the specified genome. See ?GenomeInfoDb::Seqinfo for the list of UCSC genomes that are currently supported.

If gapTrack = TRUE and the name of a reference genome is passed as input through genome (e.g. 'hg19'), the function will discard any genomic coordinate overlapping regions specified by the respective UCSC gap table. If gapTrack is a GRanges object, the function will discard any genomic coordinate overlaping regions from gapTrack.

If blackList = TRUE and the name of a reference genome is passed as input through genome (e.g. 'hg19'), The function will fetch the manually curated blacklist tracks (Version 2) from https://github.com/Boyle-Lab/Blacklist/tree/master/lists. Current available genomes are ce10, dm3, hg19, hg38, and mm10. If blackList is a GRanges object, the function will discard any genomic coordinate overlaping regions from blackList.

Value

An epigraHMMDataSet object with sorted colData regarding conditions and replicates. Experimental counts will be stored in the 'counts' assay in the resulting epigraHMMDataSet object. Additional experimental data will be stored with their respective names from the list bamFiles.

Author(s)

Pedro L. Baldoni, pedrobaldoni@gmail.com

References

https://github.com/plbaldoni/epigraHMM DOI: 10.1093/nar/gkv1191 DOI: 10.1038/s41598-019-45839-z DOI: 10.1038/nature11247

Examples

bamFiles <- system.file("extdata","euratrans",
                        "lv-H3K27me3-SHR-male-bio2-tech1.bam",
                        package="chromstaRData")
                        
colData <- data.frame(condition = 'SHR', replicate = 1)

object <- epigraHMMDataSetFromBam(bamFiles = bamFiles,
                                  colData = colData,
                                  genome = 'rn4',
                                  windowSize = 25000,
                                  gapTrack = TRUE,
                                  blackList = TRUE)

plbaldoni/epigrahmm documentation built on Oct. 14, 2023, 5:13 a.m.