BSFDataSet: BSFDataSet object and constructors

View source: R/AllClasses.R

BSFDataSet    R Documentation

Description

A BSFDataSet contains a GenomicRanges object, which stores the input ranges, alongside the iCLIP signal in a list structure and additional meta data as a data.frame.

Usage

BSFDataSet(ranges, meta, signal, dropSeqlevels = TRUE, silent = FALSE)

BSFDataSetFromBigWig(ranges, meta, silent = FALSE, dropSeqlevels = TRUE)

Arguments

ranges

a GenomicRanges object with the ranges to process. The strand of each range must be either "+" or "-".

meta

a data.frame with at least two columns. The first column should be a unique numeric id. The second column holds sample type information, such as the condition.

signal

a list with the two entries 'signalPlus' and 'signalMinus', each following a special representation of SimpleRleList for counts per replicate (see Details for more information).

dropSeqlevels

enforce identical seqnames in ranges and signal by dropping unused seqlevels (TRUE/FALSE). This is required for most downstream functions, such as coverageOverRanges.

silent

suppress messages but not warnings (TRUE/FALSE)

Details

The ranges are required to have a "+" or "-" strand annotation; "*" is not allowed. They are expected to be of the same width, and a warning is thrown otherwise.
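The two requirements on the ranges can be checked before calling the constructor. The following is a minimal sketch, assuming the GenomicRanges package is installed; the coordinates are made up for illustration, and the checks are not the package's internal validation code.

```r
library(GenomicRanges)

# Hypothetical example ranges; coordinates are invented for illustration.
rng <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(100, 200, 300), width = 5),
    strand = c("+", "-", "*")
)

# Keep only ranges with an explicit "+" or "-" strand.
hasStrand <- as.character(strand(rng)) %in% c("+", "-")
stranded <- rng[hasStrand]

# Check whether all remaining ranges share a single width.
sameWidth <- length(unique(width(stranded))) == 1
```

Here the third range carries the "*" strand and is filtered out, so two ranges of equal width remain.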

The meta information is stored as a data.frame with at least two required columns, 'id' and 'condition'. They are used to build the unique identifier for each replicate, joined by '_' (e.g. id = 1 and condition = WT will result in 1_WT).
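How the replicate identifier follows from the two columns can be sketched in base R. The sample table below is hypothetical, and the paste() call only mirrors the naming rule described above; it is not taken from the package source.

```r
# Hypothetical meta table with two wild-type and two knockout replicates.
meta <- data.frame(
    id = 1:4,
    condition = c("WT", "WT", "KO", "KO"),
    stringsAsFactors = FALSE
)

# Join id and condition with "_" to form one identifier per replicate.
replicateId <- paste(meta$id, meta$condition, sep = "_")
print(replicateId)
# "1_WT" "2_WT" "3_KO" "4_KO"

# The identifiers must be unique, since they name the signal entries.
stopifnot(anyDuplicated(replicateId) == 0)
```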

If BSFDataSetFromBigWig is called, the meta data must additionally contain the columns 'clPlus' and 'clMinus'. They provide the paths to the iCLIP coverage files for the import function. On object initialization these files are loaded and represented internally in the signal slot of the object (see BSFDataSet).
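A meta table suitable for BSFDataSetFromBigWig might look like the sketch below. The file paths are placeholders and do not point to real files, and the column check is only an illustrative pre-flight test, not part of the package.

```r
# Hypothetical meta table; the BigWig paths are placeholders.
meta <- data.frame(
    id = 1:2,
    condition = c("WT", "WT"),
    clPlus = c("path/to/rep1_plus.bw", "path/to/rep2_plus.bw"),
    clMinus = c("path/to/rep1_minus.bw", "path/to/rep2_minus.bw"),
    stringsAsFactors = FALSE
)

# Verify that all columns needed for the BigWig import are present.
requiredCols <- c("id", "condition", "clPlus", "clMinus")
missingCols <- setdiff(requiredCols, colnames(meta))
if (length(missingCols) > 0) {
    stop("meta is missing columns: ", paste(missingCols, collapse = ", "))
}
```

With real coverage files in place, such a table would be passed as the meta argument of BSFDataSetFromBigWig together with the ranges.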

The iCLIP signal is stored in a special list structure. At the lowest level, crosslink counts per nucleotide are stored as Rle per chromosome, summarized as a SimpleRleList. Such a list exists for each replicate and must be named by the replicate identifier (e.g. 1_WT). This list therefore always contains exactly as many entries as there are replicates in the dataset. Since strands are initially handled separately from each other, this list must be given twice, once for each strand. The strand-specific entries must be named 'signalPlus' and 'signalMinus'.
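The nesting described above can be built by hand for a toy dataset. This is a sketch under stated assumptions: it uses Rle() from S4Vectors and RleList() from IRanges, and the counts, the single chromosome 'chr1' and the two replicates are invented for illustration.

```r
library(S4Vectors)  # Rle()
library(IRanges)    # RleList()

# Lowest level: crosslink counts per nucleotide, one Rle per chromosome,
# collected in an RleList. Counts are made up.
plusRep1 <- RleList(chr1 = Rle(c(0, 0, 3, 1, 0, 0, 0, 2, 0, 0)))
plusRep2 <- RleList(chr1 = Rle(c(0, 1, 2, 0, 0, 0, 0, 1, 0, 0)))
minusRep1 <- RleList(chr1 = Rle(c(1, 0, 0, 0, 4, 0, 0, 0, 0, 0)))
minusRep2 <- RleList(chr1 = Rle(c(0, 0, 0, 0, 2, 1, 0, 0, 0, 0)))

# One entry per replicate, named by the replicate identifier,
# given once for each strand.
signal <- list(
    signalPlus = list("1_WT" = plusRep1, "2_WT" = plusRep2),
    signalMinus = list("1_WT" = minusRep1, "2_WT" = minusRep2)
)
```

Both strand entries carry the same replicate names, which keeps the two halves of the signal aligned with the rows of the meta table.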

The option dropSeqlevels forces the seqnames of the ranges and the signal to be the same. If no entry in the signal list can be found for a specific chromosome in the ranges, then the ranges on that chromosome are dropped. This behavior is needed to keep the BSFDataSet object in sync, which is required for downstream functions such as coverageOverRanges.
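The effect of this synchronization can be imitated with standard Bioconductor tools. The sketch below assumes GenomicRanges and GenomeInfoDb are installed and uses keepSeqlevels() with pruning; it illustrates the idea only and is not the package's own implementation. The ranges and the chromosome names of the signal are hypothetical.

```r
library(GenomeInfoDb)   # seqlevels(), keepSeqlevels()
library(GenomicRanges)

# Hypothetical ranges on three chromosomes.
rng <- GRanges(
    seqnames = c("chr1", "chr2", "chr3"),
    ranges = IRanges(start = c(10, 20, 30), width = 5),
    strand = c("+", "-", "+")
)

# Assume the signal only covers these chromosomes.
signalChr <- c("chr1", "chr2")

# Keep the chromosomes present in both, dropping ranges on all others.
common <- intersect(seqlevels(rng), signalChr)
rngSynced <- keepSeqlevels(rng, common, pruning.mode = "coarse")
```

The range on 'chr3' has no counterpart in the signal and is removed, so ranges and signal refer to the same set of chromosomes afterwards.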

Value

A BSFDataSet object.

Examples


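library(BindingSiteFinder)

# load the example BSFDataSet 'bds' shipped with the package
files <- system.file("extdata", package = "BindingSiteFinder")
load(list.files(files, pattern = "\\.rda$", full.names = TRUE))

# extract the three components and rebuild the object from them
rng <- getRanges(bds)
sgn <- getSignal(bds)
mta <- getMeta(bds)
bdsNew <- BSFDataSet(ranges = rng, signal = sgn, meta = mta)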


ZarnackGroup/BindingSiteFinder documentation built on Nov. 24, 2024, 10:41 a.m.