constructCDS

Description

Constructs an object of class ChipDataSet, a container for processed sequencing data and the results of all downstream analyses. The slots of the created object are filled during the workflow by applying specific functions directly to the object.

Usage

constructCDS(peaks, reads, region, TxDb, tssOf = c("gene", "transcript"),
  tss.region = c(-2000, 2000), reduce.peaks = FALSE, gapwidth = 1000,
  fragment.size, unique = TRUE, swap.strand = FALSE, param = NULL)

Arguments

peaks

A path to a file with peaks. The file needs to have at least 3 columns (tab-separated): chromosome, start (peak), end (peak). The 4th column - name (peak id) is optional.

reads

A path to a BAM file with sequencing reads.

region

GRanges. Genomic region(s) to extract reads from. If not supplied, all the reads from the BAM file are extracted.

TxDb

TxDb object.

tssOf

Character. Extract Transcription Start Site (TSS) regions from either "gene" or "transcript" annotations. Default: "gene".

tss.region

A numeric vector of length two that specifies the boundaries of the TSS region relative to the TSS. Default: c(-2000, 2000), i.e. from -2 kb to +2 kb.

reduce.peaks

Logical. Whether to merge neighboring peaks. Default: FALSE.

gapwidth

Numeric. The distance (in bp) between neighboring peaks below which they are merged. Default: 1000.

fragment.size

Numeric. Extend read length to the fragment size.

unique

Logical. Whether to remove duplicated reads (based on the genomic coordinates). Default: TRUE.

swap.strand

Logical. Whether to reverse the strand of the read. Default: FALSE.

param

ScanBamParam object that controls which fields and which records (reads) are imported from the BAM file. Default: NULL.
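The layout expected for the peaks file can be illustrated with a small base R sketch. The coordinates, peak ids, and file name below are made up for illustration, and writing the file without a header line is an assumption, since the argument description does not say whether a header is allowed.

```r
# Hypothetical peak table: chromosome, start, end, and an optional peak id.
peaks <- data.frame(
  chr   = c("chr15", "chr15", "chr15"),
  start = c(63337000L, 63340500L, 63414000L),
  end   = c(63338200L, 63341100L, 63415600L),
  name  = c("peak_1", "peak_2", "peak_3"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file without quotes or row names.
# Omitting the header line is an assumption, not something the documentation states.
peaks_file <- tempfile(fileext = ".txt")
write.table(peaks, file = peaks_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm the column layout.
peaks_check <- read.delim(peaks_file, header = FALSE, stringsAsFactors = FALSE)
```

The resulting path (here peaks_file) is what would be passed as the peaks argument.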

Details

The function constructCDS initializes a ChipDataSet object from the paths to the input files and from information relevant to the ChIP-seq library preparation procedure. During object construction the following steps are executed:

  • The peak information is converted into a GRanges object.

  • The genomic distribution of the peaks is evaluated (exonic, intronic, intergenic, TSSs).

  • Each peak in the data set is functionally characterized:

    • length - the length of a peak (in base pairs).

    • fragments - total number of fragments overlapping a peak region.

    • density - number of fragments per base pair of the peak length.

    • pileup - highest fragment pileup in each peak region.

    • tssOverlap - overlap (binary, yes/no) of the peak with the annotated TSS region.

    These features are used in the downstream analysis to predict which peaks are gene associated.
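The per-peak features listed above can be sketched on toy data in base R. This is only an illustration of the definitions, not the package's implementation, which works on GRanges objects and reads imported from a BAM file. All coordinates, the single-chromosome setup, the plus-strand TSS, and the shrunken TSS window are assumptions made for the example.

```r
# Toy peaks and fragments on one chromosome (1-based, inclusive coordinates).
peaks <- data.frame(start = c(100L, 500L), end = c(199L, 549L))
fragments <- data.frame(start = c(90L, 120L, 150L, 510L),
                        end   = c(139L, 169L, 199L, 559L))

# Hypothetical plus-strand TSS with a window shrunk to fit the toy data
# (the function's default tss.region is c(-2000, 2000)).
tss_pos    <- 150L
tss_region <- c(-20L, 20L)
tss_start  <- tss_pos + tss_region[1]
tss_end    <- tss_pos + tss_region[2]

peak_features <- function(peak_start, peak_end) {
  # Fragments overlapping the peak by at least one base.
  hit <- fragments$start <= peak_end & fragments$end >= peak_start
  # Number of overlapping fragments covering each base of the peak.
  coverage <- vapply(peak_start:peak_end, function(p) {
    sum(fragments$start[hit] <= p & fragments$end[hit] >= p)
  }, integer(1))
  len <- peak_end - peak_start + 1L
  data.frame(
    length     = len,
    fragments  = sum(hit),
    density    = sum(hit) / len,
    pileup     = max(coverage),
    tssOverlap = if (peak_start <= tss_end && peak_end >= tss_start) "yes" else "no",
    stringsAsFactors = FALSE
  )
}

features <- do.call(rbind, Map(peak_features, peaks$start, peaks$end))
```

In this toy case the first peak overlaps three fragments and the TSS window, while the second overlaps a single fragment and no TSS window.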

As many peak-calling algorithms tend to split broad peaks into several narrower, closely spaced peaks, it is advisable to merge these neighboring peaks (reduce.peaks = TRUE) to decrease the number of false positives and prevent unnecessary truncation of transcripts in the downstream analysis.
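The effect of merging closely spaced peaks can be sketched in base R. The helper merge_peaks below is hypothetical and is not the package's code. It assumes all peaks lie on one chromosome and that two peaks are merged when the gap between them is smaller than gapwidth; the exact boundary rule used by constructCDS is not stated in this page.

```r
# Hypothetical helper: merge peaks whose gap is smaller than `gapwidth` (in bp).
merge_peaks <- function(start, end, gapwidth = 1000) {
  # Process peaks from left to right.
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]

  merged_start <- start[1]
  merged_end <- end[1]
  for (i in seq_along(start)[-1]) {
    k <- length(merged_end)
    # Number of bases strictly between the current merged peak and the next peak.
    gap <- start[i] - merged_end[k] - 1
    if (gap < gapwidth) {
      # Close enough: extend the current merged peak.
      merged_end[k] <- max(merged_end[k], end[i])
    } else {
      # Too far apart: start a new merged peak.
      merged_start <- c(merged_start, start[i])
      merged_end <- c(merged_end, end[i])
    }
  }
  data.frame(start = merged_start, end = merged_end)
}

# Two peaks 499 bp apart are merged; the third, 2999 bp away, stays separate.
merged <- merge_peaks(start = c(1000, 2500, 6000),
                      end   = c(2000, 3000, 7000),
                      gapwidth = 1000)
```

With a larger gapwidth more peaks collapse into one region, so the value should reflect how far apart fragments of a single broad peak are expected to be.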

Value

An object of class ChipDataSet.

Author(s)

Armen R. Karapetyan

See Also

ChipDataSet, predictTssOverlap

Examples

### Load ChipDataSet object
data(cds)

### View a short summary of the object
cds