The TSSi package normalizes and identifies transcription start sites in high-throughput sequencing data.
High-throughput sequencing has become an essential experimental approach for investigating transcriptional mechanisms. For some applications, such as ChIP-seq, several approaches are available for predicting peak locations. However, these methods are not designed to identify transcription start sites (TSS), because TSS data sets have qualitatively different noise.
The TSSi package provides a heuristic framework for identifying TSS from high-throughput sequencing data. It makes probabilistic assumptions about the count distribution and about systematic errors, i.e. contaminating measurements close to a TSS; the user can adapt these assumptions. The framework also includes a regularization procedure that can be applied as a preprocessing step to decrease the noise and thereby reduce the number of false predictions.
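The two ideas in the paragraph above, namely regularizing counts before calling and discounting reads that a nearby TSS can explain, can be illustrated with a small base R sketch. This is not how TSSi implements them. Soft-thresholding stands in for the package's regularization. The exponential-decay contamination model and the greedy strongest-first calling order are also assumptions, and all function and parameter names (`regularizeCounts`, `contamination`, `callStartSites`, `lambda`, `threshold`, `rho`, `tau`) are hypothetical.

```r
## Hypothetical sketch, NOT the TSSi implementation.

## Regularization (assumed: soft-thresholding toward zero), so that
## isolated low counts, which are likely noise, vanish before calling.
regularizeCounts <- function(counts, lambda = 1) {
  pmax(counts - lambda, 0)
}

## Reads expected at position 'p' from already called TSS: a fraction
## 'rho' of each TSS signal, decaying exponentially with distance (scale 'tau').
contamination <- function(p, tssPos, tssSignal, rho = 0.2, tau = 2) {
  if (length(tssPos) == 0) return(0)
  sum(rho * tssSignal * exp(-abs(p - tssPos) / tau))
}

## Greedy heuristic: visit positions from strongest to weakest regularized
## signal; call a TSS only if the signal left after subtracting the expected
## contamination from earlier calls exceeds 'threshold'.
callStartSites <- function(pos, counts, lambda = 1, threshold = 2,
                           rho = 0.2, tau = 2) {
  y <- regularizeCounts(counts, lambda)
  tssPos <- numeric(0)
  tssSignal <- numeric(0)
  for (i in order(y, decreasing = TRUE)) {
    if (y[i] <= 0) break  # remaining positions carry no signal
    excess <- y[i] - contamination(pos[i], tssPos, tssSignal, rho, tau)
    if (excess > threshold) {
      tssPos <- c(tssPos, pos[i])
      tssSignal <- c(tssSignal, excess)
    }
  }
  data.frame(pos = tssPos, signal = tssSignal)
}

## Toy read counts for one segment (made-up numbers for illustration only)
pos <- 1:12
counts <- c(0, 1, 0, 2, 10, 4, 1, 0, 0, 6, 1, 0)

res <- callStartSites(pos, counts)            # calls positions 5 and 10
res0 <- callStartSites(pos, counts, rho = 0)  # no contamination model: 5, 10, 6
```

With the contamination model, position 6 is not called: most of its reads are attributed to the strong TSS at position 5. Setting `rho = 0` switches that assumption off, and position 6 then becomes a (likely false) prediction.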
The package is published under the GPL-3 license.
Authors: Clemens Kreutz, Julian Gehring, Jens Timmer
Maintainer: Julian Gehring <email@example.com>
Reference: C. Kreutz, J. Gehring, D. Lang, J. Timmer, and S. Rensing: TSSi - An R package for transcription start site identification from high throughput sequencing data.
```r
## load the package and the example data set
library(TSSi)
data(physcoCounts)

## segmentize data (with() avoids attaching the data frame to the search path)
x <- with(physcoCounts,
          segmentizeCounts(counts=counts, start=start, chr=chromosome,
                           region=region, strand=strand))
x
segments(x)

## normalize data, without and with fitting
yRatio <- normalizeCounts(x)
yFit <- normalizeCounts(x, fit=TRUE)
yFit

## identify TSS
z <- identifyStartSites(yFit)
z

## inspect results
head(tss(z, 1))
plot(z, 1)
```