TSRchitect is an R package for analyzing diverse types of high-throughput transcription start site (TSS) profiling datasets. TSRchitect can handle TSS profiling experiments that contain either single-end or paired-end sequence reads, such as CAGE, RAMPAGE, PEAT, STRIPE-seq and others. TSRchitect is an open-source bioinformatics package that is intended to support large-scale, reproducible analysis of TSS profiling data in a broad array of eukaryotic model systems.
Before we can begin, we must first load TSRchitect in our working environment.
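A minimal sketch of the loading step. The guard around `library()` is an addition for illustration, so the snippet also runs on a machine where the package is absent; the installation hint assumes the usual Bioconductor route.

```r
# Check whether TSRchitect is installed before attaching it.
tsr_available <- requireNamespace("TSRchitect", quietly = TRUE)

if (tsr_available) {
  library(TSRchitect)
} else {
  # Assumption: the package is installed through Bioconductor's BiocManager.
  message("TSRchitect is not installed; try BiocManager::install(\"TSRchitect\").")
}
```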
The TSRchitect User's Guide is available; it goes through several well-documented examples of TSRchitect using different datasets.
To open the TSRchitect User's Guide on your machine, enter the following:
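The exact vignette name is not given here, so the sketch below only lists the vignettes that ship with the installed package. It uses the standard `vignette()` function from base R; the placeholder in the final comment must be replaced with a name from that list.

```r
# List the vignettes bundled with TSRchitect (only if the package is installed).
guide_listed <- FALSE
if (requireNamespace("TSRchitect", quietly = TRUE)) {
  print(vignette(package = "TSRchitect"))
  guide_listed <- TRUE
}
# Then open one by name, for example:
# vignette("<name from the list above>", package = "TSRchitect")
```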
Now that you have loaded TSRchitect, we can proceed with a small example.
First, we will create a tssObject using loadTSSobj and import our sample .bam files (which are found in extdata/bamFiles).
In doing this, we provide sample names and identify our replicates using the argument replicateIDs.
We do this in the following manner:
extdata.dir <- system.file("extdata/bamFiles", package="TSRchitect")
tssObjectExample <- loadTSSobj(experimentTitle="Vignette Example",
                               inputDir=extdata.dir, n.cores=1,
                               isPairedBAM=TRUE,
                               sampleNames=c("sample1-rep1", "sample1-rep2",
                                             "sample2-rep1", "sample2-rep2"),
                               replicateIDs=c(1,1,2,2)) # datasets 1-2 and 3-4 are replicates
Please note that TSRchitect also accepts input from BED-formatted files. You could replace the lines above with the following equivalents:
extdata.dir <- system.file("extdata/bedFiles", package="TSRchitect")
tssObjectExample <- loadTSSobj(experimentTitle="Vignette Example",
                               inputDir=extdata.dir, n.cores=1,
                               isPairedBED=TRUE,
                               sampleNames=c("sample1-rep1", "sample1-rep2",
                                             "sample2-rep1", "sample2-rep2"),
                               replicateIDs=c(1,1,2,2)) # datasets 1-2 and 3-4 are replicates
For convenience, loadTSSobj() may also be called with the argument sampleSheet="ssfile", where ssfile is either a tab-delimited text file or an Excel spreadsheet (with extension .xls or .xlsx).
In either case, the first row must contain the header SAMPLE ReplicateID FILE, with the same information described above in the respective columns.
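As an illustration of the tab-delimited variant, the sketch below writes a sample sheet with the required header using base R only. The file names in the FILE column and the name of the sheet itself are placeholders, and whether FILE should hold bare file names or full paths is an assumption to check against the package documentation.

```r
# Build a sample sheet with the three required columns.
sample_sheet <- data.frame(
  SAMPLE      = c("sample1-rep1", "sample1-rep2", "sample2-rep1", "sample2-rep2"),
  ReplicateID = c(1, 1, 2, 2),
  # Hypothetical file names, not files shipped with TSRchitect.
  FILE        = c("s1_rep1.bam", "s1_rep2.bam", "s2_rep1.bam", "s2_rep2.bam"),
  stringsAsFactors = FALSE
)

# Write it as a tab-delimited text file (placed in a temporary directory here).
ssfile <- file.path(tempdir(), "sampleSheet.txt")
write.table(sample_sheet, file = ssfile, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to confirm the header row is as expected.
check <- read.delim(ssfile, stringsAsFactors = FALSE)
print(names(check))

# The sheet would then be passed on as described above, e.g.
# loadTSSobj(..., sampleSheet = ssfile)
```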
If necessary, you can provide loadTSSobj() with input consisting of both .bam and .bed files. Be aware, however, that the .bam files are always added to the tssObject before the .bed files, and the internal numbering of the TSS sets follows that order.
If we wish to see our new tssObject, we simply type its name on the console and hit return, as follows:
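A sketch of this step. The `exists()` guard is an addition so that the snippet also runs outside the vignette session; at the console, typing the bare name `tssObjectExample` has the same effect as the `print()` call.

```r
# Display the tssObject created above, if it is present in this session.
has_obj <- exists("tssObjectExample")
if (has_obj) {
  print(tssObjectExample)
}
```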
Now that the .bam files have been imported, we need to retrieve TSSs from the BAM records and calculate the abundance of tags at each TSS.
We opt not to run these steps in parallel and specify this by setting n.cores = 1.
tssObjectExample <- inputToTSS(experimentName=tssObjectExample)
tssObjectExample <- processTSS(experimentName=tssObjectExample, n.cores=1,
                               tssSet="all", writeTable=FALSE)
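To make the idea of tag abundance concrete, here is a conceptual sketch in base R. It is not TSRchitect's internal code: the data are invented, and the column names (`seq`, `pos`, `strand`, `nTags`) are chosen for illustration only. Each row stands for the 5' end of one aligned read, and identical positions are collapsed into a count.

```r
# Toy 5' end positions, one row per read (invented data).
tags <- data.frame(
  seq    = "chr22",
  pos    = c(100, 100, 101, 100, 250, 250),
  strand = "+",
  stringsAsFactors = FALSE
)

# Give every read a weight of 1, then sum the weights per unique position.
tags$nTags <- 1L
tag_counts <- aggregate(nTags ~ seq + pos + strand, data = tags, FUN = sum)
print(tag_counts)
```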
Now that this is complete, we can proceed with identifying TSRs from TSSs. We do this for all 4 imported datasets at once by specifying tssSet="all".
tssObjectExample <- determineTSR(experimentName=tssObjectExample, n.cores=1,
                                 tssSetType="replicates", tssSet="all",
                                 tagCountThreshold=25, clustDist=20,
                                 writeTable=FALSE)
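The sketch below shows one plausible reading of the two parameters used above. It assumes that tagCountThreshold is a minimum tag count per TSS and that clustDist is the largest gap allowed between neighboring TSSs in the same TSR. It is a simplified single-strand, single-chromosome illustration on invented data, not the package's implementation.

```r
# Toy TSS positions with tag counts (invented data).
tss <- data.frame(pos   = c(100, 105, 118, 300, 310, 900),
                  nTags = c(40, 30, 26, 50, 10, 60))

tagCountThreshold <- 25
clustDist <- 20

# Keep only TSSs with enough tag support, sorted by position.
kept <- tss[tss$nTags >= tagCountThreshold, ]
kept <- kept[order(kept$pos), ]

# Start a new cluster whenever the gap to the previous TSS exceeds clustDist.
cluster_id <- cumsum(c(TRUE, diff(kept$pos) > clustDist))

# Summarize each cluster as a region with its total tag count.
tsr <- data.frame(
  start = as.vector(tapply(kept$pos, cluster_id, min)),
  end   = as.vector(tapply(kept$pos, cluster_id, max)),
  nTags = as.vector(tapply(kept$nTags, cluster_id, sum))
)
print(tsr)
```

In this toy input the TSS at position 310 falls below the threshold and is dropped, so the region around 300 contains a single position.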
Next, we merge our replicate data (according to the information we provided in loadTSSobj) and identify TSRs on these merged samples.
We then use addTagCountsToTSR to quantify the number of tags found in each of our 4 datasets.
tssObjectExample <- mergeSampleData(experimentName=tssObjectExample,
                                    n.cores=1, tagCountThreshold=1)
tssObjectExample <- determineTSR(experimentName=tssObjectExample, n.cores=1,
                                 tssSetType="merged", tssSet="all",
                                 tagCountThreshold=25, clustDist=20,
                                 writeTable=FALSE)
tssObjectExample <- addTagCountsToTSR(experimentName=tssObjectExample,
                                      tsrSetType="merged", tsrSet=1,
                                      tagCountThreshold=25, writeTable=FALSE)
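As a rough picture of what merging replicates means, the base R sketch below pools the per-position tag counts of two replicates that share a replicate ID. The data and column names are invented, and the actual merging logic inside mergeSampleData may differ.

```r
# Per-position tag counts for two replicates of the same sample (invented data).
rep1 <- data.frame(pos = c(100, 105, 300), nTags = c(20, 5, 12))
rep2 <- data.frame(pos = c(100, 300, 310), nTags = c(15, 9, 2))

# Stack the replicates and sum tag counts at positions they share.
merged <- aggregate(nTags ~ pos, data = rbind(rep1, rep2), FUN = sum)
print(merged)
```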
Now that we have identified TSRs using determineTSR from all replicate and merged datasets, we can select individual results directly.
To do this, we use getTSRdata, one of the tssObject accessor methods.
sample_1_1_tsrs <- getTSRdata(experimentName=tssObjectExample,
                              slotType="replicates", slot=1)
print(sample_1_1_tsrs)
We see that two distinct TSRs were identified from this small example dataset.
Finally, we import an annotation file (containing Gencode-annotated transcripts) and then associate this annotation with our small set of identified TSRs.
gff3data.dir <- system.file("extdata", package="TSRchitect")
tssObjectExample <- importAnnotationExternal(experimentName=tssObjectExample,
                                             fileType="gff3",
                                             annotFile=paste(gff3data.dir,
                                                             "gencode.v19.chr22.transcript.gff3",
                                                             sep="/"))
tssObjectExample <- addAnnotationToTSR(experimentName=tssObjectExample,
                                       tsrSetType="merged", tsrSet=1,
                                       upstreamDist=2000, downstreamDist=500,
                                       feature="transcript",
                                       featureColumnID="ID",
                                       writeTable=FALSE)
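The role of upstreamDist and downstreamDist can be pictured with the toy sketch below. It assumes that a TSR is linked to a transcript when the TSR midpoint lies within a window running from upstreamDist bases before the annotated start to downstreamDist bases after it, on the plus strand. The midpoint rule, the data and the column names are all assumptions for illustration; the package may match on different coordinates.

```r
# Toy plus-strand transcripts and TSRs (invented data).
transcripts <- data.frame(ID = c("tx1", "tx2"),
                          start = c(5000, 20000),
                          stringsAsFactors = FALSE)
tsrs <- data.frame(start = c(4200, 12000), end = c(4230, 12040))

upstreamDist <- 2000
downstreamDist <- 500

# Assumption: represent each TSR by its midpoint.
tsr_mid <- (tsrs$start + tsrs$end) / 2

# For each TSR, report the first transcript whose window contains the midpoint.
tsrs$featureID <- vapply(tsr_mid, function(m) {
  in_window <- m >= transcripts$start - upstreamDist &
               m <= transcripts$start + downstreamDist
  hit <- transcripts$ID[in_window]
  if (length(hit) > 0) hit[1] else NA_character_
}, character(1))
print(tsrs)
```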
Before we end, we choose to save our tssObject in an .RData file to return to at a later time:
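A sketch of the save step using base R's `save()` and `load()`. The output file name and location are placeholders, and the stand-in object exists only so the snippet runs outside the vignette session.

```r
# Stand-in object so the snippet runs on its own; in the vignette session,
# tssObjectExample already exists and this line is skipped.
if (!exists("tssObjectExample")) {
  tssObjectExample <- list(title = "Vignette Example")
}

# Hypothetical output file, written to a temporary directory.
out_file <- file.path(tempdir(), "tssObjectExample.RData")
save(tssObjectExample, file = out_file)

# In a later session, restore the object under its original name.
load(out_file)
```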
This ends our vignette. Please see the TSRchitect User's Guide for more extensively documented examples.