This tutorial describes steps involved in running STAR RNA-seq alignment on the FireCloud, include the following sections -
Please make sure you complete the following first.
Install the following:
There are several different ways of creating a docker image.
I used a Dockerfile, which is essentially a list of instructions to tell Docker how to build the image.
The STAR image is built and stored in the dockerhub - https://hub.docker.com/r/stevetsa/staralign/
Skip to the next section if you are just using the image, not creating a new one.
# This is the Dockerfile FROM ubuntu:latest RUN rm /bin/sh && ln -s /bin/bash /bin/sh MAINTAINER Steve Tsang <mylagimail2004@yahoo.com> RUN apt-get update RUN apt-get install --yes \ build-essential \ gcc-multilib \ apt-utils \ zlib1g-dev RUN apt-get install -y git # Get latest STAR source from offical website https://github.com/alexdobin/STAR RUN git clone https://github.com/alexdobin/STAR.git WORKDIR /STAR # Build STAR #RUN pwd RUN make STAR # To include STAR-Fusion RUN git submodule update --init --recursive # If you have a TeX environment, you may like to build the documentation # make manual ENV PATH /STAR/source:$PATH ```` Build the image using the Dockerfile. ```{sh, eval=FALSE} #building Docker image from Dockerfile docker build -t stevetsa/staralign:v2.52a . #push image to the dockerhub -a is author's name; -m is a commit message; and "4629cdb394d3" is the container ID docker commit -m "first commit" -a "Steve Tsa" 4629cdb394d3 stevetsa/staralign:v2.52a docker push stevetsa/staralign
STAR is a fast aligner but uses a lot of memory. This tutorial is designed to run downsampled RNAseq data aligning only to chromosome 21. If you plan to run full RNA-seq data against full human genome, resource allocation in the runtime block needs to be changed accordingly.
task staralign { #List of Input Files File RefHg19 File hg19GTF File read1 File read2 command <<< mkdir ./refgenome #### Generate reference genome stored in the refgenome directory STAR --runThreadN 8 --runMode genomeGenerate --genomeDir ./refgenome --genomeFastaFiles ${RefHg19} --sjdbGTFfile ${hg19GTF} #### Alignment and output BAM file with smG28029wdl2 prefix STAR --runThreadN 8 --genomeDir ./refgenome --readFilesIn ${read1} ${read2} --outFileNamePrefix smG28029wdl2 --outSAMtype BAM SortedByCoordinate >>> runtime { #### Docker image and resource allocation docker : "stevetsa/staralign:v2.52a" memory: "4G" cpu: "8" } output { File response_star = stdout() File outbam = "smG28029wdl2Aligned.sortedByCoord.out.bam" } } workflow star { File RefHg19 File hg19GTF File read1 File read2 call staralign { input: RefHg19 = RefHg19, hg19GTF = hg19GTF, read1 = read1, read2 = read2 } } ```` ### WDL creation and testing Create the JSON file. ```{sh, eval=FALSE} java -jar /Users/<path file>/wdltool-0.5.jar inputs StarAlign_cfg2.wdl > StarAlign_cfg2.json
Modify the json file to include the location/names of the input files.
{ "star.RefHg19": "chr21.fa", "star.hg19GTF": "chr21.hg19.gtf", "star.read1": "sm2G28029pe1.fq", "star.read2": "sm2G28029pe2.fq" }
Testing the WDL file locally
```{sh, eval=FALSE}
java -jar /Users/
If the run is successfully completed locally, you are ready to run this WDL file on the FireCloud. Use the following command to push the WDL file to the FireCloud method repo. More info [here](https://github.com/broadinstitute/firecloud-cli). ```{sh, eval=FALSE} docker run --rm -it -v "$HOME"/.config:/.config broadinstitute/firecloud-cli gcloud auth login docker run --rm -it -v "$HOME"/.config:/.config -v "$PWD":/working broadinstitute/firecloud-cli firecloud -m push StarAlign_cfg2.wdl -t Workflow -s <your_FireCloud_email_account> -y "STARv2.52a RNAseq Alignment"
For detailed description of each step, please refer to FireCloud intro
Log in to FireCloud in a private browser such as Chrom Incognito Window <https://portal.firecloud.org> Create a new workspace called "StarAlignTest" Under the Summary tab, click on the Google Bucket Download and unzip [Tutorial Files](https://s3.amazonaws.com/teamcgc.nci.nih.gov/TutorialFiles/FireCloudRNAseq.zip) Upload the content of the folder, except tsvfiles.xlsx, to the Google bucket In FireCloud, Click on "Summary" tab Add new "Workspace Attributes" RefHg19 -> gs://XXXXXXXXXXXXXXXXXXXXXXXX (copy-and-paste url from Google bucket) hg19GTF -> gs://XXXXXXXXXXXXXXXXXXXXXXXX (copy-and-paste url from Google bucket) Open tsvfiles.xlsx in any spreadsheet application, save the participant.tsv sheet as a tab-delimited text file called "participant.tsv" In the sample.tsv sheet, add the urls of the fastq files in the fastq1/fastq2 columns and save as a tab-delimited text file called "sample.tsv" In FireCloud, click on "Data" tab and click "Import Data" Import participant.tsv and sample.tsv into workspace Click on "Method Configurations" tab Click "Import Configuration" and click on "StarAlign_cfg2 Snapshot ID: 1" Change "Root Entity Type" to "Sample", click "Import" and a new page will appear Click "Edit this page" Make sure "Root Entity Type" is "Sample" Change "Inputs" star.hg19GTF -> workspace.chr21.hg19.gtf star.read1 -> this.fastq1 star.read2 -> this.fastq2 star.RefHg19 -> workspace.chr21.fa Change "Outputs"" star.staralign.outbam -> this.bam star.staralign.response_star -> this.response_star Click "Launch Analysis" ```` You will be able to monitor the progree under the "Monitor" tab. After the run is successfully completed, you will be able to see two additional columns in "Data" -> "Sample." The "bam" column contains the BAM file and the "response_star"" column contains STDOUT output. ## Visualize the alignment using IGV (optional) To visualize alignment, you need samtools (http://samtools.sourceforge.net/) and IGV (https://www.broadinstitute.org/igv/). Download the bam file to a local machine ```{sh, eval=FALSE} samtools index smG28029wdl2Aligned.sortedByCoord.out.bam
A smG28029wdl2Aligned.sortedByCoord.out.bam.bai file will be created and you should be able to load the alignment in IGV.
Remember, only chr21 will have aligned reads.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.