title: "SB-CGC-Overview-RNASeq-WF-WS2.rmd" author: "Durga Addepalli" date: "May 02, 2016" output: html_document
This guide will get you started to use Seven Bridges API with the R client package sbgr, and guide you through the steps needed to run the "RNA-seq Alignment - STAR" and RNA-Seq alignment pipeline on the Seven Bridges Genomics platform.
The following primary steps will be covered: - Getting access to SB CGC - Setting up and sharing projects - File management - running the tools and workflows - downloading the results.
download.file("https://github.com/teamcgc/teamcgc.github.io/....SBCGC-R-CreateWorkflow.Rmd", destfile = "~/bioc-workflow.Rmd")
Download and install the sevenbridges package from Bioconductor
source("http://bioconductor.org/biocLite.R") useDevel(devel = TRUE) biocLite("sevenbridges")
Alternatively, If you cannot install it because you are not running latest R, you can install the package from github:
## Install from github for development version # install.packages("devtools") if devtools was not installed source("http://bioconductor.org/biocLite.R") # install.packages("devtools") if devtools was not installed library(devtools) install_github("sbg/sevenbridges-r", build_vignettes=TRUE, repos=BiocInstaller::biocinstallRepos(), dependencies=TRUE)
browseVignettes(package = 'sevenbridges')
You can find login/registration on NCI Cancer Genomics Cloud homepage, follow the signup tutorial if you have an ERA Commons or NIH account.
After you login, you can get your authentication under your account setting and 'developer' tab tutorial
Now lets use the 'sevenbridges' packages you just installed to create a project.
Almost everything starts from this object.
On the SB platform/CGC GUI, Auth is your account, and it contains - projects, - billing groups, - users / collaborations - project contains tasks, apps, files etc.
To create an Auth object, run the command below and replace "fake_token" with your own token.
library(sevenbridges) a <- Auth(token = "fake_token", url = "https://cgc-api.sbgenomics.com/v2/")
To create a project, you need to know your billing group id, cost related to this project will be charged from this billing group. - FOR THIS workshop – free credits
(b <- a$billing()) a$billing(id = b$id, breakdown = TRUE)
This call returns information about your account.
a$user()
Now let's create a new project called "IRP-SBCGC-WS2", save it to a 'p' object for convenient usage for any call related to this project.
(p <- a$project_new("IRP-SBCGC-WS2", b$id, description = "This project is for IRP-CGC R-Workshop@CBIIT"))
Now check it on CGC, you will see a fresh new project is created.
This call returns a list of all projects you are a member of. Each project’s project_id and URL on the CGC will be returned.
a$project()
To list the projects owned by and accessible to a particular user
a$project(owner = "Durga")
To get details about project(s), use detail = TRUE
a$project(detail = TRUE) a$project(owner ="sdavis2", detail = TRUE)
SB CGC also supports partial name match in this interface. The first argument for the call is “name”, users can provide part of the name.
a$project("data")
Only single project could be deleted by call $delete(), so please pay attention to the returned object from a$project(), sometimes if you are using partial matching by name, it will return a list.
To delete it, just call, but I will suggest you keep it for following tutorial : )
a$project("IRP-SBCGC-WS2")$delete() ## will delete all projects matching the name delete(a$project("IRP-SBCGC-WS2_donnot_delete_me"))
For each file, the call returns: Its ID, Its filename. The project is specified as a query parameter in the call.
p$file()
To list all files belonging to a project (irp-cgc-with-r):
n <- a$project(id = "Durga/irp-cgc-with-r") n$file() n$file("bam", project = n, detail = TRUE)
Let’s try to copy a file from DemoData
You must provide id file id, or list/vector of files ids. project parameter: project id.
To copy a single file
pid <- a$project("IRP-SBCGC-WS2")$id fid <- "56f4592ae4b070c30853bbcb" a$project("DemoData")$file(id = fid)$copyTo(pid)
To Copy multiple files
fid <- "5788524ce4b035347bd5c54c" fid2 <- "57884b7fe4b01d432ccb8d97" fid3 <- "57885ecce4b035347bd5c553" a$copyFile(c(fid, fid2, fid3), project = pid) p$file()
## Delete file a$project("IRP-SBCGC-WS2")$file()[[1]]$delete() ## confirm the deletion a$project("IRP-SBCGC-WS2")$file() ## Search for specific file by pattern match and delete, this can delete multiple files too. a$project("IRP-SBCGC-WS2")$file("bam")$delete() ## You can also delete a group of files or FilesList object, be careful with this function! a$project("IRP-SBCGC-WS2")$file("bam") ## delete all of them delete(a$project("IRP-SBCGC-WS2")$file("fastq")) a$project("IRP-SBCGC-WS2")$file("fastq")
a$project("IRP-SBCGC-WS2")$file(id = "56f4592ae4b070c30853bbcb")$download_url() ## To download directly from R, use download call directly from single File object. fid <- a$project("IRP-SBCGC-WS2")$file(id ="56f4592ae4b070c30853bbcb")$id a$project("IRP-SBCGC-WS2")$file(id = fid)$download("~/Downloads/") ##To download all files from a project. a$project("IRP-SBCGC-WS2")$download("~/Downloads") ## Use download function for FilesList object to save your time fls <- a$project("IRP-SBCGC-WS2")$file() download(fls, "~/Downloads/")
Seven Bridges platforms provide couple different ways for data import • command line uploader • graphic UI uploader • from ftp, http etc from interface directly • api uploader that you can directly call with sevenbridges package API client uploader is working like this, simply call project$upload function to upload a file a file list or a folder recursively…
fl <- system.file("extdata", "sample1.fastq", package = "sevenbridges") ## by default load .meta for the file p$upload(fl, overwrite = TRUE) ## pass metadata p$upload(fl, overwrite = TRUE, metadata = list(library_id = "testid2", platform = "Illumina x11")) ## rename p$upload(fl, overwrite = TRUE, name = "sample_new_name.fastq", metadata = list(library_id = "new_id"))
You can list all Apps available to you.
a$app() ## or show details a$app(detail = TRUE)
You can search by name, partial name or ID
## pattern match a$app(name = "get_http") ## unique id a$app(id = "tsang/temp/txfr") ## get a specific revision from an app a$app(id = aid, revision = 0)
To list all apps belonging to one project use project argument
a$project("irp-cgc-with-r")$app() ## or alternatviely a$app(project = pid)
To run the example RNA-Seq alignment STAR workflow, we need two apps in our project: get-http_file RNA-Seq Alignment STAR
This call copies the specified app to the specified project. The app should be from a project that you can access; this could be an app that has been uploaded to the CGC by a project member, or a publicly available app that has been copied to the project.
Need two arguments project: id character name: optional, to re-name your app
Copy app "get-http_file"
app.gets3fl <- a$copyApp(id = "Durga/irp-cgc-with-r/Get-http-file-WS2", project = pid, name = "get-http-f-WS2") apid <- app.gets3fl$id ## check it is copied a$app(project = pid)
a$app("STAR", visibility = "public", complete = TRUE) app.rnawf <- a$copyApp(id = "djordje_klisic/public-apps-by-seven-bridges/rna-seq-alignment-star/3", project = pid, name = "STAR-WF-WS2") rwid <- app.rnawf$id ## check it is copied a$app(project = pid)
Here we demonstrate how to upload files directly from a S3 bucket to your CGC project. We will be using the Get_Http_file app to get files. ......
To create a draft task, you need to call the task_add function from Project object. And you need to pass following arguments
p$task_add(name = "getHttpF-WS2", description = "Using get_http_file app to get files from S3 bucket", app=apid, inputs = list(url = "http://teamcgc.nci.nih.gov.s3-website-us-east-1.amazonaws.com/20160325SBGworkshop/FastqFiles/G28029_pe_1.fastq", ofname = "G28029_pe_1.fastq")) ## confirm p$task(status = "draft")
This call runs (executes) the specified task. Only tasks whose status is “DRAFT” may be run.
tsk <- p$task(status = "draft") tsk$run() ## run update without information just return latest information tsk$update()
Now we will run the RNA-seq Alignment STAR workflow (on which we worked in the last workshop) taking the BAM file uploaded from the previous task, as an input.
WorkFlow Description:
What tools are inbuilt in this workflow?
What Input, Output and Parameter you want to run the workflow
Input
Genome Fasta Files
Output
fastq1 <- p$file(name = "G28029_pe_1.fastq") fastq2 <- p$file(name = "G28029_pe_2.fastq") ## Set metadata / Pairend info fastq1$setMeta(Paired_end = "1") fastq2$setMeta(Paired_end = "2") (fastq.in <- p$file(name = "G28029")) (gtf.in <- p$file(".gtf")) (gtf.in1 = new('FilesList',listData=list(gtf.in))) (fasta.in <- p$file("HG19_Broad_variant.fasta")) tsk = p$task_add(name = "RNA-WF_WS2", description = "Testing the RNA-seq STAR workflow", app = rwid, inputs = list(fastq = fastq.in, genomeFastaFiles = fasta.in, sjdbGTFfile = list(gtf.in1))) ## confirm p$task(status = "draft") ## Run the Task tsk1 <- p$task(status = "draft") tsk1$run() tsk1$update() tsk1$monitor()
## Copy the app app.rnawfbam <- a$copyApp(id = "Durga/irp-cgc-with-r/rna-seq-alignment-star-baseline-20160217", project = pid, name = "star-wfl-inbam") rwbid <- app.rnawfbam$id ## check if it is copied a$app(project = pid) ## Run the app (bam.in <- p$file("Galaxy14-data13downsampled.bam")) (gtf.in <- p$file(".gtf")) (fasta.in <- p$file("HG19_Broad_variant.fasta")) tsk3 = p$task_add(name = "rwfl-inbamtest2-R", description = "Testing the RNA-seq STAR workflow with input BAM file", app = rwbid, inputs = list(input_file = bam.in, genomeFastaFiles = fasta.in, sjdbGTFfile = sevenbridges:::FilesList(gtf.in))) ## confirm p$task(status = "draft") ## Run the Task tsk3 <- p$task(status = "draft") tsk3$run() tsk3$update() tsk3$monitor()
## Copy the app ## another app for batch app.rnawfbamb <- a$copyApp(id = "Durga/irp-cgc-with-r/rna-seq-alignment-star-baseline-20160217", project = pid, name = "star-wf-inbam-batch-UIcopy") rwbbid <- app.rnawfbamb$id ## check if it is copied a$app(project = pid) ## Run the app (bam.in <- p$file("downsampled.bam")) (gtf.in <- p$file(".gtf")) (fasta.in <- p$file("HG19_Broad_variant.fasta")) tsk3 = p$task_add(name = "rwf-inbam_batch-R", description = "Testing the RNA-seq STAR workflow with multiple input BAM files", app = rwbbid, batch = batch(input = "input_file"), inputs = list(input_file = bam.in, genomeFastaFiles = fasta.in, sjdbGTFfile = sevenbridges:::FilesList(gtf.in))) ## confirm p$task(status = "draft") ## Run the Task tsk3 <- p$task(status = "draft") tsk3$run() tsk3$update() tsk3$monitor()
a$project("IRP-SBCGC-WS2")$download("~/Downloads") ## To Download specific files use their Ids fid <- a$project("IRP-SBCGC-WS2")$file(id ="56f4592ae4b070c30853bbcb")$id a$project("IRP-SBCGC-WS2")$file(id = fid)$download("~/Downloads/")
After you install the package
browseVignettes("sevenbridges")
or on Bioconductor devel branch sevenbridges page
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.