knitr::opts_chunk$set(echo = TRUE)
FireCloud, developed and maintained by the Broad Institute, is one of the three National Cancer Institute's Cancer Genomics Cloud Pilots. https://software.broadinstitute.org/firecloud/. This documment provides step-by-step description of creating a project on the FireCloud and upload RNA-seq (BAM) data. Before completing this tutorial, please set up an user account and a billing account with the instructions on the Firecloud's website.
https://software.broadinstitute.org/firecloud/guide/topic?name=firecloud-registration
- Go to <Portal.FireCloud.org> - Click "Register" - Sign In with a Gmail or Google Apps Account (e.g., FireCloudUser@gmail.com). When asked to allow FireCloud to view your email address and basic profile info, etc., please click Allow. An explanation of these requests is posted in the Help Forum. You will be prompted to enter New User Registration information. - Please enter a Contact Email if it differs from your Gmail or Google Apps Account. - Click Register (After registering, you will see a message indicating your FireCloud account is inactive. A FireCloud administrator will activate your account within 24 hours.)
https://software.broadinstitute.org/firecloud/guide/topic?name=firecloud-google
Here is some information on estimating costs in FireCloud from the development team.
FireCloud runs on the Google Cloud Platform (GCP). All FireCloud costs (cloud storage, compute, data egress etc.) are ultimately billed via Google Billing Accounts. In order to be compatible with multiple cloud environments, institutional payment systems, and security requirements, the FireCloud interface does not directly display any part of the Google Billing Account interface.
Instead, FireCloud will connect a Google Billing Account to an associated FireCloud Billing Project. When you are using FireCloud, it is these FireCloud Billing Projects that you will see in the interface, and to which FireCloud will charge your usage costs.
In order to create or clone a new workspace in FireCloud, you must have access to at least one FireCloud Billing Project. To get started with a FireCloud Billing Project or Google Billing Account, you can check out this [Help Topic] https://software.broadinstitute.org/firecloud/guide/topic?name=firecloud-google in the FireCloud User Guide.
Google Cloud Platform (GCP) Pricing Structure
Example - FireCloud testers ran a MuTect analysis on 1089 tumor/normal pairs - Compute: At $0.05/core hour and 23,305.4 core hours, the cost was $1,165.27 ($1.07 dollars/run) - Storage: There were 1089 output files with a total size of 0.001141 TB. At $26/TB/month, the storage/month was $0.03. Note: In this example, the pairs referred to BAMs that resided in the TCGA Open Access bucket, so no storage costs were incurred by the testers. - Download: To download these files, at $120/TB, the cost is $0.14. - Note: if the testers wanted to estimate costs, they could first run the analysis on a few pairs and calculate the core hours per run.
For more information, you can check out these links: - Google Compute Pricing - Google Storage Pricing - FireCloud Projects and Billing Accounts
https://software.broadinstitute.org/firecloud/guide/topic?name=firecloud-basics
https://portal.firecloud.org/
This tutorial is written using the Google Chrome as browser.
Click "Sign In" Sign in with your google credential (If you have multiple google accounts, log in in an incognito window by clicking File -> New Incognito Window)
After you log in, your profile is displayed.
If you have an NIH account, link your NIH account.
Click "Log-In to NIH to re-link your account" Log in with your NIH credential (You will need to renew/re-link your account periodically)
https://software.broadinstitute.org/firecloud/guide/topic?name=firecloud-basics The FireCloud world is organized into workspaces.
A workspace is a computational sandbox where you can organize genomic data and tools, and run analyses. Users can create, share, and clone workspaces. Workspaces hold:
Data: pre-loaded or user-uploaded, open access or controlled access
Workflows: pre-loaded or user-created
Tools: pre-loaded or user-created
Results: from all runs, captured with provenance
After you log in, your workspaces will be displayed. Workspaces that are TCGA access-controlled are labeled "Restricted."
To create a new workspace
Click on the blue botton on the upper-right-hand-corner "Create New Workspace... +" Make sure it is in the correct Google Project linked to the billing account (Refer to the Billing section above) Enter workspace name "IRPworkshop" Enter workspace description, if any. If this project will contain TCGA control data, check the box "Workspace intended to contain NIH protected data." If not, leave it unchecked. Click "Create Workspace"
Once a workspace is created, you can share with other users. Under "Summary" tab, under "Workspace Owner", you should see your gmail account listed. To share this workspace
Click "(Sharing...)" next to the gmail account A pop-up window will appear Click "Add new +" Enter user gmail address Set access level for the user (OWNER, WRITER, READER, NO ACCESS) Click "Save"
To upload file,
Click on the line under "Google Bucket" which opens the Google Cloud Storage browser for this workspace You will be taken to the Google Cloud Platform Click "Upload Files" Select all the .bam and .bai files Click "Open" (Multiple file tends to only upload 3 files simoutaneously, repeat this process as necessary until all bam/bai files are uploaded and can be seen in the Google Console)
Once the files are uploaded to the Google console, you need to add them to the workspace.
To do this, you need two TSV files that describes the bam/bai files.
The first one, called "participants.tsv" has information about the participants and disease type.
participant_id/sample_id fields are set arbitrarily and should be unique.
participant.tsv
entity:participant_id disease
G28029 BRCA
G41676 BRCA
G41659 LCLL
G41707 LCLL
The second one, called "sample.tsv" has information about the bam/bai files. sample.tsv has four columns, "sample_id", "BAM_index", "participant_id", and "BAM."
"sample_id" is arbitrarily set and should be unique.
"pariticpant_id" should match the participant_id in "participant.tsv"
"BAM_index"/"BAM" are addresses where the bam/bai files are in the Google Cloud Storage.
The address should have the format "gs://[Google_Bucket]/[file_name]"
If your Google Bucket location is "fc-974b342c-b6f2-45e7-b901-xxxxxxxxxxxx" and file name is "NGS.bam", your address in that column should be "gs://c-974b342c-b6f2-45e7-b901-xxxxxxxxxxxx/NGS.bam" For clarity, full file address is not listed below.
sample.tsv
entity:sample_id BAM_index participant_id BAM
G28029_sample bai_address1 G28029 bam_address1
G41676_sample bai_address2 G41676 bam_address2
G41659_sample bai_address3 G41659 bam_address3
G41707_sample bai_address4 G41707 bam_address4
Once your have the two TSV files created, go back to the workspace.
Click on "Data" tab Click "Import Data..." Click "Importa from file" Click "Choose file..." Select "participant.tsv" that you created above Click "Open" Click "Upload" and you should see a "Upload Successful" message Click "Import Data..." Click "Importa from file" Click "Choose file..." Select "sample.tsv" that you created above Click "Open" Click "Upload" and you should see a "Upload Successful" message
That's it. You have uploaded the files to the workspace.
When you click on the address for the bam files, you will see
Object: G28029_pe.Aligned.sortedByCoord.out.sorted.bam File size: 974.40 MBOpen (right-click to download)Warning: Downloading this file may incur a large data egress charge
Please note that uploading to the Firecloud is free, but you will be charged for downloading files.
More to come later on running analysis with those uploaded files.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.