```r
knitr::opts_chunk$set(
  tidy = FALSE,
  cache = FALSE,
  dev = "png",
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

```r
library(brentlabRnaSeqTools)
```
You can always access this documentation by placing a question mark in front of any function or data object loaded by the brentlabRnaSeqTools package. For example:

```r
?postFastqSheet
```
`database_info` is a list object which becomes available when you attach the brentlabRnaSeqTools library. You can view a list of its slots like so:

```r
?database_info
```

Alternatively, if you type `database_info$` in your console and hit tab, a list of slots will appear. The same is true for values which are themselves lists. For example, if you enter `database_info$kn99_urls$` and hit tab, a list of URLs will pop up.
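If you prefer a non-interactive view, base R's `names()` lists the same slots that tab completion shows (a quick sketch, not a package function):

```r
# list the top-level slots of database_info
names(database_info)

# list the slots of one of the nested lists, e.g. the kn99 URLs
names(database_info$kn99_urls)
```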
For functions which upload data to the database, you'll need the same username/password you use to log into the frontend data entry system. Typically, this will look something like:

```
username: I.SURNAME
password: password123
```

The database uses this username/password to generate a "token", which is what is actually used to sign a user in to the database system.
So, in order to interact with the database, you sometimes need your "authorization token". This is what `getUserAuthToken()` does:

```r
# check the documentation
?getUserAuthToken
```
A valid call to this function looks like this:

```r
username = 'I.SURNAME'
password = 'password123'

my_token = getUserAuthToken(database_info$kn99_urls$token_auth,
                            username, password)

# view your token
print(my_token)
```
In general, you want to keep your username, password, and token secret. One way to do that in R is to use your .Renviron file, and to ensure that it is in your .gitignore. Note: this assumes that you are working in a project directory.
To make a .Renviron file in your project, do this:

```r
usethis::edit_r_environ("project")
```
At this point, be sure to add the .Renviron file to your .gitignore. Do this by adding `.Renviron` on a new line in the .gitignore file.
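Alternatively, if you use usethis, it can append that line for you (an optional convenience, equivalent to editing .gitignore by hand):

```r
# adds .Renviron to the project's .gitignore
usethis::use_git_ignore(".Renviron")
```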
Next, open the .Renviron file and add some environment variables. For example, you might do this:
```
# note: the db_ variables will be used later in this vignette
db_username = 'db_username'
db_password = 'db_password123'

username = 'I.SURNAME'
password = 'password123'

# output of getUserAuthToken()
token = 'lalskdfjaslkdf12341klajsdf'
```
Restart your R session (see the Session menu in the RStudio window), and you can now access your environment variables. Using `getUserAuthToken()` as an example, you would now do this:
```r
username = Sys.getenv('username')
password = Sys.getenv('password')

my_token = getUserAuthToken(database_info$kn99_urls$token_auth,
                            username, password)

# view your token
print(my_token)
```
From now on, it is assumed that you have a .Renviron file in your project directory with the variables listed above.
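As a quick sanity check after the restart (a minimal sketch, not part of the package), you can confirm that the variables are visible; `Sys.getenv()` returns an empty string for unset variables:

```r
# TRUE only if every expected variable is set and non-empty
vars = c("db_username", "db_password", "username", "password", "token")
all(nzchar(Sys.getenv(vars)))
```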
Check the documentation with `?postFastqSheet`. Using this function looks like this:
```r
# NOTE: make sure you choose the right organism for your fastq file
database_url = database_info$kn99_urls$FastqFiles
auth_token = Sys.getenv('token')

# note: currently, this function accepts files in .csv, .tsv and .xlsx formats
new_fastq_path = '/path/to/new/fastq.xlsx'

# save the output in a variable. If there is a failure, this is where the
# error information will be
new_fastq_response = postFastqSheet(database_url, auth_token, new_fastq_path)
```
If the response is `success` with code `201`, then the communication with the database was successful. If it was not, then you'll get a `failure` with code `400`. In that case, save the response variable like so:
```r
# note: write_rds() comes from the readr package
# the name might be something like "fastq_response_20210701.rds"
write_rds(new_fastq_response, "database_log/unique_name.rds")
```
Then send it to whoever can use it to figure out what went wrong.
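If you want a first look at the failure yourself, and assuming the response behaves like an httr response object (an assumption; check `?postFastqSheet` for the actual return type), the httr accessors can help:

```r
library(httr)

# assumption: new_fastq_response is an httr response object
status_code(new_fastq_response)   # e.g. 400 on failure
content(new_fastq_response)       # parsed body with the error details
```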
First, mount the cluster to your local computer so that you have access to the counts file generated by the cluster-based QC pipeline. Once you have done that, sending the counts file is similar to sending a fastq sheet:
```r
# NOTE: make sure you choose the right organism for your counts file
database_counts_url = database_info$kn99_urls$Counts
run_number = 12345
auth_token = Sys.getenv("token")
new_counts_path = "/path/to/counts/file"

# See section on `archiveDatabase`
fastq_df = read_csv("data/20210701/fastq.csv")

new_counts_response = postCounts(database_counts_url, run_number, auth_token,
                                 new_counts_path, fastq_df)
```
To pull the current state of the database to your computer, use `archiveDatabase()`. Note: you do not need to keep historic copies locally, so clean this up regularly by deleting old archives.
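One way to do that cleanup is sketched below in base R; it assumes the date-named archive directories live in a "data" directory, as in the example further down:

```r
# directory names like "data/20210701" sort chronologically
archives = sort(list.dirs("data", recursive = FALSE))

# delete everything except the most recent archive
unlink(head(archives, -1), recursive = TRUE)
```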
Note: this uses a different username and password than the ones we have been using above. This uses the "superuser" credentials. It is assumed that these have been stored in your .Renviron already.
```r
database_host = database_info$kn99_host
database_name = database_info$kn99_db_name
database_user = Sys.getenv("db_username")
database_password = Sys.getenv("db_password")

# this assumes you have a data directory in your current working directory
output_dir = "data"

archiveDatabase(database_host, database_name, database_user,
                database_password, output_dir)
```
The output of this function will be a directory, named by today's date, in the `output_dir`. Inside will be a separate .csv for each table in the database, as well as a combined table. This is what we use for the `fastq_df` in the `postCounts` function above.
To create a query sheet to run the pipeline (this is what the function `queryDB` on the cluster used to do), use the function `getMetadata`. Note: if you are creating this to put a new run through the pipeline, then this needs to happen after you have added the fastq sheet to the database and ensured that the new run is in the `lts_sequence` directory in the appropriate format.
```r
# as in the archiveDatabase example, these are the "superuser" credentials,
# not your personal credentials or token. It is assumed that these are in
# your .Renviron file.
database_host = database_info$kn99_host
database_name = database_info$kn99_db_name
database_user = Sys.getenv("db_username")
database_password = Sys.getenv("db_password")

combined_df = getMetadata(database_host, database_name,
                          database_user, database_password)
```
You will next need to filter down to the set of samples you are interested in. Please ask for help with this if you need it. Here is an example which returns only the samples in a given run:
```r
# filter() and the %>% pipe come from the dplyr package
subset = combined_df %>%
  filter(runNumber == 12345)
```
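Filters can combine conditions. For example, to keep several runs at once (the second run number here is hypothetical):

```r
# keep the samples from any of the listed runs
subset = combined_df %>%
  filter(runNumber %in% c(12345, 12346))
```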
Assuming you have the cluster mounted to your local system, you can now save this in your personal scratch `rnaseq_pipeline/query` directory:

```r
write_csv(subset, "/path/to/mounted_scratch/rnaseq_pipeline/query/run_12345.csv")
```
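To avoid typos, you might build the file name from the run number instead (a sketch; the mount point is the same placeholder path as above):

```r
run_number = 12345
query_path = file.path("/path/to/mounted_scratch/rnaseq_pipeline/query",
                       paste0("run_", run_number, ".csv"))
write_csv(subset, query_path)
```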
Now you would run the pipeline on the cluster as before. Instructions may be found here:
https://github.com/BrentLab/rnaseq_pipeline/wiki
Getting the raw counts is very similar to `getMetadata`:
```r
# as in the archiveDatabase example, these are the "superuser" credentials,
# not your personal credentials or token. It is assumed that these are in
# your .Renviron file.
database_host = database_info$kn99_host
database_name = database_info$kn99_db_name
database_user = Sys.getenv("db_username")
database_password = Sys.getenv("db_password")

raw_counts = getRawCounts(database_host, database_name,
                          database_user, database_password)
```
See the `experiment_sets` vignette, available via `browseVignettes("brentlabRnaSeqTools")`.