knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
library(brentlabRnaSeqTools)
library(tidyverse) # only need dplyr and readr

Get the Metadata

See the create_set vignette/article for comments on the DB_USERNAME and DB_PASSWORD lines below.
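
If those environment variables are not already set, one way to set them for the current R session is sketched below; storing them in ~/.Renviron is a more permanent option.

# set the database credentials for this session only; for a persistent setup,
# add DB_USERNAME and DB_PASSWORD to your ~/.Renviron instead
Sys.setenv(DB_USERNAME = "your_username",
           DB_PASSWORD = "your_password")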

metadata_df = getMetadata(database_info$kn99_host, 
                          database_info$kn99_db_name, 
                          Sys.getenv("DB_USERNAME"), 
                          Sys.getenv("DB_PASSWORD"))

Filter

There are some experiment sets which have functions included to subset the metadata, for example createNinetyMinuteInductionSet(). In general, though, expect to filter the metadata yourself. The dplyr documentation is a good resource, as is R for Data Science.

subset_metadata = metadata_df %>%
  filter(experimentDesign == "Timecourse",
         genotype1 == "CNAG_00000",
         timePoint %in% c(0, 1440))

Note in the filter above that a better method of filtering for "wildtype" strains is by strain number: there are clinical strains in the database which also have genotype1 == "CNAG_00000".
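
As a sketch only: the column name and strain label below are assumptions, so check colnames(metadata_df) and the unique values of the strain column for the actual names before using something like this.

# hypothetical column name and strain label -- verify against the metadata
wildtype_metadata = metadata_df %>%
  filter(strain == "KN99alpha",
         experimentDesign == "Timecourse",
         timePoint %in% c(0, 1440))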

Creating the nextflow sample sheet

I do this from my local computer. If I want to check that the files exist, I mount the /lts directory and set the sequence_dir_prefix below to the path to lts_sequence from my local mount point. Then I change the sequence_dir_prefix to the correct path for the cluster and write that sample_sheet out for processing.

# note: $USER is not expanded inside an R string -- replace it with your username
sequence_dir_prefix = "/scratch/mblab/$USER/scratch_sequence/fastq_set"

sample_sheet = createNfCorePipelineSampleSheet(
  subset_metadata,
  "fastqFileNumber",
  sequence_dir_prefix,
  check_files_flag = FALSE
)
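
With check_files_flag = FALSE the paths are not verified, so a rough manual check against the mounted prefix might look like the sketch below. The fastq_1 column name is an assumption based on the nf-core sample sheet convention; adjust it to whatever column createNfCorePipelineSampleSheet() actually produces.

# count sample sheet rows whose fastq file is not found at the current prefix
# (fastq_1 is an assumed column name -- check colnames(sample_sheet))
missing_fastq = sample_sheet %>%
  filter(!file.exists(fastq_1))

nrow(missing_fastq)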

Move the files and check that they exist on the cluster

(repeated in "Running the nextflow pipeline")

This is a frustrating step, and one that I haven't yet written a function for in the brentlabRnaSeqTools package. Right now, this is how I do it. If you are not a computational person in the lab, please just ask one of us to move the files for you.

awk -F"," '{print $2}' sample_sheet.csv > fastq_lookup.txt

mkdir scratch_sequence/fastq_set

cat fastq_lookup.txt | while read line; do rsync -aHvR $line scratch_sequence/fastq_set/; done

# and then I move the run directories back up to the level that I want them -- please just ask if this doesn't make sense. Eventually there will be a function in the brentlabRnaSeqTools. If you have a better bash method, please tell me.

Inspect and write out the sample sheet
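
A couple of quick ways to look the sheet over before writing it out, for example:

# quick visual checks of the sample sheet before writing it out
dplyr::glimpse(sample_sheet)
head(sample_sheet)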

output_path = "/path/to/sample_sheet.csv"

write_csv(sample_sheet, output_path)

Running the nextflow pipeline

See the article/vignette "Running the nextflow pipeline"
