knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE )
This is a bit of a misnomer -- nextflow is simply the language. The pipeline is from nf-co.re, which is a highly curated repository for nextflow pipelines. There is only one pipeline for any given task, meaning there will not be 10 different rnaseq_pipelines. Rather, all development for rnaseq is happening in one spot.
There are two options -- you can read the parameter docs to figure out how to input your data, or you can click the "launch version #.#" button in the header of page. There is a form you can fill out which will create the parameters.json document, and also tell you what the command will look like.
Here is an example of what submitting to the pipeline looks like
sample fastq_1 fastq_2 strandedness 269 /lts/mblab/Crypto/rnaseq_data/lts_sequence/run_4178_samples/Brent_J... "" reverse 270 /lts/mblab/Crypto/rnaseq_data/lts_sequence/run_4040_samples/Brent_s... "" reverse 278 /lts/mblab/Crypto/rnaseq_data/lts_sequence/run_4040_samples/Brent_s... "" reverse 306 /lts/mblab/Crypto/rnaseq_data/lts_sequence/run_1658_samples/1658_Br... "" reverse 307 /lts/mblab/Crypto/rnaseq_data/lts_sequence/run_1658_samples/1658_Br... "" reverse 308 /lts/mblab/Crypto/rnaseq_data/lts_sequence/run_1658_samples/1658_Br... "" reverse
For those computational people, what you need to do is move the files from lts to scratch. Use your favorite method.
If you wish to use my methods, here is how I do it:
In the brentlabRnaSeqTools there is a function called moveNfCoFastqFiles()
which creates a two column dataframe of structure column1: source, column2: destination. I write that to the cluster and then use the bash script below to move the files. Here is my full process, using crypto as an example:
prefix = "/lts/mblab/Crypto/rnaseq_data/lts_sequence" lookup = moveNfCoFastqFiles(sample_sheet, prefix) # from my local with the cluster mounted write_tsv(lookup, "/scratch/mblab/chasem/rnaseq_pipeline/lookups/my_lookup.tsv")
I then go over to the cluster and use a script called moveFiles.sh. It contains:
#!/bin/bash nrow=$(wc -l $1 | grep -oP "^[[:digit:]]+") for i in $(seq 1 $nrow); do read s d < <(sed -n ${i}p $1) dest_dir=$(dirname $d) mkdir -p $dest_dir echo "copying $1 of $nrow" rsync -aHv $s $d done
So from the command line I do this:
./job_scripts/moveFiles.sh lookups/my_lookup.tsv
NOTE! This is not necessarily a "standard" parameter file -- in particular, the feature_group_type is " and we are skipping the biotype_qc. This is necessary for crypto since there is no biotype information in the GTF. This should not be the case, however, certainly will not be true if you are processing human and mouse data.
{ "input": "\/scratch\/mblab\/chasem\/rnaseq_pipeline\/daniel_set\/daniel_sample.csv", "outdir": ".\/daniel_set_results", "gtf": "/scratch\/mblab/chasem\/rnaseq_pipeline\/genome_files_curr\/KN99\/liftoff_h99_to_kn99.gtf", "fasta": "\/scratch\/mblab/chasem\/rnaseq_pipeline\/genome_files_curr\/KN99\/KN99_genome_fungidb.fasta", "gtf_extra_attributes": "ID", "featurecounts_group_type": "", "rseqc_modules": "bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication", "skip_bigwig": true, "skip_biotype_qc": true }
#!/bin/bash #SBATCH --time=15:00:00 # right now, 15 hours. change depending on time expectation to run #SBATCH --mem-per-cpu=5G #SBATCH -J your_run_name #SBATCH -o ./your_output_name.out ml miniconda ml singularity source activate /path/to/your/conda_env/nextflow work_folder_name=work_today config_path=/scratch/mblab/$USER/configs/conf/wustl_htcf.config param_file=/scratch/mblab/$USER/param.json # yes -- keep this here mkdir -p tmp # note! -r 3.3 is the 'revision' or version of the nf-co/rnaseq pipeline you wish to use. # You should use the most up to date version, unless you are holding the version constant # for consistency across a set nextflow run nf-core/rnaseq -r 3.3 \ -work-dir ${PWD}/${work_folder_name} \ -c ${config_path} \ -params-file ${param_file}
Oh no! your pipeline errored half way through. Do the following:
When you think you ahve resolved the issue, just add the -resume
flag to your command like so and resubmit:
#!/bin/bash #SBATCH --time=15:00:00 # right now, 15 hours. change depending on time expectation to run #SBATCH --mem-per-cpu=5G #SBATCH -J your_run_name #SBATCH -o ./your_output_name.out ml miniconda ml singularity source activate /path/to/your/conda_env/nextflow work_folder_name=work_today config_path=/scratch/mblab/$USER/configs/conf/wustl_htcf.config param_file=/scratch/mblab/$USER/param.json # yes -- keep this here mkdir -p tmp # note! -r 3.3 is the 'revision' or version of the nf-co/rnaseq pipeline you wish to use. # You should use the most up to date version, unless you are holding the version constant # for consistency across a set nextflow run nf-core/rnaseq -r 3.3 \ -work-dir ${PWD}/${work_folder_name} \ -c ${config_path} \ -params-file ${param_file} \ -resume # see how easy that is!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.