run_salmon: Quantify transcript abundances using Salmon
In anilchalisey/rseqR: RNA-seq data processing pipeline


Run the abundance quantification tool Salmon on a set of FASTQ files. Salmon (https://combine-lab.github.io/salmon/) must be installed, and a Salmon transcript index must have been generated before this function is used. See the Salmon website for installation and basic usage instructions.

run_salmon(fastq1, fastq2 = NULL, index.dir, dest.dir = "SALMON",
  salmon = "salmon", threads = NULL, advanced.opts = NULL,
  bam = FALSE, bootstraps = 0, seqBias = TRUE, gcBias = TRUE,
  posBias = FALSE, allowOrphans = FALSE)

`fastq1`	a character vector indicating the read files to be quantified.
`fastq2`	(optional) a character vector indicating the second (mate) read files to be quantified. If specified, the reads are assumed to be paired, and this vector MUST be in the same order as the files listed in `fastq1`. If `NULL`, the reads are assumed to be single-end.
`index.dir`	directory of the index files needed for read mapping using Salmon. See function `'build_index()'`.
`dest.dir`	directory where results are to be saved. If directory does not exist, then it will be created.
`salmon`	(optional) string giving full command to use to call Salmon, if simply typing "salmon" at the command line does not give the required version of Salmon or does not work. Default is simply "salmon". If used, this argument should give the full path to the desired Salmon binary.
`threads`	an integer value indicating the number of parallel threads to be used by Salmon. [DEFAULT = maximum number of available threads - 1].
`advanced.opts`	character vector supplying list of advanced option arguments to apply to each Salmon call. For details see Salmon documentation or type `salmon quant --help-reads` at the command line.
`bam`	logical, if `TRUE` then create a pseudo-alignment BAM file. [Default = `FALSE`]
`bootstraps`	integer giving the number of bootstrap samples that Salmon should use (default is 0). With bootstrap samples, uncertainty in abundance can be quantified.
`seqBias`	logical, should Salmon model and correct abundances for sequence-specific bias? Default is `TRUE`.
`gcBias`	logical, should Salmon model and correct abundances for GC content bias? Requires Salmon version 0.7.2 or higher. Default is `TRUE`.
`posBias`	logical, should Salmon model and correct abundances for positional biases? Requires Salmon version 0.7.3 or higher. Default is `FALSE`.
`allowOrphans`	logical, if `TRUE` then consider orphaned reads as valid hits when performing lightweight-alignment. This option will increase sensitivity (allow more reads to map and more transcripts to be detected), but may decrease specificity as orphaned alignments are more likely to be spurious. For more details see Salmon documentation.
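
Because `fastq2` must be in the same order as `fastq1`, it can help to build and check both vectors before quantifying. The following is a minimal sketch, not part of the package: the file names, the `_R1`/`_R2` naming scheme, the `salmon_index` directory, and the choice of 30 bootstraps are all assumptions made for illustration. The call to `run_salmon()` is guarded so that it only runs when rseqR, Salmon, the index, and the read files are actually present.

```r
# Hypothetical file names; replace with your own FASTQ files.
fq <- c("sampleB_R2.fastq.gz", "sampleA_R1.fastq.gz",
        "sampleA_R2.fastq.gz", "sampleB_R1.fastq.gz")

# Split into mates and sort so fastq1[i] and fastq2[i] belong to the same sample.
fastq1 <- sort(grep("_R1\\.fastq\\.gz$", fq, value = TRUE))
fastq2 <- sort(grep("_R2\\.fastq\\.gz$", fq, value = TRUE))

# Check that the sample prefixes line up before quantifying.
samples1 <- sub("_R1\\.fastq\\.gz$", "", fastq1)
samples2 <- sub("_R2\\.fastq\\.gz$", "", fastq2)
stopifnot(identical(samples1, samples2))

# Mirror the documented default of "available threads - 1", with a floor of 1.
n_cores <- parallel::detectCores()
threads <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# Only attempt the call when rseqR, Salmon, the index, and the reads exist.
# "salmon_index" is an assumed index directory name.
can_run <- requireNamespace("rseqR", quietly = TRUE) &&
  nzchar(Sys.which("salmon")) &&
  dir.exists("salmon_index") &&
  all(file.exists(c(fastq1, fastq2)))

if (can_run) {
  rseqR::run_salmon(fastq1 = fastq1, fastq2 = fastq2,
                    index.dir = "salmon_index", dest.dir = "SALMON",
                    threads = threads, bootstraps = 30)
}
```

For single-end data the same pattern applies with `fastq2` left as `NULL`, in which case the ordering check is unnecessary.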

The following items will be returned and saved in the salmon directory:

quant.sf: plain-text, tab-separated quantification file that contains 5 columns: Name, Length, EffectiveLength, TPM, and NumReads.
quant.sf.bkp: plain-text, tab-separated quantification file that contains 5 columns: Name, Length, EffectiveLength, TPM, and NumReads. This is the raw version of the quant.sf file.
cmd_info.json: A JSON format file that records the main command line parameters with which Salmon was invoked for the run that produced the output in this directory.
aux_info: This directory will have a number of files (and subfolders) depending on how salmon was invoked.
meta_info.json: A JSON file that contains meta information about the run, including stats such as the number of observed and mapped fragments, details of the bias modeling etc.
ambig_info.tsv: This file contains information about the number of uniquely-mapping reads as well as the total number of ambiguously-mapping reads for each transcript.
lib_format_counts.json: This JSON file reports the number of fragments that had at least one mapping compatible with the designated library format, as well as the number that didn't.
libParams: The auxiliary directory will contain a text file called flenDist.txt. This file contains an approximation of the observed fragment length distribution.
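
Since quant.sf is a plain tab-separated file with a header row, it can be read with base R. The sketch below is illustrative only: it writes a tiny quant.sf with made-up values to a temporary directory so that it runs on its own, and the transcript names and numbers are not real results. For actual output, point `quant_file` at the quant.sf produced by your run; its exact location under `dest.dir` is not specified here.

```r
# Write a tiny, made-up quant.sf so the example is self-contained.
quant_file <- file.path(tempdir(), "quant.sf")
writeLines(c(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "tx1\t1500\t1320.5\t250000\t40",
  "tx2\t800\t620.5\t750000\t60"
), quant_file)

# read.delim() defaults to a header row and tab-separated fields.
quant <- read.delim(quant_file, stringsAsFactors = FALSE)

# Confirm the five documented columns are present, in order.
stopifnot(identical(
  names(quant),
  c("Name", "Length", "EffectiveLength", "TPM", "NumReads")
))

# Keep transcripts with non-zero abundance, most abundant first.
expressed <- quant[quant$TPM > 0, ]
expressed <- expressed[order(expressed$TPM, decreasing = TRUE), ]
print(head(expressed))
```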

Rob Patro, Geet Duggal, Michael I. Love, Rafael A. Irizarry, and Carl Kingsford (2017): Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4), 417. https://www.nature.com/articles/nmeth.4197

anilchalisey/rseqR documentation built on May 25, 2019, 2:25 p.m.