run_salmon: Quantify transcript abundances using Salmon

Description

Run the abundance quantification tool Salmon on a set of FASTQ files. Requires Salmon (https://combine-lab.github.io/salmon/) to be installed and a Salmon transcript index must have been generated prior to using this function. See the Salmon website for installation and basic usage instructions.

Usage

run_salmon(fastq1, fastq2 = NULL, index.dir, dest.dir = "SALMON",
  salmon = "salmon", threads = NULL, advanced.opts = NULL,
  bam = FALSE, bootstraps = 0, seqBias = TRUE, gcBias = TRUE,
  posBias = FALSE, allowOrphans = FALSE)
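A minimal usage sketch for a paired-end run. The file names, the `salmon_index` directory and the `_R1`/`_R2` naming convention are hypothetical. The example assumes the package is installed as rseqR and exports run_salmon. It first checks that fastq2 lists mates in the same order as fastq1, and it only calls Salmon when the package, the binary, the index and the input files are all found.

```r
# Hypothetical paired-end file names; replace with your own FASTQ files.
fastq1 <- c("sampleA_R1.fastq.gz", "sampleB_R1.fastq.gz")
fastq2 <- c("sampleA_R2.fastq.gz", "sampleB_R2.fastq.gz")

# fastq2 must list mates in the same order as fastq1. Assuming an
# "_R1"/"_R2" naming convention (an assumption, not a package rule),
# the two vectors should agree once the read tag is removed.
pairs_match <- identical(sub("_R1", "", basename(fastq1), fixed = TRUE),
                         sub("_R2", "", basename(fastq2), fixed = TRUE))
stopifnot(pairs_match)

# Roughly reproduce the documented default for `threads`
# (available threads - 1); na.rm covers detectCores() returning NA,
# and max() keeps at least one thread.
n_threads <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

# Run only when everything needed is present ("salmon_index" is hypothetical).
if (requireNamespace("rseqR", quietly = TRUE) &&
    nzchar(Sys.which("salmon")) &&
    dir.exists("salmon_index") &&
    all(file.exists(c(fastq1, fastq2)))) {
  rseqR::run_salmon(fastq1, fastq2,
                    index.dir = "salmon_index",
                    dest.dir = "SALMON",
                    threads = n_threads,
                    bootstraps = 30)
}
```

The value `bootstraps = 30` is an arbitrary illustration. Leave it at the default of 0 if abundance uncertainty is not needed.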

Arguments

fastq1

a character vector giving the FASTQ files to be quantified. For paired-end data, these are the first-mate (read 1) files.

fastq2

(optional) a character vector giving the second-mate (read 2) FASTQ files. If specified, the reads are assumed to be paired-end, and this vector MUST list the files in the same order as fastq1. If NULL, the reads are assumed to be single-end.

index.dir

directory containing the Salmon transcript index used for read mapping. See function 'build_index()'.

dest.dir

directory where results are to be saved. If the directory does not exist, it will be created.

salmon

(optional) string giving the command used to call Salmon. This is only needed if typing "salmon" at the command line does not work or does not invoke the required version of Salmon. Default is simply "salmon". If changed, this argument should give the full path to the desired Salmon binary.

threads

an integer value indicating the number of parallel threads to be used by Salmon. [DEFAULT = maximum number of available threads - 1].

advanced.opts

a character vector of advanced options to pass to each Salmon call. For details, see the Salmon documentation or type salmon quant --help-reads at the command line.

bam

logical, if TRUE then create a pseudo-alignment BAM file. [Default = FALSE]

bootstraps

integer giving the number of bootstrap samples that Salmon should draw (default is 0). Bootstrap samples allow the uncertainty in the abundance estimates to be quantified.

seqBias

logical, should Salmon's option be used to model and correct abundances for sequence-specific bias? Default is TRUE.

gcBias

logical, should Salmon's option be used to model and correct abundances for GC content bias? Requires Salmon version 0.7.2 or higher. Default is TRUE.

posBias

logical, should Salmon's option be used to model and correct abundances for positional biases? Requires Salmon version 0.7.3 or higher. Default is FALSE.

allowOrphans

logical, if TRUE then orphaned reads (paired-end reads whose mate does not map) are considered valid hits when performing lightweight alignment. This option increases sensitivity (more reads map and more transcripts are detected), but may decrease specificity, as orphaned alignments are more likely to be spurious. Default is FALSE. For more details, see the Salmon documentation.
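Because gcBias and posBias require Salmon 0.7.2 and 0.7.3 or higher respectively, it can help to check the installed version before enabling them. The helper below, salmon_at_least(), is hypothetical and not part of the package. The exact text printed by `salmon --version` may vary between releases, so the helper simply extracts the first dotted version number it finds.

```r
# Hypothetical helper: TRUE if a Salmon version string meets a minimum version,
# FALSE if it does not, NA if no version number can be found.
salmon_at_least <- function(version_string, required) {
  # Extract the first dotted number, e.g. "0.7.2" from "salmon 0.7.2".
  v <- regmatches(version_string,
                  regexpr("[0-9]+(\\.[0-9]+)+", version_string))
  if (length(v) == 0L) return(NA)
  package_version(v) >= package_version(required)
}

salmon_at_least("salmon 0.7.2", "0.7.2")  # gcBias requirement met
salmon_at_least("salmon 0.7.2", "0.7.3")  # posBias requirement not met

# With Salmon on the PATH, the string could be obtained with
# (stderr is captured too, in case the version is printed there):
# ver <- system2("salmon", "--version", stdout = TRUE, stderr = TRUE)
```

The result could then be passed on as, for example, `gcBias = isTRUE(salmon_at_least(ver, "0.7.2"))`, so that an unparseable version string falls back to FALSE.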

Value

The following items will be returned and saved in the output directory (dest.dir):

  1. quant.sf: plain-text, tab-separated quantification file that contains five columns: Name, Length, EffectiveLength, TPM and NumReads.

  2. quant.sf.bkp: plain-text, tab-separated quantification file with the same five columns as quant.sf (Name, Length, EffectiveLength, TPM and NumReads). This is the raw version of the quant.sf file.

  3. cmd_info.json: A JSON format file that records the main command line parameters with which Salmon was invoked for the run that produced the output in this directory.

  4. aux_info: a directory containing a number of files (and subfolders), depending on how Salmon was invoked.

  5. meta_info.json: A JSON file that contains meta information about the run, including stats such as the number of observed and mapped fragments, details of the bias modeling etc.

  6. ambig_info.tsv: This file contains information about the number of uniquely-mapping reads as well as the total number of ambiguously-mapping reads for each transcript.

  7. lib_format_counts.json: This JSON file reports the number of fragments that had at least one mapping compatible with the designated library format, as well as the number that didn't.

  8. libParams: a directory containing a text file called flenDist.txt, which holds an approximation of the observed fragment length distribution.
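quant.sf can be loaded with base R. The sketch below writes a small toy quant.sf (made-up values, not real Salmon output) to a temporary file, reads it back, and checks the usual TPM relationship: reads per unit of effective length, rescaled to sum to one million. With real output, the recomputed values are expected to match the TPM column up to rounding. Where run_salmon places quant.sf under dest.dir (for example, in per-sample subfolders) is not specified here, so `quant_file` would need to be pointed at the actual file.

```r
# Toy quant.sf with made-up values (NOT real Salmon output).
quant_file <- tempfile(fileext = ".sf")
writeLines(c(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "tx1\t1500\t1300\t666666.666667\t260",
  "tx2\t800\t600\t333333.333333\t60",
  "tx3\t2500\t2300\t0\t0"
), quant_file)

# quant.sf is tab-separated with a header row.
quant <- read.delim(quant_file, stringsAsFactors = FALSE)

# Reads per unit of effective length, rescaled so the values sum to 1e6.
rate <- quant$NumReads / quant$EffectiveLength
tpm_recomputed <- 1e6 * rate / sum(rate)

# Compare with the reported TPM column.
all.equal(tpm_recomputed, quant$TPM, tolerance = 1e-6)
```

Using EffectiveLength rather than Length matters: it is the length Salmon corrects for when converting read counts into relative abundances.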

References

Rob Patro, Geet Duggal, Michael I. Love, Rafael A. Irizarry, and Carl Kingsford (2017): Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4), 417. https://www.nature.com/articles/nmeth.4197


anilchalisey/rseqR documentation built on May 25, 2019, 2:25 p.m.