import_salmon_quant: Import Salmon quant.sf files to SummarizedExperiment
In jmw86069/platjam: Platform Jam, biological platform importers.

import_salmon_quant

R Documentation

Import Salmon quant.sf files to SummarizedExperiment

Description

Import Salmon quant.sf files to SummarizedExperiment

Usage

import_salmon_quant(
  salmonOut_paths,
  import_types = c("tx", "gene", "gene_body", "gene_tx"),
  gtf = NULL,
  tx2gene = NULL,
  curation_txt = NULL,
  tx_colname = "transcript_id",
  gene_colname = "gene_name",
  gene_body_colname = "transcript_type",
  geneFeatureType = "exon",
  txFeatureType = "exon",
  countsFromAbundance = "lengthScaledTPM",
  gene_body_ids = NULL,
  trim_tx_from = NULL,
  trim_tx_to = NULL,
  verbose = FALSE,
  ...
)

Arguments

`salmonOut_paths`	`character` vector of paths, each pointing to an individual folder that contains the Salmon `"quant.sf"` output file.
`import_types`	`character` indicating which type or types of data to return. Note that the distinction between `gene` and `gene_body` is only relevant when there are transcript entries defined with `transcript_type="gene_body"`. These entries specifically represent unspliced transcribed regions for a gene locus, and only for multi-exon genes. `tx`: transcript quantitation, direct import of `quant.sf` files. `gene`: gene quantitation after calling `tximport::summarizeToGene()`, excluding `transcript_type="gene_body"`. `gene_body`: gene quantitation after calling `tximport::summarizeToGene()`, including `transcript_type="gene_body"`. `gene_tx`: gene quantitation of spliced transcripts, with `transcript_type="gene_body"` entries represented separately.
`gtf`	`character` path to a GTF file, used only when `tx2gene` is not supplied. When used, `splicejam::makeTx2geneFromGtf()` is called to create a `data.frame` object `tx2gene`.
`tx2gene`	`character` path to a file, or a `data.frame` with at least two columns matching `tx_colname` and `gene_colname` below. When `tx2gene` is supplied, the `gtf` argument is ignored, unless the file path is not accessible or the data is not a `data.frame`.
`curation_txt`	`data.frame` whose first column should match the sample column headers found in the PD abundance columns, and whose subsequent columns contain associated sample annotations. If `curation_txt` is not supplied, then values will be split into columns by `"_"` underscore or `" "` whitespace characters.
`tx_colname`, `gene_colname`	`character` strings indicating the colnames in `tx2gene` that should be used. `tx_colname` represents the unique identifier for each transcript, usually `"transcript_id"`. `gene_colname` represents the gene label associated with gene-summarized expression values, typically `"gene_name"`.
`geneFeatureType`, `txFeatureType`	`character` arguments passed to `splicejam::makeTx2geneFromGtf()` only when supplying argument `gtf` with a path to a GTF file.
`countsFromAbundance`	`character` string passed to `tximport::summarizeToGene()` that defines how counts are derived from abundance estimates, for example `"no"`, `"scaledTPM"`, or `"lengthScaledTPM"`.
`gene_body_ids`	`character` optional vector with specific row identifiers that should be considered `transcript_type="gene_body"` entries, relevant to argument `import_types` above. When `gene_body_ids` is defined, these entries are used directly without using `tx2gene`. When `gene_body_ids` is not defined, `tx2gene$transcript_type` is used if present. If that column is not present, or does not contain any entries with `"gene_body"`, then all transcripts are used for `import_types="gene"`, and `import_types="gene_body"` is not valid and therefore is not returned.
`trim_tx_from`, `trim_tx_to`	`character` vectors with one or more regular expression patterns used to curate the values in `tx_colname` before they are assigned as `rownames()` and back to `tx_colname`. These values are joined to `tx2gene[[tx_colname]]` to assign additional gene annotations. As of version 0.0.79.900, the default retains the strand information `"(-)"` and `"(+)"`, since in some rare cases a gene's unspliced transcripts can be present on both strands. The previous default (version <= 0.0.78.900) removed `"(-)"` and `"(+)"` from the `tx_colname` (transcript_id) column.
`verbose`	`logical` indicating whether to print verbose output.
`...`	additional arguments are passed to supporting functions.
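The trimming described for `trim_tx_from` and `trim_tx_to` can be pictured as applying each pattern in turn with `gsub()`, replacing matches with the corresponding `trim_tx_to` value. This is a minimal base R sketch, not the package's internal code; the helper name `trim_tx_ids()`, the example identifiers, and the assumption that `trim_tx_to` supplies the replacement strings are all hypothetical. The pattern shown reproduces the older (version <= 0.0.78.900) behavior of removing `"(-)"` and `"(+)"`.

```r
# Hypothetical helper (not part of platjam): apply each regex in trim_from,
# in order, replacing matches with the matching value in trim_to.
trim_tx_ids <- function(x, trim_from = NULL, trim_to = NULL) {
  if (length(trim_from) == 0) {
    return(x)  # no patterns supplied: identifiers are left unchanged
  }
  if (length(trim_to) == 0) {
    trim_to <- ""  # assume an empty replacement when none is given
  }
  # recycle replacements so each pattern has one
  trim_to <- rep(trim_to, length.out = length(trim_from))
  for (i in seq_along(trim_from)) {
    x <- gsub(trim_from[i], trim_to[i], x)
  }
  x
}

tx_ids <- c("GENE1_gene_body(+)", "GENE1_gene_body(-)", "TX0001")
# Pattern matching a trailing "(+)" or "(-)" strand suffix
trimmed <- trim_tx_ids(tx_ids, trim_from = "[(][-+][)]$", trim_to = "")
trimmed
```

Removing the suffix collapses the two strand-specific gene body entries into one identifier, which illustrates why the current default keeps strand information.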

Details

This function is intended to automate the process of importing a series of quant.sf files, then generating SummarizedExperiment objects at the transcript and gene level. It optionally includes sample annotations provided as a data.frame in argument curation_txt. It also includes transcript and gene annotations, either from the data.frame in argument tx2gene or by deriving tx2gene from the GTF file in argument gtf, which calls splicejam::makeTx2geneFromGtf().
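When curation_txt is not supplied, the argument description says values are split into columns by "_" or whitespace. A base R sketch of that kind of split, applied to sample names, is shown here; the helper split_sample_names(), the example names, and the V1, V2, ... column names are hypothetical and not necessarily what the package produces.

```r
# Hypothetical helper (not part of platjam): split sample names on "_" or
# whitespace into a data.frame of annotation columns, one row per sample.
split_sample_names <- function(sample_names) {
  parts <- strsplit(sample_names, "[_[:space:]]+")
  n_fields <- max(lengths(parts))
  # pad shorter entries with NA so every row has the same number of fields
  parts <- lapply(parts, function(p) {
    c(p, rep(NA_character_, n_fields - length(p)))
  })
  df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  colnames(df) <- paste0("V", seq_len(n_fields))
  rownames(df) <- sample_names
  df
}

sample_df <- split_sample_names(c("WT_Day0_rep1", "KO Day0 rep1"))
sample_df
```

Supplying curation_txt avoids relying on naming conventions, since annotations are then taken directly from its columns.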

This function can optionally process data that includes full-length gene body regions, annotated with "gene_body". This option is specific to Salmon quantitation where the transcripts include the full-length gene body of multi-exon genes, for example to measure unspliced transcript abundance.
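The rules in the gene_body_ids argument description can be condensed into a short decision function. This is a sketch of the documented precedence only, assuming tx2gene is a data.frame; the helper resolve_gene_body_ids() and the toy tx2gene are hypothetical, and the package may implement these checks differently.

```r
# Hypothetical helper mirroring the documented precedence:
# 1. use gene_body_ids when supplied;
# 2. otherwise use rows where tx2gene$transcript_type is "gene_body";
# 3. otherwise there are no gene_body entries, so "gene_body" is dropped
#    from import_types (the documentation only mentions dropping "gene_body").
resolve_gene_body_ids <- function(tx2gene, import_types,
                                  gene_body_ids = NULL,
                                  tx_colname = "transcript_id") {
  if (length(gene_body_ids) == 0 && "transcript_type" %in% colnames(tx2gene)) {
    is_gb <- tx2gene$transcript_type %in% "gene_body"
    gene_body_ids <- tx2gene[[tx_colname]][is_gb]
  }
  if (length(gene_body_ids) == 0) {
    import_types <- setdiff(import_types, "gene_body")
  }
  list(gene_body_ids = unique(gene_body_ids), import_types = import_types)
}

tx2gene <- data.frame(
  transcript_id = c("TX1", "TX2", "GENE1_gene_body"),
  gene_name = c("GENE1", "GENE1", "GENE1"),
  transcript_type = c("protein_coding", "protein_coding", "gene_body"),
  stringsAsFactors = FALSE)
resolve_gene_body_ids(tx2gene, c("tx", "gene", "gene_body", "gene_tx"))
```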

import_types="gene" summarizes only the annotated (spliced) transcripts, excluding "gene_body" entries.
import_types="gene_body" summarizes all transcripts and full gene body entries into one gene-level abundance.
import_types="gene_tx" summarizes annotated transcripts to the gene level, and separately represents "gene_body" entries for comparison.
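The three gene-level summaries listed above differ mainly in which rows are grouped together. The sketch below approximates them with base R rowsum() on toy counts, assuming plain summation; tximport::summarizeToGene() also computes abundance-weighted lengths and, depending on countsFromAbundance, derives counts from abundance, so real values will differ. All object names and data here are hypothetical.

```r
# Toy transcript-level counts: two spliced transcripts and one unspliced
# "gene_body" entry for GENE1, plus one transcript for GENE2.
tx_counts <- matrix(c(10, 20, 5, 7,
                      12, 18, 9, 3),
                    ncol = 2,
                    dimnames = list(c("TX1", "TX2", "GENE1_gene_body", "TX3"),
                                    c("sampleA", "sampleB")))
# gene assignment for each row, in the same order as rownames(tx_counts)
tx_gene <- c(TX1 = "GENE1", TX2 = "GENE1",
             GENE1_gene_body = "GENE1", TX3 = "GENE2")
is_gb <- rownames(tx_counts) %in% "GENE1_gene_body"

# "gene": spliced transcripts only, gene_body rows excluded
gene_counts <- rowsum(tx_counts[!is_gb, , drop = FALSE], tx_gene[!is_gb])

# "gene_body": all rows, spliced and unspliced summed per gene
gene_body_counts <- rowsum(tx_counts, tx_gene)

# "gene_tx": spliced rows summed per gene; gene_body rows kept separate
# with "_gene_body" appended to the gene name
gene_tx_group <- ifelse(is_gb, paste0(tx_gene, "_gene_body"), tx_gene)
gene_tx_counts <- rowsum(tx_counts, gene_tx_group)
```

In this toy example GENE1 totals 30 per sample for "gene" and 35 and 39 for "gene_body", while "gene_tx" keeps the unspliced 5 and 9 on their own row.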

The current recommendation is to use the default values for import_types, which import all of the following types of data:

TxSE as transcripts per row
GeneSE as genes per row, excluding unspliced gene_body transcripts.
GeneBodySE as genes per row, summarizing spliced transcripts and unspliced "gene_body" entries together for each gene.
GeneTxSE as spliced and unspliced genes per row, so that unspliced, spliced, or both can be analyzed together.

We typically use GeneBodySE and define a subset of rownames(GeneBodySE) to use only spliced transcripts during analysis. Optionally we may run limma::diffSplice() to compare the unspliced:spliced ratio across experiment groups.

Value

list of SummarizedExperiment objects, each of which contains the assay names c("counts", "abundance", "length"), where c("counts", "abundance") are transformed with log2(1 + x). The transform can be reversed with 2^x - 1. The SummarizedExperiment objects, by name:

"TxSE": transcript-level values imported from quant.sf.
"GeneSE": gene-level summary values, excluding "gene_body" entries.
"GeneBodySE": gene-level summary values, summarizing spliced transcripts and unspliced "gene_body" entries together for each gene.
"GeneTxSE": gene-level summary values, where transcripts are combined to gene level, and "gene_body" entries are represented separately, with suffix "_gene_body" added to the gene name.

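Because the "counts" and "abundance" assays are stored as log2(1 + x), the inverse is 2^x - 1. A minimal base R check with toy values:

```r
raw_counts <- c(0, 1, 15, 1023)
log_counts <- log2(1 + raw_counts)  # stored form: 0, 1, 4, 10
restored <- 2^log_counts - 1        # inverse of log2(1 + x)
all.equal(restored, raw_counts)     # TRUE
```

For an object se taken from the returned list, the same idea applies to an assay matrix, for example 2^SummarizedExperiment::assay(se, "counts") - 1.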