import_salmon_quant | R Documentation |
Import Salmon quant.sf files to SummarizedExperiment
import_salmon_quant(
salmonOut_paths,
import_types = c("tx", "gene", "gene_body", "gene_tx"),
gtf = NULL,
tx2gene = NULL,
curation_txt = NULL,
tx_colname = "transcript_id",
gene_colname = "gene_name",
gene_body_colname = "transcript_type",
geneFeatureType = "exon",
txFeatureType = "exon",
countsFromAbundance = "lengthScaledTPM",
gene_body_ids = NULL,
trim_tx_from = NULL,
trim_tx_to = NULL,
verbose = FALSE,
...
)
salmonOut_paths |
|
import_types |
|
gtf |
|
tx2gene |
|
curation_txt |
|
tx_colname , gene_colname |
|
geneFeatureType , txFeatureType |
|
countsFromAbundance |
|
gene_body_ids |
|
trim_tx_from , trim_tx_to |
|
verbose |
|
... |
additional arguments are passed to supporting functions. |
This function is intended to automate the process of importing a series of quant.sf files, then generating SummarizedExperiment objects at the transcript and gene level. It optionally includes sample annotation provided as a data.frame in argument curation_txt.
It also includes transcript and gene annotations, either through a data.frame supplied in argument tx2gene, or by deriving tx2gene from a GTF file supplied in argument gtf. The GTF file option calls splicejam::makeTx2geneFromGtf().
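As a rough illustration of the annotation input, the sketch below builds a small tx2gene data.frame using the default column names from the usage (tx_colname, gene_colname, gene_body_colname). The identifiers are made up, and the idea that gene body entries are flagged by the value "gene_body" in the transcript_type column is an assumption based on the defaults, not confirmed behavior.

```r
# Hypothetical transcript-to-gene table; all identifiers are invented.
tx2gene <- data.frame(
   transcript_id = c("tx1", "tx2", "tx3", "geneA_body"),
   gene_name = c("geneA", "geneA", "geneB", "geneA"),
   transcript_type = c("protein_coding", "protein_coding",
      "lncRNA", "gene_body"),
   stringsAsFactors = FALSE)

# Assumption: full-length gene body entries carry "gene_body"
# in the column named by gene_body_colname.
is_gene_body <- tx2gene$transcript_type %in% "gene_body"
tx2gene[is_gene_body, ]
```

A table shaped like this would be passed as tx2gene when no GTF file is supplied.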
This function can optionally process data that includes full-length gene body regions, annotated with "gene_body". This option is specific to Salmon quantitation where the transcripts include the full-length gene body for multi-exon genes, for example to measure unspliced transcript abundance.
import_types="gene" summarizes only the proper transcripts, excluding "gene_body" entries.
import_types="gene_body" summarizes all transcript and full gene body entries into one summary abundance per gene.
import_types="gene_tx" summarizes proper transcripts to gene level, and separately represents "gene_body" entries for comparison.
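The three gene-level groupings can be sketched with base R on a toy count matrix. This is only a simplified illustration of which rows are combined: it uses plain sums via rowsum(), whereas the real import also depends on countsFromAbundance, and all names and values here are hypothetical.

```r
# Toy counts: three transcripts plus one gene_body entry, two samples.
counts <- matrix(c(10, 20, 5, 40,
                   12, 18, 7, 35),
   ncol = 2,
   dimnames = list(c("tx1", "tx2", "tx3", "geneA_body"),
      c("s1", "s2")))
gene <- c("geneA", "geneA", "geneB", "geneA")
is_body <- c(FALSE, FALSE, FALSE, TRUE)

# "gene": proper transcripts only, gene_body rows excluded
gene_sum <- rowsum(counts[!is_body, , drop = FALSE],
   group = gene[!is_body])

# "gene_body": transcripts and gene_body rows combined per gene
gene_body_sum <- rowsum(counts, group = gene)

# "gene_tx": gene_body rows kept separate, with a "_gene_body" suffix
gene_tx_group <- ifelse(is_body, paste0(gene, "_gene_body"), gene)
gene_tx_sum <- rowsum(counts, group = gene_tx_group)
```

In this sketch gene_tx_sum has one more row than gene_sum, because geneA is represented both as spliced transcripts and as a separate gene body entry.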
The current recommendation is to use default values for import_types, which imports all the following types of data:
TxSE: transcripts per row.
GeneSE: genes per row, excluding unspliced gene_body transcripts.
GeneBodySE: genes per row, summarizing unspliced and spliced "gene_body" transcripts together for each gene.
GeneTxSE: spliced and unspliced genes per row, so that unspliced, spliced, or both can be analyzed together.
We typically use GeneBodySE and define a subset of rownames(GeneBodySE) to use only spliced transcripts during analysis. Optionally we may run limma::diffSplice() to compare the unspliced:spliced ratio across experiment groups.
list with SummarizedExperiment objects, each of which contains assay names c("counts", "abundance", "length"), where c("counts", "abundance") are transformed with log2(1 + x). The transform can be reversed with 2^x - 1.
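Because the stored values are on the log2(1 + x) scale, the inverse uses base 2. A minimal check with made-up values:

```r
# Made-up raw values standing in for counts or abundance
x <- c(0, 1, 9, 1023)

# Forward transform as described for the returned assays
x_log <- log2(1 + x)

# Inverse transform: base 2, then subtract the added 1
x_back <- 2^x_log - 1

all.equal(x, x_back)
```

The same expression would apply to an assay matrix, for example one obtained with SummarizedExperiment::assay(se, "counts"), where se is a hypothetical name for one of the returned objects.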
The SummarizedExperiment objects by name:
"TxSE": transcript-level values imported from quant.sf.
"GeneSE": gene-level summary values, excluding "gene_body" entries.
"GeneBodySE": gene-level summary values, summarizing unspliced and spliced "gene_body" transcripts together for each gene.
"GeneTxSE": gene-level summary values, where transcripts are combined to gene level, and "gene_body" entries are represented separately, with suffix "_gene_body" added to the gene name.
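Since "gene_body" entries in "GeneTxSE" are distinguished by the "_gene_body" suffix, spliced and unspliced rows can be separated by pattern matching on row names. The sketch below uses a hypothetical character vector in place of rownames() from a real object.

```r
# Hypothetical row names, standing in for rownames(GeneTxSE)
gene_tx_rows <- c("geneA", "geneA_gene_body", "geneB", "geneB_gene_body")

# Rows ending in "_gene_body" are the unspliced gene body entries
is_unspliced <- grepl("_gene_body$", gene_tx_rows)
spliced_rows <- gene_tx_rows[!is_unspliced]
unspliced_rows <- gene_tx_rows[is_unspliced]

# Recover the gene name for each unspliced row, to pair it
# with the corresponding spliced row
unspliced_genes <- sub("_gene_body$", "", unspliced_rows)
```

Either vector of row names could then be used to subset the object by row before downstream analysis.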
Other jam import functions: coverage_matrix2nmat(), deepTools_matrix2nmat(), frequency_matrix2nmat(), import_lipotype_csv(), import_metabolomics_niehs(), import_nanostring_csv(), import_nanostring_rcc(), import_nanostring_rlf(), import_omics_data(), import_proteomics_PD(), import_proteomics_mascot(), process_metab_compounds_file()