sumToGene: Summarize the non-inferential rep data from Salmon to gene...

View source: R/utility_functions.R

sumToGene R Documentation

Summarize the non-inferential rep data from Salmon to gene level (see details)

Description

Summarize the non-inferential rep data from Salmon to gene level (see details)

Usage

sumToGene(
  QuantSalmon,
  key,
  tx2gene,
  clust = NULL,
  countsFromAbundance,
  GenAllGroupCombos = FALSE
)

Arguments

QuantSalmon

The Salmon quantification object produced by importing the data with tximport (see the file (1)DataProcessing.R in the package's SampleCode folder for example code).

key

A data.frame with columns "Sample" (the unique biological identifier for the analysis), "Condition" (the condition/treatment effect variable for the data), and "Identifier", whose values must be "Sample1", "Sample2", ..., up to the number of rows of key. "Identifier" must be created this way even if the observations do not correspond to unique biological samples.
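
As a minimal sketch of the layout described above, a key could be built in base R as follows. The sample names and condition labels are made up for illustration, and storing "Condition" as a character vector is an assumption, since the required type is not stated here.

```r
# Toy design: two observations per condition (all values are illustrative).
key <- data.frame(
  Sample = c("SRR001", "SRR002", "SRR003", "SRR004"),
  Condition = c("Control", "Control", "Treated", "Treated"),
  stringsAsFactors = FALSE
)

# "Identifier" must run "Sample1", "Sample2", ... up to nrow(key).
key$Identifier <- paste0("Sample", seq_len(nrow(key)))

print(key)
```

Building "Identifier" from seq_len(nrow(key)) keeps it in step with the row order of key.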

tx2gene

A data.frame that matches transcripts to genes. It can be created with maketx2gene.

clust

An optional cluster object from the parallel package, used to parallelize computations within this function. See makeCluster for more information.
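
For reference, a cluster object of this kind can be created and released with the parallel package as sketched below. The choice of two workers and the small parLapply check are illustrative assumptions, not requirements of sumToGene.

```r
library(parallel)

# Start a small local cluster; two workers is an arbitrary choice.
clust <- makeCluster(2)

# Quick check that the workers respond before passing `clust` on.
res <- parLapply(clust, 1:4, function(i) i^2)

# Release the workers once the parallel work is finished.
stopCluster(clust)
```

In practice the object would be passed as the clust argument before stopCluster is called.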

countsFromAbundance

A character string matching the countsFromAbundance argument used when importing the data with tximport. Possible values are "no", "scaledTPM", or "lengthScaledTPM".

GenAllGroupCombos

A TRUE/FALSE indicator for generating all possible condition combinations from key$Condition. This is only needed for certain power analyses and should almost always be set to FALSE.

Value

sumToGene saves initial files from the quantification. These files include lists of gene-specific expression estimates with and without "OtherGroups", which was a filtering alternative we considered in addition to the filters built into DRIMSeq. abDatasets correspond to TPM abundances and cntDatasets correspond to counts, which may be scaled relative to TPMs if countsFromAbundance is either "scaledTPM" or "lengthScaledTPM".

abGene and cntGene contain the TPM abundances and (possibly scaled) counts, respectively, with one row per transcript. They also contain additional information that may be useful, including total gene expression (TGE) for each biological sample and total expression summed across genes, mean and total TGE by condition, relative transcript abundance proportions (RTAs), and information about the major transcript for each gene, which is the most highly expressed transcript for that gene across all samples. See the file (1)DataProcessing.R in the package's SampleCode folder for example code.
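
To make the TGE, RTA, and major transcript quantities concrete, the sketch below computes them for a single toy gene in base R. It is not the package's implementation: the transcript names and abundances are invented, and taking the major transcript as the one with the largest total across samples is an assumed reading of "most highly expressed across all samples".

```r
# Toy TPM abundances for one gene: rows are transcripts, columns are samples.
ab <- matrix(
  c(10, 20, 30, 40,
     5,  5, 10, 20,
     5, 15, 10, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("txA", "txB", "txC"), paste0("Sample", 1:4))
)

# Total gene expression (TGE) per sample: sum over the gene's transcripts.
TGE <- colSums(ab)

# Relative transcript abundance (RTA): each transcript's share of TGE.
RTA <- sweep(ab, 2, TGE, "/")

# Major transcript, taken here as the largest total across samples (assumption).
majorTx <- names(which.max(rowSums(ab)))

print(round(RTA, 2))
print(majorTx)
```

Within each sample the RTAs of a gene sum to one, which gives a quick sanity check on any per-gene table.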


skvanburen/CompDTUReg documentation built on Jan. 23, 2025, 9:01 a.m.