sumToGene: Summarize the non-inferential rep data from Salmon to gene...
In skvanburen/CompDTUReg: CompDTUReg: Fit Compositional Regression Models for DTU

sumToGene

R Documentation

Summarize the non-inferential rep data from Salmon to gene level (see details)

Description

Summarize the non-inferential replicate data from Salmon to the gene level (see Value for details)

Usage

sumToGene(
  QuantSalmon,
  key,
  tx2gene,
  clust = NULL,
  countsFromAbundance,
  GenAllGroupCombos = FALSE
)

Arguments

`QuantSalmon`	is the Salmon quantification object output using tximport (see file (1)DataProcessing.R in the package's SampleCode folder for example code)
`key`	is a data.frame with columns "Sample" (corresponding to the unique biological identifier for the analysis), "Condition" (giving the condition/treatment effect variables for the data), and "Identifier", whose values should be "Sample1", "Sample2", ... up to the number of rows of key. This "Identifier" column needs to be created in this way even if the observations don't correspond to unique biological samples.
`tx2gene`	is a data.frame that matches transcripts to genes. It can be created by `maketx2gene`.
`clust`	An optional cluster object from the parallel package, used to parallelize computation within this function. See `makeCluster` for more information.
`countsFromAbundance`	character corresponding to the countsFromAbundance parameter used when importing the data with `tximport`. Possible values are `"no"`, `"scaledTPM"`, or `"lengthScaledTPM"`.
`GenAllGroupCombos`	is a TRUE/FALSE indicator for generating all possible condition combinations from key$Condition. This is only needed for certain power analyses and should almost always be set to FALSE.
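
As an illustration of the `key` layout described above, here is a minimal sketch in base R. The sample names and conditions are made up, and storing "Condition" as a factor is an assumption rather than a documented requirement. The commented call at the end only mirrors the Usage signature; `QuantSalmon` and `tx2gene` are assumed to have been created beforehand (for example with `tximport` and `maketx2gene`).

```r
# Hypothetical metadata: six biological samples in two conditions
key <- data.frame(
  Sample    = c("A1", "A2", "A3", "B1", "B2", "B3"),
  Condition = c("Control", "Control", "Control",
                "Treated", "Treated", "Treated"),
  stringsAsFactors = FALSE
)

# Assumption: Condition is stored as a factor (not stated in this help page)
key$Condition <- factor(key$Condition)

# "Identifier" must run "Sample1", "Sample2", ... up to nrow(key)
key$Identifier <- paste0("Sample", seq_len(nrow(key)))

print(key)

# The call itself would then follow the Usage section, e.g.:
# sumToGene(QuantSalmon = QuantSalmon, key = key, tx2gene = tx2gene,
#           clust = NULL, countsFromAbundance = "scaledTPM",
#           GenAllGroupCombos = FALSE)
```

The value passed as `countsFromAbundance` should match whatever was used when the data were imported with `tximport`; `"scaledTPM"` above is only a placeholder.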

Value

sumToGene saves initial files from the quantification. These files include lists of gene-specific expression estimates with and without "OtherGroups", a filtering alternative we considered in addition to the filters built into DRIMSeq. abDatasets correspond to TPM abundances, and cntDatasets correspond to counts, which may be scaled relative to TPMs if countsFromAbundance is either "scaledTPM" or "lengthScaledTPM".

abGene and cntGene contain the TPMs and the (possibly scaled) counts, respectively, with one row per transcript. They also contain additional information that may be useful, including total gene expression (TGE) for each biological sample and total expression added up across different genes, mean and total TGE by condition, relative transcript abundance proportions (RTAs), and information about the major transcript for each gene, which is the most highly expressed transcript for that gene across all samples. See the file (1)DataProcessing.R in the package's SampleCode folder for example code.
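
The quantities named above can be illustrated on toy data. The following base R sketch is not the package's implementation. It assumes that TGE is the sum of a gene's transcript-level expression within a sample, that an RTA is a transcript's expression divided by its gene's TGE, and that the major transcript is the one with the largest expression summed over samples. The column names `gene_id` and `tx_id` and all values are hypothetical.

```r
# Toy transcript-level counts: one row per transcript, one column per sample
cnt <- data.frame(
  gene_id = c("g1", "g1", "g1", "g2", "g2"),
  tx_id   = c("t1", "t2", "t3", "t4", "t5"),
  Sample1 = c(10, 30, 60, 5, 15),
  Sample2 = c(20, 20, 60, 10, 10),
  stringsAsFactors = FALSE
)
samp_cols <- c("Sample1", "Sample2")
expr <- as.matrix(cnt[samp_cols])

# TGE: per-gene sum within each sample, repeated on every transcript row
tge <- apply(expr, 2, function(x) ave(x, cnt$gene_id, FUN = sum))

# RTA: each transcript's share of its gene's total expression
rta <- expr / tge

# Major transcript: highest expression summed over all samples, per gene
tx_total <- rowSums(expr)
major <- tapply(seq_len(nrow(cnt)), cnt$gene_id, function(i) {
  cnt$tx_id[i][which.max(tx_total[i])]
})

print(cbind(cnt[c("gene_id", "tx_id")], round(rta, 2)))
print(major)
```

Within each gene and sample the RTAs sum to 1, which is what makes them suitable for compositional analysis.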

skvanburen/CompDTUReg documentation built on Jan. 23, 2025, 9:01 a.m.