UPC_RNASeq_Single: Apply Universal exPression Codes (UPC) transformation to a...


View source: R/RNASeq.R

Description

This function is used to derive UPC values for a single RNA-Seq sample. It requires an input vector that specifies a read count for each genomic region (e.g., gene). Optionally, this function can correct for the GC content and length of each genomic region.

Usage

UPC_RNASeq_Single(expressionValues, featureNames, lengths = NULL,
  gcContent = NULL, modelType = "nn", convThreshold = 0.01,
  ignoreZeroes = FALSE, verbose = TRUE)
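A minimal sketch of a call, assuming simulated inputs. The counts, lengths, and GC values below are randomly generated placeholders rather than real data, so convergence on them is not guaranteed. The names geneIds, regionLengths, and gcCounts are hypothetical. The sketch also assumes that element i of every input vector describes the same genomic region. The call is skipped if SCAN.UPC is not installed.

```r
# Simulated placeholder inputs (not real data): one entry per genomic region
set.seed(42)
nGenes <- 2000
geneIds <- paste0("gene", seq_len(nGenes))
counts <- rnbinom(nGenes, mu = 200, size = 0.5)               # skewed, count-like values
regionLengths <- sample(500:5000, nGenes, replace = TRUE)     # bases per region
gcCounts <- round(regionLengths * runif(nGenes, 0.35, 0.65))  # number of G/C bases, not a fraction

# Assumed requirement: all vectors are aligned and feature names are unique
stopifnot(length(counts) == nGenes,
          length(regionLengths) == nGenes,
          length(gcCounts) == nGenes,
          !anyDuplicated(geneIds))

if (requireNamespace("SCAN.UPC", quietly = TRUE)) {
  upc <- SCAN.UPC::UPC_RNASeq_Single(
    expressionValues = counts,
    featureNames = geneIds,
    lengths = regionLengths,
    gcContent = gcCounts,
    modelType = "nn",
    verbose = FALSE
  )
  print(summary(upc))
}
```

Naming the local vectors regionLengths and geneIds avoids masking base::lengths and the featureNames generic that Bioconductor packages define.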

Arguments

expressionValues

A vector of RNA-Seq count values. Required.

featureNames

A vector of unique names that correspond to the count values. Required.

lengths

A vector indicating the length (in bases) of the genomic region that corresponds to each count value. Optional; used to correct for region length.

gcContent

A vector indicating the number of G/C bases (a count, not a fraction) in the genomic region that corresponds to each count value. Optional; used to correct for GC content.
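If a sequence is available for each region, lengths and GC counts can be derived in base R. This sketch uses short, made-up sequences for hypothetical genes. In practice the sequences would come from a genome annotation, and how each region is defined (for example, merged exons per gene) is left to the user. If only a GC fraction is available, it must be multiplied by the region length to obtain the count this argument appears to expect.

```r
# Made-up sequences for three hypothetical genes
regionSeqs <- c(geneA = "ATGCGCGTTA", geneB = "ATATATATGC", geneC = "GGCCGGCCAA")

# Length of each region, in bases
regionLengths <- nchar(regionSeqs)

# Number of G or C bases in each region (a count, not a fraction)
gcCounts <- nchar(gsub("[^GCgc]", "", regionSeqs))

# If only a GC fraction is available, convert it to a count of bases
gcFraction <- c(0.5, 0.2, 0.8)
gcFromFraction <- round(gcFraction * regionLengths)

print(data.frame(length = regionLengths, gc = gcCounts, gcFromFraction))
```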

modelType

Type of mixture model used to distinguish active from inactive genomic regions. The default is the normal-normal model ("nn"), in which both mixture components are normally distributed. Other available options are log-normal ("ln"), negative binomial ("nb"), and normal-normal Bayes ("nn_bayes").

convThreshold

Convergence threshold that determines when the mixture-model parameters are considered to have stabilized. The default value should be suitable in most cases; however, if the model fails to converge (or converges too quickly), adjusting this value may help. (This parameter is optional.)
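To illustrate what a two-component mixture model and a convergence threshold do, here is a simplified normal-normal mixture fitted by expectation-maximization (EM) to simulated log-scale values. This is not SCAN.UPC's implementation. It omits the GC-content and length corrections, the binning mentioned in the Note, and the other model types. The stopping rule (largest absolute parameter change below convThreshold) is also an assumption. The functions fitTwoNormalMixture and posteriorActive are hypothetical. The posterior probability of the higher component stands in for a UPC-like activity score.

```r
# Posterior probability that each value belongs to component 2 (assumed "active")
posteriorActive <- function(x, p, mu, sigma) {
  d1 <- p[1] * dnorm(x, mu[1], sigma[1])
  d2 <- p[2] * dnorm(x, mu[2], sigma[2])
  d2 / pmax(d1 + d2, .Machine$double.xmin)  # guard against 0/0
}

# Simplified two-component normal mixture fitted by EM (illustration only)
fitTwoNormalMixture <- function(x, convThreshold = 0.01, maxIter = 500) {
  # Initialise each component from one half of the data
  med <- median(x)
  mu <- c(mean(x[x <= med]), mean(x[x > med]))
  sigma <- rep(sd(x), 2)
  p <- c(0.5, 0.5)
  converged <- FALSE
  for (iter in seq_len(maxIter)) {
    # E-step: membership probabilities under the current parameters
    w <- posteriorActive(x, p, mu, sigma)
    # M-step: re-estimate mixing proportions, means, and standard deviations
    newP <- c(1 - mean(w), mean(w))
    newMu <- c(sum((1 - w) * x) / sum(1 - w), sum(w * x) / sum(w))
    newSigma <- c(sqrt(sum((1 - w) * (x - newMu[1])^2) / sum(1 - w)),
                  sqrt(sum(w * (x - newMu[2])^2) / sum(w)))
    # Stop once no parameter moves by more than convThreshold (assumed rule)
    change <- max(abs(c(newP - p, newMu - mu, newSigma - sigma)))
    p <- newP; mu <- newMu; sigma <- newSigma
    if (change < convThreshold) { converged <- TRUE; break }
  }
  list(p = p, mu = mu, sigma = sigma, converged = converged,
       iterations = iter, posterior = posteriorActive(x, p, mu, sigma))
}

# Simulated log-scale expression: a low ("inactive") and a high ("active") group
set.seed(1)
logCounts <- c(rnorm(300, mean = 2, sd = 1), rnorm(700, mean = 8, sd = 1.5))
fit <- fitTwoNormalMixture(logCounts, convThreshold = 0.01)
print(round(fit$mu, 2))
print(fit$converged)
```

A smaller threshold forces more EM iterations before the parameters are accepted as stable. A larger one stops earlier, which corresponds to the "converges too quickly" case above.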

ignoreZeroes

Whether to ignore read counts equal to zero when performing UPC calculations. Default is FALSE.

verbose

Whether to output more detailed status information during processing. Default is TRUE.

Value

A vector that contains one UPC value for each genomic region (e.g., gene or transcript) in expressionValues.
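UPC values lie between 0 and 1 and can be read as the estimated probability that a region is actively expressed (Piccolo et al., 2013). A sketch of summarizing a result follows. It assumes a hypothetical named output vector with made-up values and an illustrative cutoff of 0.5; this page does not prescribe a cutoff.

```r
# Hypothetical UPC output for five genes (values made up for illustration)
upc <- c(geneA = 0.02, geneB = 0.97, geneC = 0.51, geneD = 0.49, geneE = 0.88)

threshold <- 0.5  # illustrative cutoff, an assumption rather than a package default
activeGenes <- names(upc)[upc > threshold]
print(activeGenes)            # genes called "active" under this cutoff
print(mean(upc > threshold))  # fraction of genes above the cutoff
```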

Note

RNA-Seq data typically contain many zero read counts. Samples with an excessive number of zeroes may produce error messages because genes cannot be properly allocated to bins. Specifying ignoreZeroes=TRUE avoids this error. In practice, we have found that the resulting UPC values are similar with either setting.
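Before choosing ignoreZeroes, it may help to check what fraction of a sample's counts are zero. The counts below are made up. The 0.5 cutoff is a hypothetical rule of thumb, not a recommendation from SCAN.UPC.

```r
# Made-up counts for ten genomic regions
counts <- c(0, 0, 0, 15, 230, 0, 4, 0, 1200, 0)

zeroFraction <- mean(counts == 0)
cat(sprintf("%.0f%% of counts are zero\n", 100 * zeroFraction))

# Hypothetical rule of thumb: ignore zeroes when most counts are zero
useIgnoreZeroes <- zeroFraction > 0.5
# The flag would then be passed through, e.g.:
# UPC_RNASeq_Single(counts, featureNames, ignoreZeroes = useIgnoreZeroes)
```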

The modelType parameter indicates which type of mixture model to use for the UPC transformation. The "nn_bayes" model type is a new, experimental approach intended for experiments in which a subset of genes is expressed at extreme levels.

Author(s)

Stephen R. Piccolo

References

Piccolo SR, Withers MR, Francis OE, Bild AH, Johnson WE. Multi-platform single-sample estimates of transcriptional activation. Proceedings of the National Academy of Sciences of the United States of America. 2013;110(44):17778-17783.


SCAN.UPC documentation built on Nov. 8, 2020, 11:10 p.m.