UPC_TwoColor: Universal exPression Codes (UPC) for two-channel microarrays




This function normalizes two-channel expression microarrays (from Agilent) using the Universal exPression Codes (UPC) approach. Raw data for such microarrays are stored as tab-separated text files.


UPC_TwoColor(inFilePattern, outFilePath = NA, modelType = "nn",
  convThreshold = 0.01, batchFilePath = NA, verbose = TRUE)



inFilePattern: Absolute or relative path to the input file to be processed. To process multiple files, wildcard characters can be used (e.g., "*.txt"). Alternatively, a Gene Expression Omnibus identifier (e.g., GSE39655 or GSM1072833) can be specified. This is the only required parameter.


outFilePath: Absolute or relative path where the output file will be saved. Optional.


modelType: The type of mixture model used to differentiate between active and inactive probes. The default is the normal-normal model ("nn"), which uses the normal distribution. Other available options are log-normal ("ln"), negative-binomial ("nb"), and normal-normal Bayes ("nn_bayes").


convThreshold: Convergence threshold that determines at what point the mixture-model parameters have stabilized. The default value (0.01) should be suitable in most cases. If the model fails to converge, adjusting this value may help. Optional.


batchFilePath: Absolute or relative path to a tab-separated text file that indicates the batch (and, optionally, covariate information) for each sample. Optional.


verbose: Whether to output more detailed status information as files are processed. Default is TRUE.


A list is returned, containing two elements: a matrix containing UPC values and a vector of probe names that correspond to each row of the matrix. The matrix will contain two columns—one corresponding to each channel—for each sample. When the array design uses duplicate probe names (this is common for control probes), the vector of probe names will also contain duplicates.
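As an illustration of the return value described above, the following sketch builds a small mock list with the same shape (a matrix plus a probe-name vector that contains duplicates) and averages the rows that share a probe name. The mock values, the column names, and the positional access to the list elements are assumptions made for illustration; they are not taken from the package.

```r
# Mock stand-in for the list returned by UPC_TwoColor (all values are invented).
upc_values <- matrix(
  c(0.10, 0.90, 0.50, 0.30,   # first channel of one sample
    0.20, 0.80, 0.70, 0.10),  # second channel of the same sample
  nrow = 4,
  dimnames = list(NULL, c("sample1_ch1", "sample1_ch2"))  # hypothetical names
)
probe_names <- c("probeA", "probeB", "control", "control")  # duplicated control probe
mock_result <- list(upc_values, probe_names)

# Access elements by position, because element names are not documented here.
values <- mock_result[[1]]
probes <- mock_result[[2]]
stopifnot(nrow(values) == length(probes))

# Average rows that share a probe name so that each probe appears once.
sums      <- rowsum(values, group = probes)            # per-probe column sums
counts    <- as.vector(table(probes)[rownames(sums)])  # number of rows per probe
collapsed <- sums / counts                             # per-probe column means
```

Whether averaging duplicate probes is appropriate depends on the analysis; for control probes it may be preferable to drop them instead.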


If a Gene Expression Omnibus (GEO) identifier is specified for the inFilePattern parameter, an attempt will be made to download the sample(s) directly from GEO. If a study identifier (e.g., GSE39655) is specified, all sample files from that study will be downloaded. If a sample identifier (e.g., GSM1072833) is specified, only that sample will be downloaded.

The batchFilePath parameter provides a convenient way to adjust the data for batch effects. It invokes the ComBat function within the sva package. Please see that package for additional details about how batch adjusting is performed. Batch adjusting is performed before UPC transformation occurs.
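A batch file of the kind described above could be prepared along the following lines. This is a minimal sketch using base R only. The column layout (sample identifier, batch label, one optional covariate), the header row, and the sample names are assumptions for illustration, so the exact format expected by the package should be checked against its documentation.

```r
# Hypothetical batch table: sample identifier, batch label, optional covariate.
# The column layout and header row are assumptions, not the documented format.
batch_info <- data.frame(
  sample    = c("sample1.txt", "sample2.txt", "sample3.txt", "sample4.txt"),
  batch     = c(1, 1, 2, 2),
  treatment = c("control", "treated", "control", "treated"),
  stringsAsFactors = FALSE
)

# Write a tab-separated text file without quotes or row names.
batch_file <- file.path(tempdir(), "batch_info.txt")
write.table(batch_info, batch_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm that it round-trips as expected.
check <- read.delim(batch_file, stringsAsFactors = FALSE)
```

The resulting path (here `batch_file`) would then be supplied as the batchFilePath argument.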

The modelType parameter indicates which type of mixture model to use for UPC transformation. The "nn_bayes" model type is an experimental approach intended for experiments in which a subset of genes is expressed at extreme levels.
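To give a feel for how a two-component mixture model and a convergence threshold interact, the sketch below fits a normal-normal mixture to simulated intensities with a plain EM loop. It is not the package's implementation: the function name `fit_normal_mixture`, the starting values, the simulated data, and the stopping rule (largest absolute parameter change below the threshold) are all assumptions chosen for illustration.

```r
# Illustrative two-component normal mixture fitted with EM (not the SCAN.UPC code).
fit_normal_mixture <- function(x, conv_threshold = 0.01, max_iter = 500) {
  # Crude starting values: split the data at the median.
  mu       <- c(mean(x[x <= median(x)]), mean(x[x > median(x)]))
  sigma    <- rep(sd(x), 2)
  p_active <- 0.5  # mixing proportion of the higher-mean ("active") component

  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each value belongs to the active component.
    d_inactive <- (1 - p_active) * dnorm(x, mu[1], sigma[1])
    d_active   <- p_active * dnorm(x, mu[2], sigma[2])
    post       <- d_active / (d_inactive + d_active)

    # M-step: update the parameters using the posterior weights.
    new_mu <- c(weighted.mean(x, 1 - post), weighted.mean(x, post))
    new_sigma <- c(
      sqrt(weighted.mean((x - new_mu[1])^2, 1 - post)),
      sqrt(weighted.mean((x - new_mu[2])^2, post))
    )
    new_p <- mean(post)

    # Stop once the largest parameter change falls below the threshold.
    change <- max(abs(c(new_mu - mu, new_sigma - sigma, new_p - p_active)))
    mu <- new_mu
    sigma <- new_sigma
    p_active <- new_p
    if (change < conv_threshold) break
  }
  list(mu = mu, sigma = sigma, p_active = p_active,
       posterior = post, iterations = iter)
}

# Simulated log-scale intensities: 300 "inactive" and 100 "active" probes.
set.seed(1)
intensities <- c(rnorm(300, mean = 6, sd = 0.5), rnorm(100, mean = 10, sd = 0.8))
fit <- fit_normal_mixture(intensities)
```

In this sketch, `fit$posterior` holds a value between 0 and 1 for each simulated probe. A smaller `conv_threshold` makes the loop run longer before stopping, while a larger one stops it earlier at the cost of less stable estimates.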


Stephen R. Piccolo


Piccolo SR, Withers MR, Francis OE, Bild AH, and Johnson WE. Multi-platform single-sample estimates of transcriptional activation. Proceedings of the National Academy of Sciences of the United States of America, 2013, 110(44):17778-17783.


## Not run: 
# Normalize a file from GEO and save output to a file
result = UPC_TwoColor("GSM1072833", "output_file.txt")

## End(Not run)

SCAN.UPC documentation built on Jan. 5, 2019, 6:38 p.m.