copa: Calculate COPA Scores from a Set of Microarrays


View source: R/copa.R

Description

This function calculates COPA scores from a set of microarrays. The input can be an ExpressionSet, a matrix, or a data.frame.

Usage

copa(object, cl, cutoff = 5, max.overlap = 0, norm.count = 0, pct = 0.95)

Arguments

object

An ExpressionSet, or a matrix or data.frame.

cl

A vector of class labels indicating sample status (normal = 1, tumor = 2).

cutoff

The cutoff to determine 'outlier' status. See details for more information.

max.overlap

The maximum number of samples that may be 'outliers' for both genes when comparing two genes. The default is 0, indicating that there can be no overlap. See details for more information.

norm.count

The maximum number of normal samples that can be considered 'outliers'. The default is 0, meaning that no normals may be outliers.

pct

The percentile to use for pre-filtering the data. As a preliminary step, the number of outlier samples is computed for each gene. All genes whose number of outlier samples falls below this percentile (the 95th by default) are removed from further consideration.
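The pre-filtering step can be sketched in base R as follows. This is not copa's internal code: the per-gene centering and scaling (median and MAD, as described by Tomlins et al.), the use of `>` against `cutoff`, and the `>=` comparison against the quantile are all assumptions, and the function name `prefilter_sketch` is hypothetical.

```r
## Hypothetical sketch of the 'pct' pre-filter; not copa's actual implementation.
## Assumes each gene is median-centered and scaled by its MAD before
## applying 'cutoff', following the description in Tomlins et al. (2005).
prefilter_sketch <- function(mat, cutoff = 5, pct = 0.95) {
  ## Center each row (gene) on its median and divide by its MAD.
  centered <- sweep(mat, 1, apply(mat, 1, median))
  scaled <- sweep(centered, 1, apply(mat, 1, mad), "/")
  ## Count, per gene, the samples whose scaled value exceeds the cutoff.
  n.out <- rowSums(scaled > cutoff, na.rm = TRUE)
  ## Keep genes whose outlier count reaches the 'pct' quantile (assumed >=).
  keep <- n.out >= quantile(n.out, pct)
  mat[keep, , drop = FALSE]
}

## Usage: 20 genes x 10 samples with no outliers, except gene g1,
## which is strongly over-expressed in samples 1 to 3.
mat <- matrix(rep(seq(-1, 1, length.out = 10), 20), nrow = 20, byrow = TRUE,
              dimnames = list(paste0("g", 1:20), paste0("s", 1:10)))
mat["g1", 1:3] <- 50
filtered <- prefilter_sketch(mat)
rownames(filtered)   # "g1"
```

Under these assumptions, raising `pct` raises the threshold on the outlier count and therefore retains fewer genes.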

Details

Cancer Outlier Profile Analysis (COPA) is a method intended to find pairs of genes that may be involved in recurrent gene fusion with a third (unknown) gene. The underlying idea is that in certain cancers the promoter region of one gene may commonly become fused to one of several oncogenes. For instance, Tomlins et al. showed that the promoter region of TMPRSS2 is fused to either ERG or ETV1 in the majority of the prostate cancer samples they tested.

Since this fusion should occur with only one oncogene in a given sample, we look for pairs of genes in which some samples have much higher expression values than the rest, but the outlier samples for gene 'A' are mutually exclusive with the outlier samples for gene 'B'.
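In terms of per-sample outlier indicators, mutual exclusivity means the two genes share no outlier samples. A minimal illustration with made-up logical vectors (the names `out.A` and `out.B` are hypothetical and not part of copa):

```r
## Hypothetical outlier indicators for genes 'A' and 'B' across eight samples
## (TRUE = the sample is an outlier for that gene).
out.A <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
out.B <- c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

## Samples that are outliers for both genes; 0 means the pattern is mutually exclusive.
overlap <- sum(out.A & out.B)
## Samples that are outliers for exactly one of the two genes.
exclusive <- sum(xor(out.A, out.B))
c(overlap = overlap, exclusive = exclusive)
```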

The cutoff argument determines how high a centered and scaled expression value must be for that sample to be considered an outlier. The max.overlap argument relaxes the requirement of mutual exclusivity, although in practice this is probably not advisable.
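One way the cutoff, max.overlap and norm.count arguments could combine when scoring a single gene pair is sketched below. The helper `pair_score_sketch` is hypothetical, the inputs are assumed to be already centered and scaled, and two details are assumptions rather than documented copa behavior: normal outliers are counted over either gene, and only non-shared outliers contribute to the score.

```r
## Hypothetical helper: score one gene pair under copa-style constraints.
## x, y: centered and scaled expression for two genes; cl: 1 = normal, 2 = tumor.
pair_score_sketch <- function(x, y, cl, cutoff = 5, max.overlap = 0, norm.count = 0) {
  out.x <- x > cutoff
  out.y <- y > cutoff
  ## Reject the pair if too many samples are outliers for both genes.
  if (sum(out.x & out.y) > max.overlap) return(NA_integer_)
  ## Reject the pair if too many normal samples are outliers for either gene.
  if (sum((out.x | out.y) & cl == 1) > norm.count) return(NA_integer_)
  ## Score: samples that are outliers for exactly one of the two genes.
  sum(xor(out.x, out.y))
}

cl <- c(2, 2, 2, 1, 1, 1)    # three tumors, then three normals
x  <- c(6, 7, 0, 0, 0, 0)    # gene x: outlier in tumors 1 and 2
y  <- c(0, 0, 8, 0, 0, 0)    # gene y: outlier in tumor 3
pair_score_sketch(x, y, cl)                    # 3
y2 <- c(6, 0, 8, 0, 0, 0)    # now shares sample 1 with gene x
pair_score_sketch(x, y2, cl)                   # NA: overlap exceeds max.overlap = 0
pair_score_sketch(x, y2, cl, max.overlap = 1)  # 2
```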

Note that this function computes all pairwise row comparisons, and the number of pairs grows quadratically with the number of rows (n(n - 1)/2 pairs for n genes). The function will issue a warning for any data set containing more than 1000 rows and ask the user whether to proceed. Increasing the 'pct' argument reduces the number of genes considered; decreasing it increases that number.
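To see how quickly the work grows: n genes give choose(n, 2) = n(n - 1)/2 distinct pairs, so the pair count roughly quadruples each time the number of genes doubles. A quick base R check (the gene counts are arbitrary):

```r
## Number of distinct gene pairs for a few arbitrary gene counts.
n.genes <- c(100, 500, 1000, 5000, 20000)
pairs <- choose(n.genes, 2)
data.frame(genes = n.genes, pairs = pairs)
## 1000 genes already give 499,500 pairs; 20,000 genes give 199,990,000.
```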

Value

ord.prs

A matrix with two columns containing the ordered row numbers from the original matrix of gene expression values.

pr.sums

A numeric vector with the number of mutually exclusive outliers for each gene pair. This is the criterion for ranking the gene pairs; the assumption being that a pair of genes with more mutually exclusive outliers will be more interesting than a pair with relatively fewer outliers.

mat

A matrix containing the filtered gene expression values.

cl

The class label vector passed to copa.

cutoff

The cutoff used.

max.overlap

The value of max.overlap used.

norm.count

The value of norm.count used.

pct

The percentile used in the pre-filtering step.
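The ord.prs and pr.sums components can be read together as gene pairs sorted by decreasing score. The sketch below assumes that relationship and uses made-up scores for the six pairs among four genes; it does not call copa, and the names `prs` and `scores` are hypothetical.

```r
## All pairs of row indices among four hypothetical genes, one pair per row.
prs <- t(combn(4, 2))
## Made-up mutually exclusive outlier counts, one per pair (same order as prs).
scores <- c(3, 7, 0, 5, 2, 7)
## Rank pairs from most to fewest mutually exclusive outliers.
o <- order(scores, decreasing = TRUE)
ord.prs <- prs[o, ]
pr.sums <- scores[o]
cbind(ord.prs, pr.sums)
```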

Author(s)

James W. MacDonald

References

Tomlins, SA, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science. 2005 Oct 28;310(5748):644-8.

Examples

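library(Biobase)
library(copa)
data(sample.ExpressionSet)
## Column 2 of pData is the Case/Control factor; as.numeric() gives Case = 1 and
## Control = 2, so 3 - x recodes Case as tumor (2) and Control as normal (1).
cl <- abs(3 - as.numeric(pData(sample.ExpressionSet)[,2]))
tmp <- copa(sample.ExpressionSet, cl)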

copa documentation built on Nov. 8, 2020, 7:47 p.m.