matchCGHcall2ExpressionSet: Genomic location matching of CN and GE data
In sigaR: Statistics for Integrative Genomics Analyses in R

Description Usage Arguments Details Value Warning Note Author(s) References See Also Examples

View source: R/dataManagementFunctions.r

Integrative CN-GE analysis requires the copy number data of all genes on the expression array to be available. intCNGEan.match matches the features of the copy number platform to the genes of the expression array. This is done using their genomic locations on the basis of either proximity or overlap.

1
2
3

matchCGHcall2ExpressionSet(CNdata, GEdata, CNchr, CNbpstart, 
CNbpend, GEchr, GEbpstart, GEbpend, method = "distance", 
reference=1, ncpus=1, verbose=TRUE)

`CNdata`	Object of class `cghCall`, containing (among others) annotion and call probabilities.
`GEdata`	Object of class `ExpressionSet`.
`CNchr`	Column in the slot `featureData` of the `cghCall`-object specifying the chromosome information of the features.
`CNbpstart`	Column in the slot `featureData` of the `cghCall`-object specifying the start basepair information of the features.
`CNbpend`	Column in the slot `featureData` of the `cghCall`-object specifying the end basepair information of the features.
`GEchr`	Column in the slot `featureData` of the `ExpressionSet`-object specifying the chromosome information of the features.
`GEbpstart`	Column in the slot `featureData` of the `ExpressionSet`-object specifying the start basepair information of the features.
`GEbpend`	Column in the slot `featureData` of the `ExpressionSet`-object specifying the end basepair information of the features.
`method`	Matching method to be applied, either `"distance"`, `"overlap"` or `"overlapPlus"`. See below for details.
`reference`	Platform that is taken as a reference in the calculation of the percentage, should equal 1 or two, referring to the platform.
`ncpus`	Number of cpus to be used in the computation.
`verbose`	Logical indicator: should intermediate output be printed on the screen?

Ideally full annotation information (chromosome number, start base pair, end base pair) for both copy number and gene expression data is available. In case only start base pair information is available, let CNbpend and GEbpend refer to the same columns as CNbpstart and GEbpstart. Base pair information of copy number and expression data should be on the same scale.

Matching occurs on the basis of genomic locations. In case method="distance", the midpoint of CN and GE features are calculated and for each gene on the expression array the closest feature of the copy number platform is selected. If method="overlap", each gene in the ExpressionSet-object is matched to the feature from the copy number platform with the maximum percentage of overlap. If the maximum percentage of overlap equals zero, the gene is not included in the matched objects. If method="overlapPlus", the features are first matched by their percentage of overlap (as with the method="overlap"-option). For all non-matched GE features its closest two CN features (one down- and one upstream) are determined. If the copy number signature of these two CN features is identical, intrapolation seems reasonable, and and the GE feature is matched to the closest of these two CN features. Hence, method="overlapPlus" makes use of the copy number data, consequently, matching may be different for different data sets.

A two-column matrix with the matched features entries. The first column contains feature numbers of the cghCall-object. The second column contains feature numbers of the ExpressionSet-object. Each row thus has two entries. The first entry contains the feature number of the cghCall-object that has been matched to second entry, representing the feature number of the ExpressionSet-object.

Features with incomplete annotation information are removed before matching. For clarity, they are not included in the objects with matched features.

The matching process implemented here is different from the one implemented in the (depreciated) ACEit-package (Van Wieringen et al., 2006).

Wessel N. van Wieringen: w.vanwieringen@vumc.nl

Van Wieringen, W.N., Belien, J.A.M., Vosse, S.J., Achame, E.M., Ylstra, B. (2006), "ACE-it: a tool for genome-wide integration of gene dosage and RNA expression data", Bioinformatics, 22(15), 1919-1920.

Van Wieringen, W.N., Unger, K., Leday, G.G.R., Krijgsman, O., De Menezes, R.X., Ylstra, B., Van de Wiel, M.A. (2012), "Matching of array CGH and gene expression microarray features for the purpose of integrative analysis", BMC Bioinformatics, 13:80.

cghCall, ExpressionSet

# load data
data(pollackCN16) 
data(pollackGE16) 

# match features from both platforms
featureMatch <- matchCGHcall2ExpressionSet(pollackCN16, pollackGE16, 1, 2, 3, 1, 2, 3)