Calculate Variance Inflation Factor

Description

A function to calculate the Variance Inflation Factor (VIF) for each of the gene sets in the geneResults object

Usage

calcVIF(eset, geneResults, useCAMERA = geneResults$var.method=="Pooled", 
        useAllData = TRUE)
 

Arguments

eset

An object of class ExpressionSet containing log normalized expression data (as created by the affy and lumi packages), OR a matrix of log2(expression values). This must be the same dataset that was used to create geneResults.

geneResults

A QSarray object, as generated by either makeComparison or aggregateGeneSet

useCAMERA

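Boolean parameter determining how the VIF is calculated. If TRUE, the CAMERA method of Wu et al. is used; if FALSE, an alternative calculation that does not assume equal variance is used. See the Details section for more information.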

useAllData

Boolean parameter determining whether to use all data in eset to calculate the VIF, or to only use data from the groups being contrasted. Only used if useCAMERA is set to FALSE

Details

This method calculates the Variance Inflation Factor (VIF) for each gene set in geneResults, which is used to correct for the correlation of genes in the gene set. This method builds on a technique proposed by Wu et al. (Nucleic Acids Res, 2012), which calculates the VIF for each gene set based on the correlation of the genes in that set. The Wu et al. method, referred to as CAMERA, uses the linear model framework created by LIMMA to calculate gene-gene correlations, but consequently it must assume equal variance not only between all groups in the dataset, but also across each gene in the gene set. While this assumption leads to a slightly more computationally efficient VIF calculation, it is not valid for most gene sets, and its violation can greatly impact specificity.
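To give a feel for what a correlation-based VIF measures, the CAMERA paper expresses it as VIF = 1 + (m - 1) * rhoBar, where m is the number of genes in the set and rhoBar is the average pairwise gene-gene correlation. The base R sketch below applies that formula to raw sample correlations. It is a simplified illustration only: the helper name vifSketch is hypothetical, it is not the code used inside calcVIF, and CAMERA itself estimates correlations from the residuals of the fitted linear model rather than from the raw expression values.

```r
## Simplified, illustrative VIF from raw gene-gene correlations.
## vifSketch is a hypothetical helper, not part of qusage or limma.
vifSketch <- function(x) {
  ## x: genes-by-samples matrix holding the genes of a single gene set
  m <- nrow(x)
  cors <- cor(t(x))                       # m x m gene-gene correlation matrix
  rhoBar <- mean(cors[upper.tri(cors)])   # average pairwise correlation
  1 + (m - 1) * rhoBar
}

set.seed(1)
## 30 independent genes measured on 20 samples
indep <- matrix(rnorm(30 * 20), 30, 20)
## the same genes with a shared per-sample signal added, which induces correlation
shared <- rnorm(20)
correlated <- indep + matrix(shared, 30, 20, byrow = TRUE)

vifIndep <- vifSketch(indep)        # expected to be close to 1
vifCor <- vifSketch(correlated)     # expected to be much larger than 1
```

A set of independent genes gives a VIF near 1, so its variance estimate is left almost unchanged, while a strongly correlated set gives a large VIF and a correspondingly wider distribution.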

This function provides two options for calculating the VIF: the CAMERA method established by Wu et al. (if useCAMERA is TRUE), or an alternative implementation of the VIF calculation (if useCAMERA is FALSE) which does not assume equal variance of individual groups or genes. By default, calcVIF will choose useCAMERA based on the options specified in makeComparison: if var.equal was set to TRUE (so that geneResults$var.method is "Pooled"), the VIF will be calculated using CAMERA.

If the internal VIF calculation is used (i.e. useCAMERA=FALSE), the parameter useAllData can be specified to determine which samples in eset should be used to calculate the VIF. By default (useAllData=TRUE), all of the samples in eset will be used to calculate the VIF. If useAllData=FALSE, only the samples in eset which were used to generate geneResults will be included in the calculation. Generally, using all data will provide a more accurate estimate of the gene-gene correlations, but if the samples in eset are from very different conditions (e.g. different tissues or platforms), it may make more sense to limit the VIF calculation to a subset of samples.

Value

A version of geneResults with VIF added into the object.

Examples

  ##load the package
  library(qusage)

  ##create example data
  eset = matrix(rnorm(500*20),500,20, dimnames=list(1:500,1:20))
  labels = c(rep("A",10),rep("B",10))
  
  ##a few of the genes are made to be strongly correlated
  corGenes = t(apply(eset[1:30,],1,sort))
  eset[1:30,] = corGenes[,sample(1:ncol(eset))]
  
  ##genes 1:60 are differentially expressed
  eset[1:60, labels=="B"] = eset[1:60, labels=="B"] + 1
  geneSets = list(cor.set=1:30, random.set=31:60)
  
  ##Run qusage
  geneResults = makeComparison(eset, labels, "B-A")
  set.results = aggregateGeneSet(geneResults, geneSets)
  
  ##calc VIF for gene sets
  set.results = calcVIF(eset, set.results)
 
  ##Look at results with and without VIF
  par(mfrow=c(1,2))
  plotDensityCurves(set.results, addVIF=FALSE, col=1:2, main="No VIF")
  plotDensityCurves(set.results, addVIF=TRUE, col=1:2, main="With VIF")
  legend("topleft",legend=names(geneSets),col=1:2, lty=1)