A function to calculate the Variance Inflation Factor (VIF) for each of the gene sets in the `geneResults` object.


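| Argument | Description |
|---|---|
| `eset` | An object of class `ExpressionSet` containing log-normalized expression data (as created by the affy and lumi packages), or a matrix of log2 expression values. This must be the same dataset that was used to create `geneResults`. |
| `geneResults` | A qusage results object containing the gene sets of interest, such as the one returned by `aggregateGeneSet`. |
| `useCAMERA` | Logical. If `TRUE`, the VIF is calculated with the CAMERA method of Wu et al.; if `FALSE`, an alternative calculation that does not assume equal variance is used. See the description below for details. |
| `useAllData` | Logical. Whether to use all samples in `eset` to calculate the VIF (`TRUE`) or only the samples from the groups being contrasted (`FALSE`). Only used if `useCAMERA` is `FALSE`. |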

This method calculates the Variance Inflation Factor (VIF) for each gene set in `geneResults`, which is used to correct for the correlation between the genes in each set. It builds on a technique proposed by Wu et al. (Nucleic Acids Res, 2012), which calculates the VIF for each gene set from the correlation of the genes in that set. The Wu et al. method, referred to as CAMERA, uses the linear model framework of limma to calculate gene-gene correlations, but as a consequence it must assume equal variance not only between all groups in the dataset, but also across all genes in the gene set. While this assumption makes the VIF calculation slightly more computationally efficient, it is not valid for most gene sets, and its violation can greatly reduce specificity.
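The CAMERA-style correction can be sketched in a few lines of base R. Wu et al. define the VIF for a set of m genes as 1 + (m - 1) times the average pairwise correlation between the genes, computed after the group effects have been removed. The helper `cameraStyleVIF` below is hypothetical (it is not part of qusage) and assumes a simple one-factor design given by `labels`; it only illustrates how positive inter-gene correlation pushes the VIF above 1 and does not reproduce the calculation that `calcVIF` performs internally.

```r
# Hypothetical helper, NOT part of qusage: CAMERA-style VIF for one gene set,
# following Wu et al. (2012): VIF = 1 + (m - 1) * mean inter-gene correlation.
# Assumes a one-factor design: `labels` gives the group of each column of `expr`.
cameraStyleVIF <- function(expr, labels, genes) {
  x <- expr[genes, , drop = FALSE]
  # Subtract each group's mean from every gene so the correlations reflect
  # within-group (residual) variation rather than the group effect itself.
  for (g in unique(labels)) {
    idx <- labels == g
    x[, idx] <- x[, idx, drop = FALSE] - rowMeans(x[, idx, drop = FALSE])
  }
  m <- nrow(x)
  r <- cor(t(x))                    # m x m gene-gene correlation matrix
  rhoBar <- mean(r[upper.tri(r)])   # average off-diagonal correlation
  1 + (m - 1) * rhoBar
}

# Usage on simulated data with two groups of 10 samples
set.seed(1)
expr <- matrix(rnorm(100 * 20), 100, 20)
labels <- rep(c("A", "B"), each = 10)
# Genes simulated independently, so the VIF should be somewhere around 1
cameraStyleVIF(expr, labels, 1:30)
```

If every gene in a set is perfectly correlated, the average correlation is 1 and the VIF equals the set size m, meaning the set carries no more information than a single gene.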

This function provides two options for calculating the VIF: the CAMERA method established by Wu et al. (if `useCAMERA` is `TRUE`), or an alternative implementation of the VIF calculation (if `useCAMERA` is `FALSE`) that does not assume equal variance of individual groups or genes. By default, `calcVIF` chooses `useCAMERA` based on the options specified in `makeComparison`: if `var.equal` was set to `TRUE`, the variance will be calculated using CAMERA by default.
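As a sketch of how this choice plays out, the snippet below assumes the qusage package is installed and that `makeComparison` accepts the `var.equal` argument mentioned above. The data are simulated, and the object names (`pooled`, `unequal`, and so on) are illustrative only.

```r
library(qusage)

set.seed(42)
eset <- matrix(rnorm(500 * 20), 500, 20, dimnames = list(1:500, 1:20))
labels <- rep(c("A", "B"), each = 10)
geneSets <- list(set1 = 1:30, set2 = 31:60)

# Equal (pooled) variance in makeComparison: calcVIF defaults to CAMERA
pooled <- aggregateGeneSet(makeComparison(eset, labels, "B-A", var.equal = TRUE), geneSets)
pooled <- calcVIF(eset, pooled)

# Unequal variance (var.equal left at its default): calcVIF defaults to the
# internal method, but CAMERA can still be requested with useCAMERA = TRUE
unequal <- aggregateGeneSet(makeComparison(eset, labels, "B-A"), geneSets)
unequal.internal <- calcVIF(eset, unequal)
unequal.camera <- calcVIF(eset, unequal, useCAMERA = TRUE)
```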

If the internal VIF calculation is used (i.e. `useCAMERA=FALSE`), the parameter `useAllData` determines which samples in `eset` are used to calculate the VIF. By default (`useAllData=TRUE`), all of the samples in `eset` are used. If `useAllData=FALSE`, only the samples in `eset` that were used to generate `geneResults` are included in the calculation. Generally, using all data will provide a more accurate estimate of the gene-gene correlations, since the estimate is based on more samples, but if the samples in `eset` come from very different conditions (e.g. different tissues or platforms), it may make more sense to limit the VIF calculation to a subset of samples.
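For a dataset with more groups than the contrast uses, the two settings of `useAllData` can be compared directly. This sketch assumes qusage is installed; the third group `C` stands in for samples from an unrelated condition and, like the rest of the simulated data, is purely illustrative.

```r
library(qusage)

set.seed(7)
eset <- matrix(rnorm(500 * 30), 500, 30, dimnames = list(1:500, 1:30))
labels <- rep(c("A", "B", "C"), each = 10)   # group C is not part of the contrast
geneSets <- list(set1 = 1:30, set2 = 31:60)

res <- aggregateGeneSet(makeComparison(eset, labels, "B-A"), geneSets)

# All 30 samples (groups A, B and C) inform the gene-gene correlations
vifAll <- calcVIF(eset, res, useCAMERA = FALSE, useAllData = TRUE)
# Only the 20 samples from groups A and B, which were used to build `res`
vifSubset <- calcVIF(eset, res, useCAMERA = FALSE, useAllData = FALSE)
```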

A copy of `geneResults` with the VIF for each gene set added to the object.

```
##load qusage
library(qusage)
##create example data (seeded so the example is reproducible)
set.seed(1)
eset = matrix(rnorm(500*20),500,20, dimnames=list(1:500,1:20))
labels = c(rep("A",10),rep("B",10))
##make genes 1:30 strongly correlated: sort each row, then apply one shared column permutation
corGenes = t(apply(eset[1:30,],1,sort))
eset[1:30,] = corGenes[,sample(1:ncol(eset))]
##genes 1:60 are differentially expressed
eset[1:60, labels=="B"] = eset[1:60, labels=="B"] + 1
geneSets = list(cor.set=1:30, random.set=31:60)
##Run qusage
geneResults = makeComparison(eset, labels, "B-A")
set.results = aggregateGeneSet(geneResults, geneSets)
##calc VIF for gene sets
set.results = calcVIF(eset, set.results)
##Look at results with and without VIF
par(mfrow=c(1,2))
plotDensityCurves(set.results, addVIF=FALSE, col=1:2, main="No VIF")
plotDensityCurves(set.results, addVIF=TRUE, col=1:2, main="With VIF")
legend("topleft",legend=names(geneSets),col=1:2, lty=1)
```
