Calculate Variance Inflation Factor
A function to calculate the Variance Inflation Factor (VIF) for each of the gene sets in the geneResults object
eset: An object of class ExpressionSet containing log-normalized expression data (as created by the affy and lumi packages), OR a matrix of log2(expression values). This must be the same dataset that was used to create geneResults.
useCAMERA: Boolean parameter selecting the method used to calculate the VIF: the CAMERA method if TRUE, or the alternative implementation if FALSE. See the description below for more details.
useAllData: Boolean parameter determining whether to use all data in eset to calculate the VIF, or to use only data from the groups being contrasted. Only used if useCAMERA is set to FALSE.
This method calculates the Variance Inflation Factor (VIF) for each gene set in geneSets, which is used to correct for the correlation of genes in the gene set. It builds on a technique proposed by Wu et al. (Nucleic Acids Res, 2012), which calculates the VIF for each gene set based on the correlation of the genes in that set. The Wu et al. method, referred to as CAMERA, uses the linear model framework created by LIMMA to calculate gene-gene correlations. As a consequence, it must assume equal variance not only between all groups in the dataset, but also across each gene in the gene set. While this assumption leads to a slightly more computationally efficient VIF calculation, it is not valid for most gene sets, and its violation can greatly impact specificity.
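To make the idea of a correlation-based VIF concrete, the sketch below uses the form given by Wu et al., VIF = 1 + (m - 1) * rho_bar, where m is the number of genes in the set and rho_bar is their mean pairwise correlation. It is only an illustration in base R on simulated data: simple_vif is a hypothetical helper, the correlations are taken directly from the raw data rather than from linear-model residuals, and it is not the code used by qusage or LIMMA.

```r
## Illustrative sketch only: VIF from the mean pairwise correlation of a gene set.
## Not the qusage or limma implementation.
set.seed(1)
n_samples <- 20

## 30 genes sharing a common signal (correlated), and 30 independent genes
shared   <- rnorm(n_samples)
cor_set  <- t(sapply(1:30, function(i) shared + rnorm(n_samples)))
rand_set <- matrix(rnorm(30 * n_samples), 30, n_samples)

## hypothetical helper: rows are genes, columns are samples
simple_vif <- function(x) {
  m <- nrow(x)
  r <- cor(t(x))                      # gene-by-gene correlation matrix
  rho_bar <- mean(r[upper.tri(r)])    # mean over distinct gene pairs
  1 + (m - 1) * rho_bar
}

vif_cor  <- simple_vif(cor_set)   # well above 1 for the correlated set
vif_rand <- simple_vif(rand_set)  # close to 1 for the independent set
```

A VIF near 1 means the genes behave as if independent, while a large VIF indicates that the variance of the gene set statistic would be underestimated if the correlation were ignored.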
This function provides two options for calculating the VIF: the CAMERA method established by Wu et al. (if useCAMERA is TRUE), or an alternative implementation of the VIF calculation (if useCAMERA is FALSE) which does not assume equal variance of individual groups or genes. By default, calcVIF will choose useCAMERA based on the options specified in makeComparison. If var.equal was set to TRUE, then by default the VIF will be calculated using CAMERA.
If the internal VIF calculation is used (i.e. useCAMERA=FALSE), the parameter useAllData can be specified to determine which samples in eset should be used to calculate the VIF. By default (useAllData=TRUE), all of the samples in eset will be used to calculate the VIF. If useAllData=FALSE, only the samples in eset which were used to generate geneResults will be included in the calculation. Generally, using all data will provide a more accurate estimate of the gene-gene correlations, but if the samples in eset are from very different conditions (e.g. different tissues or platforms), it may make more sense to limit the VIF calculation to a subset of samples.
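As a rough sketch of how these two parameters might be set explicitly, the snippet below calls calcVIF with useCAMERA=FALSE and both values of useAllData. It assumes the qusage package is installed and is skipped otherwise. The simulated data, the gene set names, and the variable names vif.all and vif.sub are made up for illustration. In this toy dataset every sample belongs to one of the two contrasted groups, so both calls draw on the same samples; the setting only matters when eset also contains samples from other groups.

```r
## Sketch only: explicit choice of VIF method and sample subset.
## Assumes the qusage package is available; skipped otherwise.
has_qusage <- requireNamespace("qusage", quietly = TRUE)

if (has_qusage) {
  library(qusage)
  set.seed(42)

  ## simulated expression matrix: 500 genes x 20 samples, two groups
  eset   <- matrix(rnorm(500 * 20), 500, 20, dimnames = list(1:500, 1:20))
  labels <- c(rep("A", 10), rep("B", 10))
  geneSets <- list(set1 = 1:30, set2 = 31:60)   # illustrative gene sets

  geneResults <- makeComparison(eset, labels, "B-A")
  set.results <- aggregateGeneSet(geneResults, geneSets)

  ## internal VIF calculation using every sample in eset
  vif.all <- calcVIF(eset, set.results, useCAMERA = FALSE, useAllData = TRUE)

  ## internal VIF calculation restricted to the samples used in the comparison
  vif.sub <- calcVIF(eset, set.results, useCAMERA = FALSE, useAllData = FALSE)
}
```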
A version of geneResults with VIF added into the object.
library(qusage)

##create example data
eset = matrix(rnorm(500*20),500,20, dimnames=list(1:500,1:20))
labels = c(rep("A",10),rep("B",10))

##a few of the genes are made to be strongly correlated
corGenes = t(apply(eset[1:30,],1,sort))
eset[1:30,] = corGenes[,sample(1:ncol(eset))]

##genes 1:60 are differentially expressed
eset[1:60, labels=="B"] = eset[1:60, labels=="B"] + 1
geneSets = list(cor.set=1:30, random.set=31:60)

##Run qusage
geneResults = makeComparison(eset, labels, "B-A")
set.results = aggregateGeneSet(geneResults, geneSets)

##calc VIF for gene sets
set.results = calcVIF(eset, set.results)

##Look at results with and without VIF
par(mfrow=c(1,2))
plotDensityCurves(set.results, addVIF=FALSE, col=1:2, main="No VIF")
plotDensityCurves(set.results, addVIF=TRUE, col=1:2, main="With VIF")
legend("topleft",legend=names(geneSets),col=1:2, lty=1)