GIS: calculate gene influential scores of genes in a gene set.


View source: R/GIS.R

Description

Calculate the gene influential score (GIS) of each individual feature, i.e. its contribution to the overall variance of the gene set (GS) score, using a leave-one-out procedure (see Details).

Usage

	GIS(x, geneSet, nf=NA, barcol=NA, topN=NA, plot=TRUE, Fvalue=FALSE, ff=NA, cor=FALSE)

Arguments

x

An object of class mgsa-class.

geneSet

A character string or number indicating the gene set under consideration.

nf

The number of PCs used in the calculation of gene set scores. The default is NA, which means all the PCs available in x are used. This should work for most cases.

barcol

The colors of the bars, used to distinguish features/genes from different data sets. Its length should equal the number of data sets.

topN

A positive integer specifying the number of top influencers to be returned.

plot

A logical indicating whether the result should be plotted.

Fvalue

A logical indicating whether the GIS should be calculated in a supervised manner.

ff

A vector indicating the group of each column, used for calculating the F-ratio when Fvalue=TRUE.

cor

A logical indicating whether to use the correlation between the reconstructed expression and the gene set score (GSS). This is faster than the standard GIS.

Details

The importance of a single feature can be evaluated in either a supervised or an unsupervised manner.

In the unsupervised manner, the value is calculated by:

log2(var(GS_-i)/var(GS))

where GS is the gene set score, GS_-i is the gene set score recalculated without the i'th feature, and var() is the variance.
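
The leave-one-out calculation above can be sketched in base R on toy data. This is only an illustration of the formula: the score used here is a plain column sum over the features of a set, which is an assumption made for the example and not how mogsa derives GS, and toy_gs and toy_gis are hypothetical names.

```r
# Toy illustration only: the gene set score is a plain column sum here,
# which is an assumption and NOT the score computed by mogsa.
set.seed(1)
expr <- matrix(rnorm(5 * 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:8)))

# Hypothetical helper: score of a gene set = per-sample sum of its features
toy_gs <- function(m) colSums(m)

gs_full <- toy_gs(expr)

# Leave-one-out: recompute the score without feature i and compare variances
toy_gis <- sapply(seq_len(nrow(expr)), function(i) {
  gs_minus_i <- toy_gs(expr[-i, , drop = FALSE])
  log2(var(gs_minus_i) / var(gs_full))
})
names(toy_gis) <- rownames(expr)

# One value per feature, ordered from most negative to most positive
sort(toy_gis)
```

With the formula as written, a negative value means that removing the feature reduces the variance of the score, and a positive value means that removing it increases the variance.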

In the supervised manner, the value is calculated from the F-ratio over a class vector:

log2(F(GS_-i)/F(GS))

where F() is the F-ratio. The unsupervised GIS is recommended, since it works better in most cases in practice.
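
The supervised variant can be sketched the same way. The example below assumes that the F-ratio is the one-way ANOVA F statistic of the score over the class vector, and it again uses a column sum as a stand-in for the gene set score. Both are assumptions for illustration, and toy_gs, f_ratio and toy_gis_f are hypothetical names.

```r
# Toy illustration only: column-sum score and one-way ANOVA F statistic
# are assumptions for this sketch, not the mogsa internals.
set.seed(2)
expr <- matrix(rnorm(5 * 12), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:12)))
group <- factor(rep(c("A", "B", "C"), each = 4))

# Shift gene1 in group A so that the score carries some group signal
expr["gene1", group == "A"] <- expr["gene1", group == "A"] + 3

# Hypothetical helpers
toy_gs  <- function(m) colSums(m)
f_ratio <- function(score, grp) anova(lm(score ~ grp))[["F value"]][1]

f_full <- f_ratio(toy_gs(expr), group)

# Leave-one-out: F-ratio of the score without feature g, relative to the full score
toy_gis_f <- sapply(rownames(expr), function(g) {
  keep <- rownames(expr) != g
  log2(f_ratio(toy_gs(expr[keep, , drop = FALSE]), group) / f_full)
})

sort(toy_gis_f)
```

Here the class vector plays the role of the ff argument: it has one entry per column of the data.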

Value

An object of class data.frame containing three columns. The first column is the feature name, the second column is the gene influential score, and the third column indicates the data set from which the feature/gene was selected.

Author(s)

Chen Meng

References

TBA

See Also

see annotate.gs

Examples

	library(mogsa)
	# loading gene expression data and supplementary data
	data(NCI60_4array_supdata)
	data(NCI60_4arrays)
	mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9,
	              proc.row = "center_ssq1", w.data = "inertia", statis = TRUE)
	allgs <- colnames(NCI60_4array_supdata[[1]])

	# unsupervised measurement
	GIS(mgsa, allgs[1], topN = 5)

	# supervised measurement
	tissueType <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1))
	GIS(mgsa, allgs[1], topN = 5, Fvalue = TRUE, ff = tissueType)
	# use more PCs in the calculation
	GIS(mgsa, allgs[1], nf = 20, topN = 5, Fvalue = TRUE, ff = tissueType)

Example output

  feature      GIS     data
1   MYO1C 1.012768   hgu133
2   PALLD 1.012174 hgu133p2
3  RETSAT 1.011508   hgu133
4    ANLN 1.011359   hgu133
5   DERL2 1.011304  agilent
  feature       GIS     data
1  ANAPC4 1.0000000 hgu133p2
2  REPIN1 0.9753993 hgu133p2
3  GTF3C5 0.9162151  agilent
4    IMP3 0.8400354 hgu133p2
5   H2AFY 0.8388857    hgu95
  feature       GIS     data
1  REPIN1 1.0000000 hgu133p2
2  GTF3C5 0.8995660  agilent
3    ACN9 0.8842652   hgu133
4    IMP3 0.8695248 hgu133p2
5  DNAJA3 0.8693984   hgu133

mogsa documentation built on Nov. 8, 2020, 5:41 p.m.