Determines the significance of pre-defined sets of genes with respect to an outcome variable, such as a group indicator, quantitative variable or survival time. This is the basic function called by GSA.

1 2 3 4 5 6 7 8 | ```
GSA.func(x,y, genesets, genenames,geneset.names=NULL,
method=c("maxmean","mean","absmean"), resp.type=c("Quantitative",
"Two class unpaired","Survival","Multiclass", "Two class paired", "tCorr", "taCorr" ),
censoring.status=NULL,
first.time = TRUE, return.gene.ind = TRUE,
ngenes = NULL, gs.mat =NULL, gs.ind = NULL,
catalog = NULL, catalog.unique =NULL,
s0 = NULL, s0.perc = NULL, minsize = 15, maxsize= 500, restand = TRUE, restand.basis=c("catalog","data"))
``` |

`x` |
Data x: p by n matrix of features, one observation per column (missing values allowed) |

`y` |
Vector of response values: 1,2 for two class problem, or 1,2,3 ... for multiclass problem, or real numbers for quantitative or survival problems |

`genesets` |
Gene set collection (a list) |

`genenames` |
Vector of genenames in expression dataset |

`geneset.names` |
Optional vector of gene set names |

`method` |
Method for summarizing a gene set: "maxmean" (default), "mean" or "absmean" |

`resp.type` |
Problem type: "quantitative" for a continuous parameter; "Two class unpaired" ; "Survival" for censored survival outcome; "Multiclass" : more than 2 groups; "Two class paired" for paired outcomes, coded -1,1 (first pair), -2,2 (second pair), etc |

`censoring.status` |
Vector of censoring status values for survival problems, 1 mean death or failure, 0 means censored) |

`first.time` |
internal use |

`return.gene.ind` |
internal use |

`ngenes` |
internal use |

`gs.mat` |
internal use |

`gs.ind` |
internal use |

`catalog` |
internal use |

`catalog.unique` |
internal use |

`s0` |
Exchangeability factor for denominator of test statistic; Default is automatic choice |

`s0.perc` |
Percentile of standard deviation values to use for s0; default is automatic choice; -1 means s0=0 (different from s0.perc=0, meaning s0=zeroeth percentile of standard deviation values= min of sd values |

`minsize` |
Minimum number of genes in genesets to be considered |

`maxsize` |
Maximum number of genes in genesets to be considered |

`restand` |
Should restandardization be done? Default TRUE |

`restand.basis` |
What should be used to do the restandardization? The set of genes in the genesets ("catalog", the default) or the genes in the data set ("data") |

Carries out a Gene set analysis, computing the gene set scores. This function does not do any permutations for estimation of false discovery rates. GSA calls this function to estimate FDRs.

A list with components

`scores` |
Gene set scores for each gene set |

,

`norm.scores` |
Gene set scores transformed by the inverse Gaussian cdf |

,

`mean` |
Means of gene expression values for each sample |

`sd` |
Standard deviation of gene expression values for each sample |

`gene.ind` |
List indicating whch genes in each positive gene set had positive individual scores, and similarly for negative gene sets |

`geneset.names` |
Names of the gene sets |

`nperms` |
Number of permutations used |

`gene.scores` |
Individual gene scores (eg t-statistics for two class problem) |

`s0` |
Computed exchangeability factor |

`s0.perc` |
Computed percentile of standard deviation values |

`stand.info` |
Information computed used in the restandardization process |

`method` |
Method used (from call to GSA.func) |

`call` |
The call to GSA |

Robert Tibshirani

Efron, B. and Tibshirani, R. On testing the significance of sets of genes. Stanford tech report rep 2006. http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | ```
######### two class unpaired comparison
# y must take values 1,2
set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)
u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y<-c(rep(1,10),rep(2,10))
genenames=paste("g",1:1000,sep="")
#create some random gene sets
genesets=vector("list",50)
for(i in 1:50){
genesets[[i]]=paste("g",sample(1:1000,size=30),sep="")
}
geneset.names=paste("set",as.character(1:50),sep="")
GSA.func.obj<-GSA.func(x,y, genenames=genenames, genesets=genesets, resp.type="Two class unpaired")
#to use "real" gene set collection, we read it in from a gmt file:
#
# geneset.obj<- GSA.read.gmt("file.gmt")
#
# where file.gmt is a gene set collection from GSEA collection or
# or the website http://www-stat.stanford.edu/~tibs/GSA, or one
# that you have created yourself. Then
# GSA.func.obj<-GSA.func(x,y, genenames=genenames, genesets=geneset.obj$genesets, resp.type="Two class unpaired")
#
#
``` |

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.

All documentation is copyright its authors; we didn't write any of that.