mGSZ: Gene set analysis based on Gene Set Z-scoring function and...

Description Usage Arguments Details Value Author(s) References Examples

View source: R/mGSZ.R

Description

Gene set analysis based on Gene Set Z scoring function and asymptotic p-value

Usage

1
mGSZ(expr.data, gene.sets, sample.labels, flip.gene.sets = FALSE, select= 'T-score', is.log = TRUE, gene.perm = FALSE,  min.cl.sz=5, other.methods = FALSE, pre.var = 0, wgt1 = 0.2, wgt2 = 0.5, var.constant = 10, start.val=5, perm.number = 200)

Arguments

expr.data

Gene expression data matrix (rows as genes and columns as samples)

gene.sets

Gene set data (dataframe/table/matrix/list)

sample.labels

Vector of response values (example:1,2)

flip.gene.sets

TRUE if gene set data is list with genes as list names

select

Gene level statistics (example: T-score/FC/P-value)

is.log

TRUE for log fold change as gene level statistics

gene.perm

TRUE for analysis with both gene and sample permutation data as the null distributions

min.cl.sz

Minimum size of gene sets (number of genes in a gene set) to be included in the analysis

other.methods

TRUE for gene set analysis with other methods (see the manuscript for details)

pre.var

Estimate of the variance associated with each observation

wgt1

Weight 1, parameter used to calculate the prior variance obtained with class size var.constant. This penalizes especially small classes and small subsets. Default is 0.2. Values around 0.1 - 0.5 are expected to be reasonable.

wgt2

Weight 2, parameter used to calculate the prior variance obtained with the same class size as that of the analyzed class. This penalizes small subsets from the gene list. Default is 0.5. Values around 0.3 and 0.5 are expected to be reasonable

var.constant

Size of the reference class used with wgt1. Default is 10

start.val

Number of first genes to be left out from the analysis

perm.number

Number of permutations for p-value calculation

Details

A function for Gene set analysis based on Gene Set Z-scoring function and asymptotic p- value. It differs from GSZ (Toronen et al 2009) in that it implements asymptotic p-values instead of empirical p-values. Asymptotic p-values are based on fitting suitable distribution model to the permutation data. Unlike empirical p-values, the resolution of asymptotic p-values are independent of the number of permutations and hence requires consideralbly fewer permutations. In addition to GSZ, this function allows the users to carry out analysis with seven other scoring functions (visit http://ekhidna.biocenter.helsinki.fi/downloads/pashupati/mGSZ.html for a more detailed description) and compare the results.

Value

mGSZ

Dataframe with gene sets (in decreasing order based on the significance) reported by mGSZ method and their sizes, scores, p-values and gene set expression summary

mGSA

Dataframe with gene sets (in decreasing order based on the significance) reported by mGSA method and their sizes, scores, p-values and gene set expression summary

mAllez

Dataframe with gene sets (in decreasing order based on the significance) reported by mAllez method and their sizes, scores, p-values and gene set expression summary

WRS

Dataframe with gene sets (in decreasing order based on the significance) reported by WRS method and their sizes, scores, p-values and gene set expression summary

SUM

Dataframe with gene sets (in decreasing order based on the significance) reported by SUM method and their sizes, scores, p-values and gene set expression summary

SS

Dataframe with gene sets (in decreasing order based on the significance) reported by SS method and their sizes, scores, p-values and gene set expression summary

KS

Dataframe with gene sets (in decreasing order based on the significance) reported by KS method and their sizes, scores, p-values and gene set expression summary

wKS

Dataframe with gene sets (in decreasing order based on the significance) reported by wKS method and their sizes, scores, p-values and gene set expression summary

sample.labels

Vector of response values used

perm.number

Number of permutations used for p-value calculation

expr.data

For internal use

gene.sets

For internal use

flip.gene.sets

For internal use

min.cl.sz

For internal use

other.methods

For internal use

pre.var

For internal use

wgt1

For internal use

wgt2

For internal use

var.constant

For internal use

start.val

For internal use

select

For internal use

is.log

For internal use

gene.perm.log

For internal use

Author(s)

Pashupati Mishra, Petri Toronen

References

Mishra Pashupati, Toronen Petri, Leino Yrjo, Holm Liisa. Gene Set Analysis: Limitations in popular existing methods and proposed improvements (Not yet published) http://ekhidna.biocenter.helsinki.fi/downloads/pashupati/mGSZ.html

Toronen, P., Ojala, P. J., Marttinen, P., and Holm, L. (2009). Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function. BMC Bioinformatics, 10(1), 307.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
gene.names <- paste("g",1:1000, sep = "")

# create random gene expression data matrix

set.seed(100)
expr.data <- matrix(rnorm(1000*50),ncol=50)
rownames(expr.data) <- gene.names
b <- matrix(2*rnorm(2500),ncol=25)
ind <- sample(1:100,replace=FALSE)
expr.data[ind,26:50] <- expr.data[ind,26:50] + b

sample.labels <- rep(1:2,c(25,25))

# create random gene sets

gene.sets <- vector("list", 100)
for(i in 1:length(gene.sets)){
	gene.sets[[i]] <- sample(gene.names, size = 20)
}
names(gene.sets) <- paste("set", as.character(1:100), sep="")

mGSZ.obj <- mGSZ(expr.data, gene.sets, sample.labels, perm.number = 100)
top.mGSZ.sets <- toTable(mGSZ.obj, no.top.sets = 10) 

# scoring function profile data across the ordered gene list for top 5 gene sets

data4plot <- StabPlotData(mGSZ.obj,rank.vector=c(1,2,3,4,5))

# profile plot for the top gene set

plotProfile(data4plot,1)  

# gene sets in a gmt format can be converted to mGSZ readable format as follows:
# gene.sets <- geneSetsList("gene.sets.gmt")

mGSZ documentation built on May 2, 2019, 5:53 p.m.

Related to mGSZ in mGSZ...