FUNNEL.GSEA: FUNNEL-GSEA: FUNctioNal ELastic-net Regression in time-course...

Description Usage Arguments Details Value Author(s) References Examples

Description

This is the main function of this package. It takes gene expression matrix (X), time points (tt), a pre-defined list of gene sets (genesets), and several tuning parameters (see below) as input. It then apply functional data analysis techniques and elastic-net regression to define weights for genes that are shared by more than one gene sets (the overlapping genes). Finally, it applies an extended Mann-Whitney U test that incorporates both intergene correlation and weights computed from the elastic-net regression to pre-defined gene sets to obtain significance of non-constant temporal trend for each gene set.

Usage

1
2
3
FUNNEL.GSEA(X, tt, genesets, lambda=10^-3.5, rr=rep(1,length(tt)),
            selection_k="FVE", FVE_threshold=0.9, nharm=3, centerfns=FALSE,
            equiv.threshold=0.01, lam1=0.4, lam2=0.01, alpha.level=0.05, sort=TRUE)

Arguments

X

A gene expression matrix. Genes must be ordered as rows and each column represent a time point. Row names must be gene names used in genesets. Columns must be ordered by tt.

tt

A vector of ordered unique time points.

genesets

A list of pre-defined gene sets. Each element of this list is a vector of gene names that are rownames of the X matrix. Each element should have gene set name as element name. Note that due to data pre-processsing, many genes in genesets may not be present in rownames(X). Internally, we define a new genesets object that is a collection of the intersection of each pre-defined pathway and rownames(X) for the actual computation.

lambda

Roughness penalty (smoothing) parameter used by function smooth.basis in package fda. Empirical evidences show that the default value works well for a large collection of time-course gene expression data, as long as X is z-transformed and tt is normalized to [0,1]. In this function, we carry out these standardizations automatically.

rr

Number of repetitions at each unique time points.

selection_k

Method of choosing the number of principal components. Choices are: "FVE" (fraction of variance explained), which use scree plot approach to select number of principal components. See FVE_threshold below; a positive integer K: user-specified number of principal components. Default value: "FVE".

FVE_threshold

A positive number between 0 and 1. It is used with the option selection_k = "FVE" to select the number of principal components that explain at least FVE_threshold of total variation. Default value: 0.9.

nharm

Number of eigenfunctions to be used to represent the overall temporal trend of each gene set. Default to 3.

centerfns

An indicator if to center functions in pca.fd(). Default: FALSE.

equiv.threshold

This is a threshold to be used in function equiv.regression. Any eigenvalue that explains less than this proportion of total variance will be discarded. Default to 0.01 (one percent of total variance).

lam1

The L1 (LASSO) penalty used in elastic-net regression.

lam2

The L2 (ridge) penalty used in elastic-ent regression.

alpha.level

Significance level for gene set test (extended Mann-Whitney U test). Default to 0.05.

sort

An indicator if to sort the p-values from the smallest to the largest. Default: TRUE.

Details

Technical details of this procedure is documented in Yun Zhang, Juilee Thakar, Xing Qiu (2016).

Value

A list with the following objects

pvals

A vector of unadjusted p-values for pre-defined gene sets.

weight.list

A list of weights (a.k.a. empirical gene set membership) computed from the elastic-net regression.

correlation

Mean intergene correlation coefficient estimated from genes within each gene set.

sig.genesets

A vector of names (or indices) of significant gene sets at alpha.level (default: 0.05).

Fstats

A vector of F-statistics, which are gene-level summary statistics used in extended Mann-Whitney U test.

Author(s)

Yun Zhang, Juilee Thakar, Xing Qiu

References

Yun Zhang, Juilee Thakar, Xing Qiu (2016) FUNNEL-GSEA: FUNctioNal ELastic-net Regression in Time-course Gene Set Enrichment Analysis, submitted to Bioinformatics.

Examples

1
2
3
4
5
## Load the sample data
data("H3N2-Subj1")

## It takes about 10 minutes to run on my Laptop; YMMV.
## Not run: t1 <- system.time(results1 <- FUNNEL.GSEA(X, tt, genesets=genesets))

Thakar-Lab/FUNNEL-GSEA-R-Package documentation built on May 8, 2019, 9:57 p.m.