dcGSA: Perform gene set analysis for longitudinal gene expression...



Description

Perform gene set analysis for longitudinal gene expression profiles.

Usage

dcGSA(data = NULL, geneset = NULL, nperm = 10, c = 0, KeepPerm=FALSE,
  parallel = FALSE, BPparam = MulticoreParam(workers = 4))

Arguments

data

A list with three elements: ID (a character vector of subject IDs), pheno (a data frame in which each column is one clinical outcome), and gene (a data frame in which each column is one gene).

geneset

A list of gene sets of interest (the output of the readGMT function).

nperm

An integer giving the number of permutations used to compute P values.

c

An integer cutoff value for the overlapping number of genes between the data and the gene set.

KeepPerm

A logical value indicating whether the permutation statistics are kept. If there are many gene sets and the number of permutations is large, the matrix of permutation statistics can be large and memory demanding.

parallel

A logical value indicating whether to use parallel computing.

BPparam

Parameters to configure the parallel evaluation environment if parallel is TRUE. The default is to use 4 cores on a single machine. See the BiocParallelParam object in the Bioconductor package BiocParallel for more details.
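
As a rough illustration of the data argument described above, the sketch below builds a small list by hand. The subject labels, outcome names, gene names, and random values are all made up, and it assumes that the rows of pheno and gene line up one-to-one with the entries of ID (one row per subject visit). The bundled dcGSAtest object shows the layout the package actually expects.

```r
set.seed(1)

# Hypothetical design: 3 subjects, each measured at 2 visits (6 rows in total).
ID <- rep(c("S1", "S2", "S3"), each = 2)

# One column per clinical outcome (column names are illustrative only).
pheno <- data.frame(outcome1 = rnorm(6), outcome2 = rnorm(6))

# One column per gene (assumed here to be identified by column name).
gene <- as.data.frame(matrix(rnorm(6 * 4), nrow = 6,
                             dimnames = list(NULL, paste0("gene", 1:4))))

# Assemble the three elements under the names given in the argument description.
toy <- list(ID = ID, pheno = pheno, gene = gene)
str(toy, max.level = 1)
```

Inspecting str(dcGSAtest, max.level = 1) in the same way is a quick check that a hand-built list matches the shipped example.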

Value

If KeepPerm=FALSE, returns a data frame with the following columns; otherwise, returns a list with two objects: "res", the data frame described below, and "stat", the permutation statistics.

Geneset

Names of the gene sets.

TotalSize

The original size of each gene set.

OverlapSize

The overlapping number of genes between the data and the gene set.

Stats

Longitudinal distance covariance between the clinical outcomes and the gene set.

NormScore

Only available when permutation is performed. Longitudinal distance covariance normalized using the mean and standard deviation of the permuted values.

P.perm

Only available when permutation is performed. Permutation P values.

P.approx

P values obtained by using a normal distribution to approximate the null distribution.

FDR.approx

False discovery rate (FDR) based on P.approx.
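
To make the relationship between Stats, NormScore, P.perm, P.approx and FDR.approx more concrete, here is a minimal sketch on simulated numbers. It is not the package's code: the observed and permuted statistics are invented, and the (k + 1) / (nperm + 1) permutation formula and the Benjamini-Hochberg adjustment are assumptions. The one-sided upper-tail normal P value is inferred from the example output on this page, where a NormScore of 2.049 is paired with a P.approx of 0.0202.

```r
set.seed(42)
nperm <- 100

# Invented observed statistics for three hypothetical gene sets.
obs <- c(setA = 1.60, setB = 1.52, setC = 1.45)

# Invented permutation statistics: one row per gene set, one column per permutation.
perm <- matrix(rnorm(length(obs) * nperm, mean = 1.45, sd = 0.08),
               nrow = length(obs), dimnames = list(names(obs), NULL))

# Normalize each observed statistic by the mean and SD of its permuted values.
norm_score <- (obs - rowMeans(perm)) / apply(perm, 1, sd)

# Permutation P value (assumed form): share of permuted values at least as large
# as the observed one, with 1 added to numerator and denominator.
p_perm <- (rowSums(perm >= obs) + 1) / (nperm + 1)

# Normal approximation: upper-tail probability of the normalized score.
p_approx <- pnorm(norm_score, lower.tail = FALSE)

# FDR from the approximate P values (Benjamini-Hochberg assumed here).
fdr_approx <- p.adjust(p_approx, method = "BH")

data.frame(Stats = obs, NormScore = norm_score, P.perm = p_perm,
           P.approx = p_approx, FDR.approx = fdr_approx)
```

With nperm permutations, a permutation P value of this form cannot fall below 1 / (nperm + 1), which is one reason a normal approximation can be useful when nperm is small.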

References

Distance-correlation based Gene Set Analysis in Longitudinal Studies. Jiehuan Sun, Jose Herazo-Maya, Xiu Huang, Naftali Kaminski, and Hongyu Zhao.

Examples

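library(dcGSA)

# Load the example data set shipped with the package.
data(dcGSAtest)
# Read the sample gene sets bundled with the package.
fpath <- system.file("extdata", "sample.gmt.txt", package = "dcGSA")
GS <- readGMT(file = fpath)
# Run the analysis with 100 permutations and time the call.
system.time(res <- dcGSA(data = dcGSAtest, geneset = GS, nperm = 100))
head(res)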

Example output

Loading required package: Matrix
   user  system elapsed 
  0.400   0.008   0.414 
                                        Geneset TotalSize OverlapSize    Stats
1                KEGG_PENTOSE_PHOSPHATE_PATHWAY        27          25 1.625002
2 KEGG_PENTOSE_AND_GLUCURONATE_INTERCONVERSIONS        28          17 1.558547
3                     KEGG_STEROID_BIOSYNTHESIS        17          15 1.539620
4          KEGG_FRUCTOSE_AND_MANNOSE_METABOLISM        34          33 1.506377
5                  KEGG_CITRATE_CYCLE_TCA_CYCLE        32          28 1.515866
6        KEGG_ASCORBATE_AND_ALDARATE_METABOLISM        25          15 1.518333
  NormScore     P.perm   P.approx FDR.approx
1 2.0492925 0.02970297 0.02021676  0.2021676
2 1.0621121 0.16831683 0.14409240  0.4254491
3 1.0591146 0.16831683 0.14477380  0.4254491
4 0.9257386 0.16831683 0.17729092  0.4254491
5 0.7380278 0.22772277 0.23024877  0.4254491
6 0.6579989 0.27722772 0.25526943  0.4254491
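
Because the return value is either a data frame or a list holding "res" and "stat", downstream code may want to handle both shapes. The helper below is a hypothetical convenience function (get_res is not part of dcGSA). It is demonstrated on a small mock data frame whose values are copied from the output above, so it runs without the package installed; the 0.05 threshold is only illustrative.

```r
# Hypothetical helper, not part of dcGSA: return the results table
# whether dcGSA() was called with KeepPerm = FALSE or KeepPerm = TRUE.
get_res <- function(out) {
  if (is.data.frame(out)) out else out$res
}

# Mock results using the first three rows of the example output.
mock <- data.frame(
  Geneset = c("KEGG_PENTOSE_PHOSPHATE_PATHWAY",
              "KEGG_PENTOSE_AND_GLUCURONATE_INTERCONVERSIONS",
              "KEGG_STEROID_BIOSYNTHESIS"),
  P.perm = c(0.02970297, 0.16831683, 0.16831683),
  FDR.approx = c(0.2021676, 0.4254491, 0.4254491)
)

# Mock of the KeepPerm = TRUE shape; the "stat" matrix is a placeholder.
mock_list <- list(res = mock, stat = matrix(0, nrow = 3, ncol = 2))

# Both return shapes yield the same table.
identical(get_res(mock), get_res(mock_list))

# Keep gene sets below the illustrative permutation P value threshold.
tab <- get_res(mock_list)
hits <- tab[tab$P.perm < 0.05, ]
hits
```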

dcGSA documentation built on Nov. 8, 2020, 7:53 p.m.