TcGSA.LR: Computing the Likelihood Ratios for the Gene Sets under...

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/TcGSA.LR.R

Description

This function computes the Likelihood Ratios for the gene sets under scrutiny, as well as estimations of genes dynamics inside those gene sets through mixed models.

Usage

1
2
3
4
5
6
7
8
TcGSA.LR(expr, gmt, design, subject_name = "Patient_ID",
  time_name = "TimePoint", crossedRandom = FALSE,
  covariates_fixed = "", time_covariates = "", time_func = "linear",
  group_name = "", separateSubjects = FALSE, minGSsize = 10,
  maxGSsize = 500)

## S3 method for class 'TcGSA'
print(x, ...)

Arguments

expr

a matrix or dataframe of gene expression. Its dimension are nxp, with the p samples in column and the n genes in row.

gmt

a gmt object containing the gene sets definition. See GSA.read.gmt and definition on www.broadinstitute.org.

design

a matrix or dataframe containing the experimental variables that used in the model, namely subject_name, time_name, and covariates_fixed and time_covariates if applicable. Its dimension are pxm and its row are is in the same order as the columns of expr.

subject_name

the name of the factor variable from design that contains the information on the repetition units used in the mixed model, such as the patient identifiers for instance. Default is 'Patient_ID'. See Details.

time_name

the name of a numeric variable from design that contains the information on the time replicates (the time points at which gene expression was measured). Default is 'TimePoint'. See Details.

crossedRandom

logical flag indicating whether the random effects of the subjects and of the time points should be modeled as one crossed random effect or as two separated random effects. Default is FALSE. See details.

covariates_fixed

a character vector with the names of numeric or factor variables from the design matrix that should appear as fixed effects in the model. See details. Default is "", which corresponds to no covariates in the model.

time_covariates

a character vector with the names of numeric or factor variables from the design matrix that should appear as fixed effects interaction with the time_name variable in the model. See details. Default is "", which corresponds to no covariates in the model.

time_func

the form of the time trend. Can be either one of "linear", "cubic", "splines" or specified by the user, or the column name of a factor variable from design. If specified by the user, it must be as an expression using only names of variables from the design matrix with only the three following operators: +, *, / . The "splines" form corresponds to the natural cubic B-splines (see also ns). If there are only a few timepoints, a "linear" form should be sufficient. Otherwise, the "cubic" form is more parsimonious than the "splines" form, and should be sufficiently flexible. If the column name of a factor variable from design is supplied, then time is considered as discrete in the analysis. If the user specify a formula using column names from design, both factor and numeric variables can be used.

group_name

in the case of several treatment groups, the name of a factor variable from the design matrix. It indicates to which treatment group each sample belongs to. Default is "", which means that there is only one treatment group. See Details.

separateSubjects

logical flag indicating that the analysis identifies gene sets that discriminates patients rather than gene sets than have a significant trend over time. Default is FALSE. See Details.

minGSsize

the minimum number of genes in a gene set. If there are less genes than this number in one of the gene sets under scrutiny, the Likelihood Ratio of this gene set is not computed (the mixed model are not fitted). Default is 10 genes as the minimum.

maxGSsize

the maximum number of genes in a gene set. If there are more genes than this number in one of the gene sets under scrutiny, the Likelihood Ratio of this gene set is not computed (the mixed model are not fitted). This is to avoid very long computation times. Default is 500 genes as the maximum.

x

an object of class 'TcGSA'.

...

further arguments passed to or from other methods.

Details

This Time-course Gene Set Analysis aims at identifying gene sets that are not stable over time, either homogeneously or heterogeneously (see Hejblum et al, 2012)in terms of their probes. And when the argument separateSubjects is TRUE, instead of identifying gene sets that have a significant trend over time, TcGSA identifies gene sets that have significantly different trends over time depending on Subjects.

Value

TcGSA.LR returns a tcgsa object, which is a list with the 5 following elements:

Author(s)

Boris P. Hejblum

References

Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLoS Computat. Biol. 11(6): e1004310. doi: 10.1371/journal.pcbi.1004310

See Also

summary.TcGSA, plot.TcGSA, and TcGSA.LR.parallel for an implementation using parallel computing

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
## Not run:  
data(data_simu_TcGSA)

tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, 
                          subject_name="Patient_ID", time_name="TimePoint",
                          time_func="linear", crossedRandom=FALSE)
tcgsa_sim_1grp
summary(tcgsa_sim_1grp)

plot(x=tcgsa_sim_1grp, expr=expr_1grp, 
    Subject_ID=design$Patient_ID, TimePoint=design$TimePoint,
    baseline=1, 
    B=100,
    time_unit="H"
    )
       
tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design,
                          subject_name="Patient_ID", time_name="TimePoint",
                          time_func="linear", crossedRandom=FALSE, 
                          group_name="group.var")
tcgsa_sim_2grp


## End(Not run)

borishejblum/TcGSA documentation built on Oct. 2, 2018, 10:14 a.m.