SIMoNe algorithm for network inference

Description

The simone function offers an interface to infer networks based on partial correlation coefficients, covering several contexts and methods (steady-state data, time-course data, multiple sample setup, clustering prior).

Usage

simone(X,
       type       = "steady-state",
       clustering = FALSE,
       tasks      = factor(rep(1, nrow(X))),
       control    = setOptions())

Arguments

X

an n x p matrix of data, typically n observations of the expression levels of the same p genes. Can also be a data.frame with n rows, each column corresponding to a variable (a gene). Setting the colnames of X is convenient when analysing the results, since they are used to annotate the plots. Note that this is the only required argument.

type

a character string indicating the data specification (either "steady-state" or "time-course" data). Default is "steady-state".

clustering

a logical indicating whether the network inference should be performed by penalizing the edges according to a latent clustering discovered during the network structure recovery. Default is FALSE.

tasks

a factor with n entries indicating the task to which each observation belongs in the multiple sample framework. Default is factor(rep(1, nrow(X))), that is, all observations come from a single homogeneous sample.

control

A list that is used to specify low-level options for the algorithm, defined through the setOptions function.
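
As a small illustration of the shapes these arguments expect, the base-R sketch below builds a toy data matrix and a matching tasks factor. The names X_toy and tasks_toy, the gene names, and the two condition labels are made up for illustration; the commented-out call at the end shows how they would be passed to simone.

```r
# Toy data: n = 20 observations (rows) of p = 5 variables (columns).
set.seed(1)
n <- 20
p <- 5
X_toy <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Column names are used to annotate the plots.
colnames(X_toy) <- paste0("gene", seq_len(p))

# One task label per observation: 12 rows from a hypothetical
# condition "A" followed by 8 rows from a hypothetical condition "B".
tasks_toy <- factor(rep(c("A", "B"), times = c(12, 8)))

# The factor must have exactly one entry per row of the data.
stopifnot(length(tasks_toy) == nrow(X_toy))

# With the simone package loaded, the multiple sample call would be:
# res <- simone(X_toy, tasks = tasks_toy)
```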

Details

Every available inference method ("neighborhood selection", "graphical-Lasso", "VAR(1) inference" and "multitask learning" - see simone-package) relies on an optimization problem of the general form

Θhat(λ) = argmax_Θ L(Θ; data) - λ * penl1(Θ, Z),
where L is the log-likelihood of the model (pseudo log-likelihood for "neighborhood selection") and λ is a penalty parameter which controls the sparsity level of the network. The p x p matrix Θ describes the parameters (basically, the edges) of the model, while Z represents a latent clustering which is also estimated when the argument clustering is set to TRUE.

The model and the penalty function penl1 differ according to the context (steady-state/time-course data, multitask learning and its associated coupling effect). For further details on the models, please check the papers listed in the reference section of simone-package.
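
To make the role of the penalty more concrete, here is a minimal base-R sketch of one plausible form: a weighted sum of the absolute off-diagonal entries of Θ. This is an assumption made for illustration, not the exact penalty of any of the models above; the function weighted_l1, the toy matrix, the clustering Z and the factor 2 applied to between-cluster edges are all hypothetical.

```r
# Toy symmetric parameter matrix (p = 3); nonzero off-diagonal
# entries play the role of edges.
Theta <- matrix(c(1.0,  0.5,  0.0,
                  0.5,  1.0, -0.2,
                  0.0, -0.2,  1.0), nrow = 3, byrow = TRUE)

# Assumed penalty: weighted sum of absolute off-diagonal entries.
weighted_l1 <- function(Theta, W) {
  off <- row(Theta) != col(Theta)
  sum(W[off] * abs(Theta[off]))
}

# No clustering: all weights equal to 1, i.e. a plain l1 penalty.
W_flat <- matrix(1, 3, 3)
pen_flat <- weighted_l1(Theta, W_flat)

# Hypothetical clustering Z: variables 1 and 2 share a cluster,
# variable 3 is alone. Between-cluster edges get an (arbitrary)
# weight of 2, within-cluster edges a weight of 1.
Z <- c(1, 1, 2)
W_clust <- ifelse(outer(Z, Z, "=="), 1, 2)
pen_clust <- weighted_l1(Theta, W_clust)
```

In this toy setting the between-cluster edge (2, 3) costs twice as much as it does under flat weights, so a fit under the clustered weights would tend to favour within-cluster edges.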

The criterion displayed during a SIMoNe run is the value of the penalized likelihood at the current value of the estimator Θhat(λ), for a given value of the overall penalty level λ.

The following information criteria are also computed for every value of λ and are part of the output of simone. The BIC (Bayesian Information Criterion)

BIC(λ) = L(Θhat(λ); data) - df(Θhat(λ)) log(n)/2,

and the AIC (Akaike Information Criterion)

AIC(λ) = L(Θhat(λ); data) - df(Θhat(λ)).
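
The two formulas can be evaluated by hand on a toy path, as in the base-R sketch below. The log-likelihood values and degrees of freedom are made-up numbers, not output of simone. Since both criteria are written here as a log-likelihood minus a complexity term, larger values are preferable.

```r
# Made-up values along a path of four penalty levels.
n      <- 100                       # number of observations
loglik <- c(-520, -480, -465, -460) # L(Thetahat(lambda); data)
df     <- c(0, 5, 12, 30)           # df(Thetahat(lambda))

# Criteria exactly as defined above.
bic <- loglik - df * log(n) / 2
aic <- loglik - df

# Index of the preferred model under each criterion.
best_bic <- which.max(bic)
best_aic <- which.max(aic)
```

On these numbers the BIC prefers a sparser model than the AIC, because its per-parameter cost log(n)/2 exceeds 1 as soon as n is larger than about 7.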

Value

Returns an object of class simone, which is list-like and contains the following:

networks

a list with all the inferred networks stored as adjacency matrices (the successive values of Θ controlled by the penalty level λ). In the multiple sample setup, each element of the list is itself a list with as many entries as samples, that is, as levels in tasks.

penalties

a vector of the same length as networks, containing the successive values of the penalty level.

n.edges

a vector of the same length as networks, containing the successive numbers of edges in the inferred networks. In the multiple sample setup, n.edges is a matrix with as many columns as levels in tasks.

BIC

a vector of the same length as networks, containing the value of the BIC for the successively estimated networks.

AIC

a vector of the same length as networks, containing the value of the AIC for the successively estimated networks.

clusters

a size-p factor indicating the class of each variable.

weights

a p x p matrix of weights used to adapt the penalty to each entry of the Θ matrix. It is inferred by the algorithm according to the latent clustering of the network. When clustering is set to FALSE, all the weights are equal to 1, which means no adaptive penalization.

control

a list describing the values of the parameters actually used by the algorithm, to be compared with those set through the setOptions function. Many of the options depend on the nature of the data and can be corrected automatically during internal checks of the consistency between the requested options and the characteristics of the data.
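
The sketch below shows one way to navigate these components in the single-sample setup. To stay runnable without the package, it uses a hand-built list res_toy that only mimics the layout described above; its matrices and criterion values are made up, and an actual simone object would be indexed the same way with `$`.

```r
# Hand-built stand-in for a simone result with p = 3 variables.
empty     <- matrix(0, 3, 3)
one_edge  <- empty
one_edge[1, 2] <- one_edge[2, 1] <- 1
two_edges <- one_edge
two_edges[2, 3] <- two_edges[3, 2] <- 1

res_toy <- list(
  networks  = list(empty, one_edge, two_edges),
  penalties = c(0.9, 0.5, 0.1),
  n.edges   = c(0, 1, 2),
  BIC       = c(-310, -295, -301),
  AIC       = c(-308, -292, -290)
)

# All components share the same length, so one index selects a
# network together with its penalty level and criteria.
k          <- which.max(res_toy$BIC)
net_bic    <- res_toy$networks[[k]]
lambda_bic <- res_toy$penalties[k]

# Alternatively, pick the network with a given number of edges.
net_2 <- res_toy$networks[[which(res_toy$n.edges == 2)]]
```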

Note

If nothing particular is specified about the penalty through the control list (see setOptions), the default is to start from a value of λ that ensures an empty network. Then λ is progressively decreased, as close to zero as possible. As λ decreases, only networks with different numbers of edges are kept in the final output.
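
The logic of this default path can be imitated with a deliberately simplified stand-in, sketched below in base R. Instead of solving the penalized problem, the sketch simply thresholds the empirical correlations at λ; this is an illustration of the path construction only, not the algorithm used by simone, and the names edges_at, lambda_max and path are hypothetical.

```r
set.seed(42)
X   <- matrix(rnorm(50 * 4), nrow = 50)  # toy data, n = 50, p = 4
S   <- cor(X)
off <- upper.tri(S)                      # each pair counted once

# Stand-in for the inference step: an edge is kept when the
# absolute correlation strictly exceeds lambda.
edges_at <- function(lambda) sum(abs(S[off]) > lambda)

# Largest absolute correlation: at this level the network is empty.
lambda_max <- max(abs(S[off]))

# Decrease lambda progressively towards zero.
lambdas <- lambda_max * seq(1, 0.01, length.out = 50)
n_edges <- vapply(lambdas, edges_at, integer(1))

# Keep only the first penalty level reaching each number of edges.
keep <- !duplicated(n_edges)
path <- data.frame(lambda = lambdas[keep], n.edges = n_edges[keep])
```

The resulting path starts with an empty network and contains each number of edges at most once, which mirrors the way the final output is filtered.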

Author(s)

J. Chiquet

See Also

setOptions, plot.simone, cancer and demo(package="simone").

Examples

## load the breast cancer data set
data(cancer)
attach(cancer)

## launch simone with the default parameters and plot results
plot(simone(expr))

## try with clustering now (clustering is achieved on a 30-edge network)
plot(simone(expr, clustering=TRUE, control=setOptions(clusters.crit=30)))

## Not run: 
## try the multiple sample setup
plot(simone(expr, tasks=status))

## End(Not run)

detach(cancer)