GADAG2_CV: Cross-validation for GADAG2

View source: R/GADAG2_CV.R

Description

Function to run k-fold cross-validation for GADAG2 to optimally tune the parameter of penalization.

Usage

GADAG2_CV(X, Lambdas = NULL, n.folds = 10, threshold = 0.1,
  GADAG2.control = list(n.gen = 250, tol.Shannon = 1e-06, max.eval =
  1e+07, pop.size = 5 * ncol(X), p.xo = 0.25, p.mut = 0.05),
  grad.control = list(tol.obj.inner = 1e-06, max.ite.inner = 50),
  ncores = 1, plot.CV = 0)

Arguments

X

Design matrix, with samples (n) in rows and variables (p) in columns.

Lambdas

Optional user-supplied lambda sequence. Default is NULL, in which case GADAG2 chooses its own sequence.

n.folds

Number of folds for cross-validation (10 by default). Can be as large as the sample size (leave-one-out cross-validation), but this is not recommended for large data sets.

threshold

Thresholding value for the estimated edges.
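
As a rough illustration of what such a threshold typically does, the sketch below treats entries of an estimated weight matrix as absent edges when their absolute value does not exceed the threshold. The matrix B.hat and the exact rule are assumptions made for this example; the package source remains the reference for the actual behaviour.

```r
# Hypothetical estimated edge weights for a 3-node graph (illustrative values)
B.hat <- matrix(c(0, 0.80,  0.05,
                  0, 0,    -0.30,
                  0, 0,     0),
                nrow = 3, byrow = TRUE)
threshold <- 0.1

# Keep an edge only if its absolute weight exceeds the threshold
G.hat <- (abs(B.hat) > threshold) * 1
```

With these values, the weak 0.05 entry is dropped while the 0.80 and -0.30 entries are kept as edges.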

GADAG2.control

A list containing parameters for controlling GADAG2 (termination conditions and inherent parameters of the Genetic Algorithm). Some parameters (n.gen, max.eval and pop.size) are particularly critical for reducing the computational time.

  • n.gen maximal number of population generations (>0),

  • pop.size initial population size for the genetic algorithm (>0),

  • max.eval overall maximal number of calls of the evaluation function (>0, should be of the order of n.gen*pop.size),

  • tol.Shannon threshold for the Shannon entropy (>0),

  • p.xo crossover probability of the genetic algorithm (between 0 and 1),

  • p.mut mutation probability of the genetic algorithm (between 0 and 1).
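
To show how these settings relate to one another, the sketch below builds a GADAG2.control list for a shortened run and sets max.eval to n.gen*pop.size, in line with the order of magnitude suggested above. The matrix X is simulated stand-in data, and the reduced values of n.gen and pop.size are arbitrary assumptions, not recommended settings.

```r
# Stand-in design matrix (assumption: 100 samples, 10 variables)
set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

n.gen    <- 50            # fewer generations than the default of 250
pop.size <- 2 * ncol(X)   # smaller population than the default 5 * ncol(X)

GADAG2.control <- list(
  n.gen       = n.gen,
  tol.Shannon = 1e-06,
  max.eval    = n.gen * pop.size,  # of the order of n.gen * pop.size
  pop.size    = pop.size,
  p.xo        = 0.25,
  p.mut       = 0.05
)

# The list would then be passed on, for instance:
# GADAG2_CV(X = X, GADAG2.control = GADAG2.control)
```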

grad.control

A list containing the parameters for controlling the inner optimization, i.e. the gradient descent.

  • tol.obj.inner tolerance (>0),

  • max.ite.inner maximum number of iterations (>0).

ncores

Number of cores (>0, depending on your computer).

plot.CV

If 1, plots the averaged cross-validation error over the sequence of lambdas (0 by default, no plot).

Details

For each of the tested lambdas, the function runs GADAG2 n.folds times, once per omitted fold, and computes the best solution associated with that fold. The error is accumulated and then averaged over the folds. The best lambda, lambda.min, is the one that minimizes this averaged error.
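
The procedure can be sketched generically as follows. This is not the package's implementation: the GADAG2 run is replaced by a simple ridge regression as a stand-in model, and the names y, fold_error and error.CV, as well as the lambda grid, are assumptions introduced only to show the fold loop, the averaging of the error, and the selection of the minimizing lambda.

```r
set.seed(42)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * X[, 2] + rnorm(n)   # stand-in response (assumption)

Lambdas <- c(0.01, 0.1, 1, 10, 100)   # illustrative lambda grid
n.folds <- 10

# Randomly assign each sample to one of n.folds folds of equal size
folds <- sample(rep(seq_len(n.folds), length.out = n))

# Stand-in for a GADAG2 run: ridge regression fitted on the training folds,
# scored by mean squared error on the omitted fold
fold_error <- function(train, test, lambda) {
  Xtr  <- X[train, , drop = FALSE]
  beta <- solve(crossprod(Xtr) + lambda * diag(p), crossprod(Xtr, y[train]))
  mean((y[test] - X[test, , drop = FALSE] %*% beta)^2)
}

# For each lambda, compute the error on every omitted fold, then average
error.CV <- sapply(Lambdas, function(lambda) {
  errs <- sapply(seq_len(n.folds), function(k) {
    fold_error(train = which(folds != k), test = which(folds == k), lambda)
  })
  mean(errs)
})

# Lambda that minimizes the averaged cross-validation error
lambda.min <- Lambdas[which.min(error.CV)]
```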

Value

A list with the following elements:

Author(s)

Magali Champion, Victor Picheny and Matthieu Vignes

References

M. Champion, V. Picheny, M. Vignes, Inferring large graphs using l-1 penalized likelihood, Statistics and Computing (2017).

See Also

GADAG2, GADAG2_Run, GADAG2_Analyze.

Examples

 #############################################################
 # Loading toy data
 #############################################################
 data(toy_data)
 # toy_data is a list of two matrices corresponding to a "star"
 # DAG (node 1 activates all other nodes):
 # - toy_data$X is a 100x10 design matrix
 # - toy_data$G is the 10x10 adjacency matrix (ground truth)

 #############################################################
 # Tuning the parameter of penalization
 #############################################################
 # Simple run, without specifying GADAG2 parameters
 ## Not run: 
 GADAG2_CV_results <- GADAG2_CV(X=toy_data$X)
 print(GADAG2_CV_results$lambda.1se) # best lambda
 
## End(Not run)
 # If desired, additional plot for the averaged cross validation
 # error
 ## Not run: 
 GADAG2_CV_results <- GADAG2_CV(X=toy_data$X,plot.CV=1)
 
## End(Not run)
 # Given the best lambda, re-run GADAG2
 ## Not run: 
 GADAG2_results <- GADAG2_Run(X=toy_data$X, lambda=GADAG2_CV_results$lambda.1se)
 print(GADAG2_results$G.best) # adjacency matrix of the optimal graph
 
## End(Not run)

magalichampion/GADAG documentation built on May 21, 2019, 11:04 a.m.