GADAG2_CV: Cross-validation for GADAG2
In magalichampion/GADAG: A Genetic Algorithm for learning Directed Acyclic Graphs



Runs k-fold cross-validation for GADAG2 in order to tune the penalization parameter lambda.

GADAG2_CV(X, Lambdas = NULL, n.folds = 10, threshold = 0.1,
  GADAG2.control = list(n.gen = 250, tol.Shannon = 1e-06, max.eval =
  1e+07, pop.size = 5 * ncol(X), p.xo = 0.25, p.mut = 0.05),
  grad.control = list(tol.obj.inner = 1e-06, max.ite.inner = 50),
  ncores = 1, plot.CV = 0)

`X`	Design matrix, with samples (n) in rows and variables (p) in columns.
`Lambdas`	Optional user-supplied lambda sequence. Default is NULL, in which case GADAG2 chooses its own sequence.
`n.folds`	Number of folds for cross-validation (10 by default). Can be as large as the sample size (leave-one-out cross-validation), but this is not recommended for large data sets.
`threshold`	Thresholding value for the estimated edges.
`GADAG2.control`	A list of parameters controlling GADAG2 (termination conditions and inherent parameters of the genetic algorithm). Some parameters (`n.gen`, `max.eval` and `pop.size`) are particularly critical for reducing the computational time. `n.gen`: maximal number of population generations (>0); `pop.size`: initial population size for the genetic algorithm (>0); `max.eval`: overall maximal number of calls of the evaluation function (>0, should be of the order of `n.gen`*`pop.size`); `tol.Shannon`: threshold for the Shannon entropy (>0); `p.xo`: crossover probability of the genetic algorithm (between 0 and 1); `p.mut`: mutation probability of the genetic algorithm (between 0 and 1).
`grad.control`	A list containing the parameters for controlling the inner optimization, i.e. the gradient descent. `tol.obj.inner` tolerance (>0), `max.ite.inner` maximum number of iterations (>0).
`ncores`	Number of cores (>0, depending on your computer).
`plot.CV`	If 1, plots the averaged cross-validation error against the sequence of lambdas.
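
As an illustration of the advice that `max.eval` should be of the order of `n.gen`*`pop.size`, the sketch below builds the two control lists for a hypothetical problem with 10 variables. The value of `p` and the choice to set `max.eval` exactly to `n.gen * pop.size` are assumptions made for the example; all other values are the defaults shown in the usage.

```r
# Hypothetical problem size; for real data use p <- ncol(X).
p <- 10

n.gen    <- 250      # default number of generations
pop.size <- 5 * p    # default population size: 5 * ncol(X)

GADAG2.control <- list(
  n.gen       = n.gen,
  tol.Shannon = 1e-06,
  max.eval    = n.gen * pop.size,  # assumption: budget matched to n.gen * pop.size
  pop.size    = pop.size,
  p.xo        = 0.25,
  p.mut       = 0.05
)
grad.control <- list(tol.obj.inner = 1e-06, max.ite.inner = 50)

# Sanity checks mirroring the constraints listed in the arguments.
stopifnot(
  GADAG2.control$n.gen > 0,
  GADAG2.control$pop.size > 0,
  GADAG2.control$max.eval > 0,
  GADAG2.control$tol.Shannon > 0,
  GADAG2.control$p.xo >= 0, GADAG2.control$p.xo <= 1,
  GADAG2.control$p.mut >= 0, GADAG2.control$p.mut <= 1,
  grad.control$tol.obj.inner > 0,
  grad.control$max.ite.inner > 0
)
```

Following the usage above, these lists would be supplied as `GADAG2_CV(X, GADAG2.control = GADAG2.control, grad.control = grad.control)`.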

The function runs GADAG2 `n.folds` times for each of the tested lambdas, each time leaving out one fold and computing the best solution on the remaining data. The error on the omitted folds is accumulated and then averaged over the folds. The best lambda, `lambda.min`, is the one that minimizes this averaged error.
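
The procedure described above follows the usual k-fold pattern, which can be sketched in base R. This is not the internal code of GADAG2_CV: the model fitted on each training set is a stand-in ridge regression (the hypothetical helper `fit_ridge`), the data are simulated, and the prediction error is a plain mean squared error. Only the loop structure (folds, lambdas, averaged error, `lambda.min`) is meant to mirror the description.

```r
set.seed(1)

# Simulated data (assumption, for illustration only).
n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% c(2, -1, 0, 0, 0.5) + rnorm(n))

Lambdas <- c(0.01, 0.1, 1, 10, 100)
n.folds <- 10

# Randomly assign each sample to one of n.folds folds.
fold.id <- sample(rep(seq_len(n.folds), length.out = n))

# Stand-in model: closed-form ridge regression (NOT the GADAG2 fit).
fit_ridge <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

# error.folds[k, j]: error on omitted fold k for the j-th lambda.
error.folds <- matrix(NA_real_, n.folds, length(Lambdas))
for (k in seq_len(n.folds)) {
  test <- fold.id == k
  for (j in seq_along(Lambdas)) {
    beta <- fit_ridge(X[!test, , drop = FALSE], y[!test], Lambdas[j])
    pred <- X[test, , drop = FALSE] %*% beta
    error.folds[k, j] <- mean((y[test] - pred)^2)
  }
}

# Average over the folds and pick the minimizing lambda.
error.CV   <- colMeans(error.folds)
lambda.min <- Lambdas[which.min(error.CV)]
```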

A list with the following elements:

lambda.min Value of lambda that minimizes the averaged cross validation error error.CV.
lambda.1se Largest value of lambda such that error.CV is within 1 % of the minimum.
nzero Number of non-zero coefficients at each lambda.
Lambdas The values of lambda used in the fits.
error.CV The averaged cross validation error.
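
To make the relation between `lambda.min`, `lambda.1se` and `error.CV` concrete, the sketch below applies the definitions above to a made-up error curve. The numbers are hypothetical, and "within 1 % of the minimum" is read here as a relative tolerance (`error.CV <= 1.01 * min(error.CV)`); the package may compute it differently.

```r
# Hypothetical lambda grid and averaged cross-validation errors.
Lambdas  <- c(0.01, 0.05, 0.1, 0.5, 1)
error.CV <- c(1.30, 1.005, 1.00, 1.008, 1.20)

# lambda.min: the lambda with the smallest averaged error.
lambda.min <- Lambdas[which.min(error.CV)]

# lambda.1se: largest lambda whose error stays within 1 % of the minimum
# (assumed to be a relative tolerance).
within.tol <- error.CV <= 1.01 * min(error.CV)
lambda.1se <- max(Lambdas[within.tol])
```

Choosing the larger `lambda.1se` rather than `lambda.min` favors a more heavily penalized fit at a small cost in cross-validation error.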

Magali Champion, Victor Picheny and Matthieu Vignes

M. Champion, V. Picheny, M. Vignes, Inferring large graphs using l-1 penalized likelihood, Statistics and Computing (2017).

GADAG2, GADAG2_Run, GADAG2_Analyze.

 #############################################################
 # Loading toy data
 #############################################################
 data(toy_data)
 # toy_data is a list of two matrices corresponding to a "star"
 # DAG (node 1 activates all other nodes):
 # - toy_data$X is a 100x10 design matrix
 # - toy_data$G is the 10x10 adjacency matrix (ground truth)

 #############################################################
 # Tuning the parameter of penalization
 #############################################################
 # Simple run, without specifying GADAG2 parameters
 ## Not run: 
 GADAG2_CV_results <- GADAG2_CV(X=toy_data$X)
 print(GADAG2_CV_results$lambda.1se) # best lambda
 
## End(Not run)
 # If desired, additional plot for the averaged cross validation
 # error
 ## Not run: 
 GADAG2_CV_results <- GADAG2_CV(X=toy_data$X,plot.CV=1)
 
## End(Not run)
 # Given the best lambda, re-run GADAG2
 ## Not run: 
 GADAG2_results <- GADAG2_Run(X=toy_data$X, lambda=GADAG2_CV_results$lambda.1se)
 print(GADAG2_results$G.best) # optimal adjacency matrix graph
 
## End(Not run)