# GADAG_Run: Run GADAG In GADAG: A Genetic Algorithm for Learning Directed Acyclic Graphs

## Description

Function to run GADAG, an algorithm that aims at inferring large sparse directed acyclic graphs based on an observation sample X, by minimizing the penalized negative log-likelihood with a convex program embedded in a genetic algorithm.

## Usage

 ```1 2 3 4``` ```GADAG_Run(X, lambda, threshold = 0.1, GADAG.control = list(n.gen = 100, tol.Shannon = 1e-06, max.eval = 10000, pop.size = 10, p.xo = 0.25, p.mut = 0.05), grad.control = list(tol.obj.inner = 1e-06, max.ite.inner = 50), ncores = 1, print.level = 0, return.level = 0) ```

## Arguments

 `X` Design matrix, with samples (n) in rows and variables (p) in columns. `lambda` Parameter of penalization (>0). `threshold` Thresholding value for the estimated edges. `GADAG.control` A list containing parameters for controlling GADAG (termination conditions and inherent parameters of the Genetic Algortihm). Some parameters (n.gen, max.eval and pop.size) are particularly critical for reducing the computational time. `n.gen` maximal number of population generations (>0), `pop.size` initial population size for the genetic algorithm (>0), `max.eval` overall maximal number of calls of the evaluation function (>0, should be of the order of `n.gen`*`pop.size`), `tol.Shannon` threshold for the Shannon entropy (>0), `p.xo` crossover probability of the genetic algorithm (between 0 and 1), `p.mut` mutation probability of the genetic algorithm (between 0 and 1). `grad.control` A list containing the parameters for controlling the inner optimization, i.e. the gradient descent. `tol.obj.inner` tolerance (>0), `max.ite.inner` maximum number of iterations (>0). `ncores` Number of cores (>0, depending on your computer). `print.level` 0 no print, 1 some info on the genetic algorithm behaviour are printed. `return.level` 0 only best solution is returned, 1 evolution of the current best solution and statistics on the population fitness values are also returned.

## Details

This function returns as a primary output `G.best`, the adjacency matrix of the inferred graph. This matrix is computed thanks to its decomposition (`P.best`, `T.best`).

The values of the inputs `n.gen`, `max.eval` and `pop.size` largely influence the algorithm inference capability, but also its computational cost. As a rule-of-thumb, we recommend setting `pop.size` between 1 to 10 times the number of nodes, and `n.gen` between 10 to 100 times `pop.size`. `tol.Shannon` may be decreased in case of premature stop. The other parameters should only be modified with care.

## Value

A list with the following elements:

• `f.best` Best fitness value.

• `P.best` Best node order (vector of length p).

• `T.best` Corresponding best edges values (vector of length p).

• `G.best` Best graph (matrix form).

• `f.best.evol` Evolution of the best fitness value across the iterations (if return.level=1).

• `P.best.evol` Evolution of the best node order across the iterations (if return.level=1).

• `T.best.evol` Evolution of the best edges values across the iterations (if return.level=1).

• `fmin.evol` Evolution of the minimal fitness value of the population across the iterations (if return.level=1).

• `fmean.evol` Evolution of the averaged fitness value of the population across the iterations (if return.level=1).

• `fp10.evol` Evolution of the quantiles of the fitness value across the iterations (if return.level=1).

• `fp90.evol` Evolution of the quantiles of the fitness value across the iterations (if return.level=1).

• `Shannon.evol` Evolution of the Shannon entropy of the population across the iterations (if return.level=1).

## Author(s)

Magali Champion, Victor Picheny and Matthieu Vignes

## See Also

`GADAG`, `GADAG_Run`, `GADAG_Analyze`.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51``` ``` ############################################################# # Loading toy data ############################################################# data(toy_data) # toy_data is a list of two matrices corresponding to a "star" # DAG (node 1 activates all other nodes): # - toy_data\$X is a 100x10 design matrix # - toy_data\$G is the 10x10 adjacency matrix (ground trough) ############################################################# # Running GADAG ############################################################# # Simple run, with only the penalty term specified GADAG_results <- GADAG_Run(X=toy_data\$X, lambda=0.1) print(GADAG_results\$G.best) # optimal adjacency matrix graph # Expensive run with many evaluations if we refine the # termination conditions ## Not run: n.gen <- 1e10 # we allow a very large number of iterations tol.Shannon <- 1e-10 # the entropy of Shannon of the population # has to be very small pop.size <- 5*ncol(toy_data\$G) # this is usually a good # population size max.eval <- n.gen * pop.size # maximal number of nested # evaluation GADAG_results <- GADAG_Run(X=toy_data\$X, lambda=0.1, GADAG.control=list(n.gen=n.gen, tol.Shannon=tol.Shannon, pop.size = pop.size, max.eval=max.eval)) print(GADAG_results\$G.best) # optimal adjacency matrix graph ## End(Not run) # Expensive run if we also increase the population size ## Not run: pop.size <- 10*ncol(toy_data\$G) GADAG_results <- GADAG_Run(X=toy_data\$X, lambda=0.1, GADAG.control=list(pop.size=pop.size)) print(GADAG_results\$G.best) # optimal adjacency matrix graph ## End(Not run) # You can have more information about the evolution of the # algorithm by turning return.level on ## Not run: return.level <- 1 GADAG_results <- GADAG_Run(X=toy_data\$X, lambda=0.1, return.level = return.level) print(GADAG_results\$f.best.evol) # this shows the evolution of the fitness # across the iterations ## End(Not run) ```

GADAG documentation built on May 2, 2019, 3:25 p.m.