Description Usage Arguments Details Value See Also Examples
View source: R/cross_validate.R
Performs K-fold cross-validation in order to select the tuning parameters lambda and gamma.
Recall that the penalty for the LRpS estimator is written as λ_1 ||S||_1 + λ_2 Trace(L) in the
objective function of lrpsadmm
.
This can be equivalently rewritten in terms of the regularisation parameters λ and γ as follows
λ γ ||S||_1 + λ (1 - γ) Trace(L),
for γ \in (0, 1).
For a given value of γ, one can perform cross-validation along the regularisation path in order to choose λ. This function computes the regularisation paths for each value of γ supplied as arguments and performs cross-validation. The pair (λ, γ) that produces the smallest cross-validated log-likelihood is returned.
1 2 3 4 5 6 | lrpsadmm.cv(X, gammas = c(0.05, 0.1, 0.15), covariance.estimator = cor,
n.folds = 5, lambdas = NULL, lambda.max = NULL,
lambda.ratio = 1e-04, n.lambdas = 20, max.sparsity = 0.5,
max.rank = NA, abs_tol = 1e-05, rel_tol = 0.001, max.iter = 2000,
mu = 1, verbose = FALSE, seed = NA, zeros = NULL,
backend = "RcppEigen")
|
X |
n x p data matrix |
gammas |
A real or a vector of reals between 0 and 1 (non inclusive). For each value of gamma the regularisation path is computed and cross-validation is performed. See examples for guidance on how to choose reasonable values for gamma. Too high a value might result in the problem being under-identified and therefore numerical instabilities. |
covariance.estimator |
A function that takes a data matrix and outputs an estimate of its correlation matrix.
Default: the sample correlation matrix output by the |
n.folds |
Number of folds for cross-validation. Default 5. |
lambdas |
A decreasing sequence of values of lambda. See Details for the default values. |
lambda.max |
A positive real. Maximum value of lambda. See Details. |
lambda.ratio |
A real between 0 and 1. The smallest value of lambda is given by lambda.max * lambda.ratio. See Details. |
n.lambdas |
A positive integer. The number of values of lambda to generate according a geometric sequence between lambda.max and lambda.max * lambda.ratio. See Details. |
max.sparsity |
A real between 0 and 1. Abort the computation of the path if S becomes denser than this value. |
max.rank |
A real between 0 and 1. Abort the computuation of the path if the rank of L becomes higher than this value. |
abs_tol |
|
rel_tol |
|
max.iter |
|
mu |
|
verbose |
A boolean. Whether to print the value of lambda, gamma, sparsity of S, etc... after each fit |
seed |
Set the seed of the random number generator used for the K folds. |
zeros |
A p x p matrix with entries set to 0 or 1. Whereever its entries are 0, the entries of the estimated S will be forced to 0. |
backend |
The |
Recall that the penalty for the LRpS estimator is written as λ_1 ||S||_1 + λ_2 Trace(L) in the
objective function of lrpsadmm
.
This can be equivalently rewritten in terms of the regularisation parameters λ and γ as follows
λ γ ||S||_1 + λ (1 - γ) Trace(L),
for γ \in (0, 1).
For a given value of γ, one can perform cross-validation along the regularisation path in order to choose λ. This function computes the regularisation paths for each value of γ supplied as arguments and performs cross-validation. One can then select the pair (λ, γ) that produces the smallest cross-validated log-likelihood.
The function lrpsadmm
is fitted for successive values of λ using warm starts.
The sequence of values of λ can be provided directly by the user.
It is automatically sorted in decreasing order.
By default, a decreasing sequence of 20 values within a reasonable range is selected as follows.
We set λ_{max} = \max_{ij, i \neq j} |Σ_{ij}|/γ and
λ_{min} = λ_{max} * lambda.ratio
; then
20 values between λ_{max} and λ_{min} are taken following
a geometric progression.
Because it does not make much sense to fit this estimator when the sparse estimate S becomes too dense
or if the rank of the low-rank estimate L becomes too high, the computation of the path is aborted
when the sparsity of S reaches max.sparsity
or when the rank of L reaches max.rank
.
Recall that cross-validation overselects. It might be good for prediction purposes but methods such statibility selection are probably better if the support of S is what is of interest.
An object of class lrpsadmmcv. It contains the values of the mean cross-validated log-likelihood, its standard deviation for each pair (lambda, gamma) and an object of class lrpsadmm.path for each value of gamma. See the examples for how to access the selected tuning paramters, best fit etc...
lrpsadmm lrpadmm.path
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | set.seed(0)
## 1 - A simple example: Well-powered dataset
sim.data <- generate.latent.ggm.data(n=2000, p=100, h=5, outlier.fraction = 0.0,
sparsity = 0.02, sparsity.latent = 0.7)
ground.truth <- sim.data$precision.matrix[1:100, 1:100]
ground.truth <- 1 * (( ground.truth - diag(diag(ground.truth)) ) !=0)
X <- sim.data$obs.data;
gammas <- c(0.1, 0.15, 0.2)
# Let the function decide the range of value of Lambda
cvpath <- lrpsadmm.cv(X, gammas = gammas,
lambda.ratio = 1e-02, n.lambdas = 30, verbose = TRUE)
best.gamma <- cvpath$best.gamma
best.lambda <- cvpath$best.lambda
best.fit <- cvpath$best.fit$fit # Object of class lrpsadmm
plot(best.fit)
# The value Gamma = 0.15 is selected by X-validation
plot(cvpath) # Plot the outcome. Object of class lrpsadmmcv
# We can look at the path corresponding to this value
best.path <- cvpath$cross.validated.paths[[which(gammas == best.gamma)]]$cross.validated.path
# We know the ground truth, so let's use it to see how well we did:
plot(best.path, ground.truth = ground.truth)
## 2 - Data with outliers: use a robust estimator
set.seed(0)
sim.data <- generate.latent.ggm.data(n=2000, p=50, h=5, outlier.fraction = 0.05,
sparsity = 0.02, sparsity.latent = 0.7)
ground.truth <- sim.data$precision.matrix[1:50, 1:50]
# Remove the elements along the diagonal. Keep a matrix of 0s and 1s
ground.truth <- 1 * (( ground.truth - diag(diag(ground.truth)) ) !=0)
X <- sim.data$obs.data;
# We can afford high values of gamma because n >> p
gammas <- c(0.2, 0.3, 0.4)
# Use the Kendall based estimator:
cvpath <- lrpsadmm.cv(X, gammas = gammas, covariance.estimator = Kendall.correlation.estimator,
lambda.ratio = 1e-03, n.lambdas = 30, verbose = TRUE)
plot(cvpath)
best.gamma <- cvpath$best.gamma
best.lambda <- cvpath$best.lambda
best.path <- cvpath$cross.validated.paths[[which(gammas == best.gamma)]]$cross.validated.path
plot(best.path, ground.truth = ground.truth)
## 3 - A tougher problem n is close to p
set.seed(0)
sim.data <- generate.latent.ggm.data(n=150, p=100, h=5, outlier.fraction = 0.0,
sparsity = 0.02, sparsity.latent = 0.7)
ground.truth <- sim.data$precision.matrix[1:100, 1:100]
# Remove the elements along the diagonal. Keep a matrix of 0s and 1s
ground.truth <- 1 * (( ground.truth - diag(diag(ground.truth)) ) !=0)
X <- sim.data$obs.data;
# Since n < p, do not try too high values of gamma. Stay close to 0.
gammas <- c(0.07, 0.1, 0.12)
cvpath <- lrpsadmm.cv(X, gammas = gammas,
lambda.ratio = 0.1, n.lambdas = 20, verbose = TRUE)
plot(cvpath) # Plot the outcome
# Clearly the range selected by the function is not good enough
# We need better values for lambda. In that case it is better
# to see which value of lambda yields a very sparse graph
# for a given gamma:
gammas <- c(0.1)
# We set the seed so we can compre the x-validated log-likelihoods between
# two runs of the function
cvpath <- lrpsadmm.cv(X, gammas = gammas, lambda.max = 2.2,
lambda.ratio = 0.1, n.lambdas = 20, verbose = TRUE, seed = 0)
plot(cvpath)
gammas <- c(0.12)
cvpath <- lrpsadmm.cv(X, gammas = gammas, lambda.max = 1.7,
lambda.ratio = 0.1, n.lambdas = 20, verbose = TRUE, seed = 0)
plot(cvpath)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.