View source: R/random_search_cv.R
random_search_cv | R Documentation |
random_search_cv conducts a cross-validated randomized search of the parameters for a given PCP function, given a data matrix mat and parameter settings to search through. See the Methods section below for more details.
random_search_cv(
  mat,
  pcp_func,
  grid_df,
  n_evals,
  cores = NULL,
  perc_b = 0.2,
  runs = 100,
  seed = NULL,
  progress_bar = TRUE,
  file = NULL,
  ...
)
mat |
The data matrix to conduct the grid search on. |
pcp_func |
The PCP function to use when grid searching. Note: the PCP function passed must be able to handle missing |
grid_df |
A dataframe with dimension N x P containing the N-many settings of P-many parameters to try. The columns of grid_df should be named exactly as they are in the function
header of |
n_evals |
The number of parameter settings in |
cores |
The number of cores to use when parallelizing the grid search. If |
perc_b |
The percent of entries of the matrix |
runs |
The number of times to test a given parameter setting. By default, |
seed |
The seed used when randomly selecting parameter settings to evaluate. By default, |
progress_bar |
An optional logical indicating if you would like a progress bar displayed or not. By default, |
file |
An optional character containing the file path used to save the output in. Should end in " |
... |
Any parameters required by |
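The randomized part of the search can be pictured as drawing n_evals rows from grid_df. The snippet below is only a sketch of that idea in base R, not the package's internal code: it assumes settings are drawn without replacement, and the names picked_rows and sampled_grid are made up for illustration.

```r
# Hypothetical illustration: draw n_evals parameter settings at random
# from a grid of candidate settings (not the package's actual code).
n <- 50
p <- 10
param_grid <- expand.grid(
  lambda = c(1, 1.25, 1.5) / sqrt(n),
  mu = c(sqrt(p / 2), sqrt(p / 1.5), sqrt(p / 1.25))
)

n_evals <- 4
set.seed(1)  # fixing the seed makes the selection reproducible

# Assumption: settings are sampled without replacement.
picked_rows <- sample(nrow(param_grid), size = n_evals)
sampled_grid <- param_grid[picked_rows, , drop = FALSE]
print(sampled_grid)
```

Only the sampled rows would then be cross-validated, which is what makes a randomized search cheaper than evaluating all N settings of the grid.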
A list containing the following:
raw
A data.frame containing the raw statistics of each run comprising the grid search. These statistics include the parameter settings for the run, the random seed used for the corruption step outlined in step 1 of the Methods section below, the relative error for the run, the rank of the recovered L matrix, the sparsity of the recovered S matrix, and the number of iterations PCP took to reach convergence (20,000 = Did not converge as of PCPhelpers v. 0.3.1).
formatted
A data.frame containing the summary of the grid search. Made to easily pass on to print_gs.
constants
A list containing those arguments initially passed as constant values when calling random_search_cv.
Each hyperparameter setting is cross-validated by:
1. Randomly corrupting perc_b percent of the entries in mat as missing (i.e. NA values), yielding corrupted_mat. Done via the corrupt_mat_randomly function.
2. Running a PCP function (pcp_func) on corrupted_mat, giving L_hat and S_hat.
3. Recording the relative recovery errors of L_hat + S_hat compared with the raw input data matrix for only those values that were set to missing during the corruption step.
4. Repeating steps 1-3 for a total of runs many times.
5. Reporting the mean of the runs-many runs for each parameter setting.
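The steps above can be sketched in base R for a single parameter setting. This is a simplified illustration under stated assumptions, not the package's implementation: toy_pcp is a made-up stand-in for a real PCP function (it only fills missing entries with column means), cv_relative_error is a hypothetical helper, the corruption is done with sample() rather than corrupt_mat_randomly, and the relative error is assumed to be a ratio of Euclidean norms over the hidden entries.

```r
# Hypothetical stand-in for a PCP function: fills NAs with column means and
# returns that as L, with an all-zero S. A real PCP function would instead
# estimate a low-rank L and a sparse S.
toy_pcp <- function(D) {
  L <- D
  na_mask <- is.na(D)
  col_means <- colMeans(D, na.rm = TRUE)
  L[na_mask] <- col_means[col(D)[na_mask]]
  list(L = L, S = matrix(0, nrow(D), ncol(D)))
}

# Hypothetical helper: cross-validate one setting following steps 1-5.
cv_relative_error <- function(mat, pcp_func, perc_b = 0.2, runs = 5, seed = 1) {
  set.seed(seed)
  errs <- numeric(runs)
  for (r in seq_len(runs)) {
    # Step 1: hide a fraction perc_b of the entries by setting them to NA.
    hidden <- sample(length(mat), size = floor(perc_b * length(mat)))
    corrupted_mat <- mat
    corrupted_mat[hidden] <- NA

    # Step 2: run the PCP function on the corrupted matrix.
    fit <- pcp_func(corrupted_mat)
    recovered <- fit$L + fit$S

    # Step 3: relative error on the hidden entries only
    # (assumed form: norm of the residual over norm of the truth).
    errs[r] <- sqrt(sum((mat[hidden] - recovered[hidden])^2)) /
      sqrt(sum(mat[hidden]^2))
  }
  # Steps 4-5: the loop repeats the procedure; report the mean over runs.
  mean(errs)
}

# Simulated rank-3 data matrix, 50 x 10.
set.seed(42)
mat <- matrix(rnorm(50 * 3), 50, 3) %*% matrix(rnorm(3 * 10), 3, 10)
err <- cv_relative_error(mat, toy_pcp, perc_b = 0.2, runs = 5, seed = 1)
print(err)
```

Because the error is computed only on entries that were hidden, it measures how well the method recovers values it never saw, which is what lets different parameter settings be compared fairly.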
grid_search_cv, bayes_search_cv, and print_gs
library(pcpr) # since we will be passing random_search_cv a PCP function

# simulate a data matrix:
n <- 50
p <- 10
data <- sim_data(sim_seed = 1, nrow = n, ncol = p, rank = 3, sigma = 0, add_sparse = FALSE)
mat <- data$M

# pick parameter settings of lambda and mu to try:
lambdas <- c(1/sqrt(n), 1.25/sqrt(n), 1.5/sqrt(n))
mus <- c(sqrt(p/2), sqrt(p/1.5), sqrt(p/1.25))
param_grid <- expand.grid(lambda = lambdas, mu = mus)

# run the randomized search, evaluating 4 of the 9 settings in param_grid:
param_grid.out <- random_search_cv(mat, pcp_func = root_pcp_na, grid_df = param_grid,
                                   n_evals = 4, cores = 4, perc_b = 0.2, runs = 20,
                                   seed = 1, progress_bar = TRUE, file = NULL)

# visualize the output:
print_gs(param_grid.out$formatted)