View source: R/vanilla_search.R
vanilla_search | R Documentation |
vanilla_search conducts a cross-validated grid search of the parameters for a given data matrix mat, PCP function pcp_func, and settings of parameters to search through grid. See the Methods section below for more details.
vanilla_search(
  mat,
  pcp_func,
  grid,
  scale_func = NULL,
  parallel_approach = "multisession",
  cores = parallel::detectCores(logical = F),
  perc_test = 0.15,
  runs = 1,
  conserve_memory = FALSE,
  verbose = TRUE,
  save_as = NULL,
  ...
)
mat |
The data matrix to conduct the grid search on. |
pcp_func |
The PCP function to use when grid searching. Note: the PCP function passed must be able to handle missing |
grid |
A dataframe with dimension N x P containing the N-many settings of P-many parameters to try. The columns of |
scale_func |
(Optional) The function used to scale the input |
parallel_approach |
(Optional) The computational approach used when conducting the gridsearch (to be passed on to the |
cores |
(Optional) The number of cores to use when parallelizing the grid search. By default, |
perc_test |
(Optional) The fraction of entries of |
runs |
(Optional) The number of times to test a given parameter setting. By default, |
conserve_memory |
(Optional) A logical indicating if you only care about the actual statistics of the gridsearch and would therefore like to conserve memory when running the gridsearch. If set to |
verbose |
(Optional) A logical indicating if you would like verbose output displayed or not. By default, |
save_as |
(Optional) A character containing the root of the file path used to save the output to. Importantly, this should not end in any file extension, since this character will be used to save both the resulting |
... |
Any parameters required by |
A list containing the following:
all_stats
A data.frame containing the statistics of every run comprising the grid search. These statistics include the parameter settings for the run, along with the run number (used as the seed in the corruption step outlined in step 1 of the Methods section), the relative error for the run rel_err, the rank of the recovered L matrix L_rank, the sparsity of the recovered S matrix S_sparsity, the number of iterations PCP took to reach convergence, and the error status run_error of the PCP run (NA if no error, otherwise a character).
summary_stats
A data.frame containing a summary of the information in all_stats. Made to easily pass on to print_gs.
L_mats
A list containing all the L matrices returned from PCP throughout the gridsearch. Therefore, length(L_mats) == nrow(all_stats). Row i in all_stats corresponds to L_mats[[i]]. Only returned when conserve_memory = FALSE.
S_mats
A list containing all the S matrices returned from PCP throughout the gridsearch. Therefore, length(S_mats) == nrow(all_stats). Row i in all_stats corresponds to S_mats[[i]]. Only returned when conserve_memory = FALSE.
test_mats
A list of length runs containing all the corrupted test mats (and their masks) used throughout the gridsearch. Note: all_stats$run[i] corresponds to test_mats[[i]]. Only returned when conserve_memory = FALSE.
original_mat
The original data matrix mat after it was column scaled by scale_func. Only returned when conserve_memory = FALSE.
constant_params
A copy of the constant parameters that were originally passed to the gridsearch (for record keeping).
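As a rough illustration of how the all_stats table described above might be post-processed, the base R sketch below averages rel_err per parameter setting and picks the setting with the lowest mean. It does not call vanilla_search. The data frame is a hand-made mock with invented values, and the parameter columns lambda and mu are assumed only because they appear in the example at the end of this page. Only the column names rel_err, run and run_error come from the description above.

```r
# Mock stand-in for search_results$all_stats. All values are invented for
# illustration; a real table would also hold L_rank, S_sparsity, etc.
all_stats <- data.frame(
  lambda = rep(c(0.1, 0.2), each = 3),   # assumed parameter column
  mu = 1,                                # assumed parameter column
  run = rep(1:3, times = 2),
  rel_err = c(0.30, 0.32, 0.28, 0.20, NA, 0.22),
  run_error = c(NA, NA, NA, NA, "did not converge", NA),
  stringsAsFactors = FALSE
)

# Keep only runs that finished without an error (run_error is NA).
ok <- all_stats[is.na(all_stats$run_error), ]

# Mean relative error for each (lambda, mu) setting.
mean_err <- aggregate(rel_err ~ lambda + mu, data = ok, FUN = mean)

# Setting with the smallest mean relative error.
best <- mean_err[which.min(mean_err$rel_err), ]
print(best)
```

The summary_stats element presumably already holds a comparable summary, so a manual aggregation like this is mainly useful when failed runs need to be filtered differently.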
Each hyperparameter setting is cross-validated by:
1. Randomly corrupting a fraction perc_test of the entries in mat as missing (i.e. NA values), yielding corrupted_mat. Done via the corrupt_mat_randomly function.
2. Running the PCP function (pcp_func) on corrupted_mat, giving L_hat and S_hat.
3. Recording the relative recovery error of L_hat compared with the input data matrix mat for only those values that were masked as missing during the corruption step, i.e. ||P_OmegaComplement(mat - L_hat)||_F / ||P_OmegaComplement(mat)||_F.
4. Repeating steps 1-3 for a total of runs-many times.
5. Reporting the mean of the runs-many runs for each parameter setting.
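The steps above can be sketched in base R for a single parameter setting. This is a minimal sketch under stated assumptions, not the package's implementation: toy_pcp is a hypothetical stand-in that only fills missing entries with column means (it is not a PCP algorithm), and corrupt and one_run are illustrative helpers, not pcpr functions such as corrupt_mat_randomly.

```r
# Hypothetical stand-in for pcp_func: fills NAs with column means and returns
# the result as L, with S set to zero. Not a real PCP algorithm.
toy_pcp <- function(D) {
  L <- D
  col_means <- colMeans(D, na.rm = TRUE)
  idx <- which(is.na(D), arr.ind = TRUE)
  L[idx] <- col_means[idx[, "col"]]
  list(L = L, S = matrix(0, nrow(D), ncol(D)))
}

# Step 1: mask a fraction perc_test of the entries as NA, seeded by the run number.
corrupt <- function(mat, perc_test, seed) {
  set.seed(seed)
  n_mask <- round(perc_test * length(mat))
  mask <- matrix(FALSE, nrow(mat), ncol(mat))
  mask[sample(length(mat), n_mask)] <- TRUE
  corrupted <- mat
  corrupted[mask] <- NA
  list(corrupted_mat = corrupted, mask = mask)
}

# Steps 1-3 for one run: relative Frobenius error on the held-out entries only.
one_run <- function(mat, pcp_func, perc_test, run) {
  test <- corrupt(mat, perc_test, seed = run)
  L_hat <- pcp_func(test$corrupted_mat)$L
  held_out <- test$mask
  sqrt(sum((mat - L_hat)[held_out]^2)) / sqrt(sum(mat[held_out]^2))
}

# Steps 4-5: repeat for `runs` runs and report the mean relative error.
set.seed(1)
mat <- matrix(rnorm(50 * 10, mean = 5), nrow = 50, ncol = 10)
runs <- 5
rel_errs <- vapply(seq_len(runs),
                   function(r) one_run(mat, toy_pcp, perc_test = 0.15, run = r),
                   numeric(1))
mean_rel_err <- mean(rel_errs)
print(mean_rel_err)
```

Because the error is computed only on entries hidden from pcp_func, it measures how well the low-rank estimate recovers unseen values rather than how closely it fits the observed ones.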
Older versions of PCP's gridsearch (not recommended): grid_search_cv, random_search_cv, bayes_search_cv, and print_gs
library(pcpr) # since we will be passing vanilla_search a PCP function

# simulate a data matrix:
n <- 50
p <- 10
data <- sim_data(sim_seed = 1, nrow = n, ncol = p, rank = 3, sigma = 0, add_sparse = FALSE)
mat <- data$M

# pick parameter settings of lambda and mu to try:
lambdas <- c(1/sqrt(n), 1.25/sqrt(n), 1.5/sqrt(n))
mus <- c(sqrt(p/2), sqrt(p/1.5), sqrt(p/1.25))
param_grid <- expand.grid(lambda = lambdas, mu = mus)

# run the grid search (argument names follow the usage above: grid and perc_test):
search_results <- vanilla_search(mat, pcp_func = root_pcp_na, grid = param_grid,
                                 cores = 4, perc_test = 0.2, runs = 20,
                                 verbose = TRUE, save_as = NULL)

# visualize the output:
print_gs2(search_results$summary_stats)