vanilla_search: Conducts a cross-validated grid search of the parameters for...
In Columbia-PRIME/PCPhelpers: Provides a bunch of functions to help with PCP experimentation.

vanilla_search

R Documentation

Conducts a cross-validated grid search of the parameters for Principal Component Pursuit (PCP).

Description

vanilla_search conducts a cross-validated grid search of the parameters for a given data matrix mat, PCP function pcp_func, and settings of parameters to search through grid. See the Methods section below for more details.

Usage

vanilla_search(
  mat,
  pcp_func,
  grid,
  scale_func = NULL,
  parallel_approach = "multisession",
  cores = parallel::detectCores(logical = F),
  perc_test = 0.15,
  runs = 1,
  conserve_memory = FALSE,
  verbose = TRUE,
  save_as = NULL,
  ...
)

Arguments

`mat`	The data matrix to conduct the grid search on.
`pcp_func`	The PCP function to use when grid searching. Note: the PCP function passed must be able to handle missing `NA` values. For example: `root_pcp_na`.
`grid`	A dataframe with dimension N x P containing the N-many settings of P-many parameters to try. The columns of `grid` should be named exactly as they are in the function header of `pcp_func`. For example, if `pcp_func = root_pcp_noncvx_na`, then the columns of `grid` should be named "lambda", "mu", and "r". An optional additional column named "rel_err" can be included that contains the mean relative error recorded for that row's parameter setting, with those rows (settings) that have not been tried left as `NA`. In this way, you can run a grid search in which the relative errors of some parameter settings are already known, and only the unexplored parts of the grid still need to be evaluated.
`scale_func`	(Optional) The function used to scale the input `mat` by column. By default, `scale_func = NULL`, and no scaling is to be done at all.
`parallel_approach`	(Optional) The computational approach used when conducting the gridsearch (to be passed on to the `future` package's `plan` function). Must be one of: `"sequential", "multisession", "multicore"`. By default, `parallel_approach = "multisession"`, which does parallelization via sockets (in separate R sessions) and works on any operating system. If `parallel_approach = "sequential"` then the search will be conducted in serial. The option `parallel_approach = "multicore"` is not supported on Windows machines, nor in RStudio (must be run from the command line) but is faster than the `"multisession"` approach since it runs separate forked R processes.
`cores`	(Optional) The number of cores to use when parallelizing the grid search. By default, `cores = parallel::detectCores(logical = F)`, which is the number of physical cores available on the machine.
`perc_test`	(Optional) The fraction of entries of `mat` that will be randomly set to missing (`NA`) values to form the test set. Can be anything in the range `[0, 1)`. By default, `perc_test = 0.15`.
`runs`	(Optional) The number of times to test a given parameter setting. By default, `runs = 1`.
`conserve_memory`	(Optional) A logical indicating if you only care about the actual statistics of the gridsearch and would therefore like to conserve memory when running the gridsearch. If set to `TRUE`, then only statistics on the parameters tested will be returned. By default, `conserve_memory = FALSE`, in which case additional objects saving the outputs of all runs of `pcp_func` will also be returned.
`verbose`	(Optional) A logical indicating if you would like verbose output displayed or not. By default, `verbose = TRUE`.
`save_as`	(Optional) A character string containing the root of the file path used to save the output to. Importantly, this should not end in any file extension, since this string will be used to save both the resulting `[save_as].rds` and `[save_as]_README.txt` files. By default, `save_as = NULL`, in which case the gridsearch is not saved to any file.
`...`	Any parameters required by `pcp_func` that could not be specified in `grid`. Importantly, these parameters are therefore kept constant (not involved in the grid search). The best example is the `LOD` parameter for those PCP functions that require the `LOD` argument.
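
As an illustration of the `grid` argument described above, the following base R sketch builds a grid and attaches the optional "rel_err" column. The parameter values and the single filled-in error are placeholders chosen for the example, not results from any real search.

```r
# Candidate values for two parameters; the column names must match
# the argument names in the header of pcp_func.
lambdas <- c(0.1, 0.2, 0.3)
mus <- c(1, 2)

# One row per setting: 3 x 2 = 6 rows with columns "lambda" and "mu".
param_grid <- expand.grid(lambda = lambdas, mu = mus)

# Optional: mark every setting as untried, then fill in errors already known.
param_grid$rel_err <- NA_real_
known <- param_grid$lambda == 0.1 & param_grid$mu == 1
param_grid$rel_err[known] <- 0.05  # placeholder value, not a real result

# Settings whose relative error is still unknown.
untried <- param_grid[is.na(param_grid$rel_err), ]
```

Here `untried` holds the five settings that would still need to be evaluated.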

Value

A list containing the following:

all_stats: A data.frame containing the statistics of every run comprising the grid search. These statistics include the parameter settings for the run, along with the run number (used as the seed in the corruption step outlined in step 1 of the Methods section), the relative error for the run rel_err, the rank of the recovered L matrix L_rank, the sparsity of the recovered S matrix S_sparsity, the number of iterations PCP took to reach convergence, and the error status run_error of the PCP run (NA if no error, otherwise a character).
summary_stats: A data.frame containing a summary of the information in all_stats. Made to easily pass on to print_gs.
L_mats: A list containing all the L matrices returned from PCP throughout the gridsearch. Therefore, length(L_mats) == nrow(all_stats). Row i in all_stats corresponds to L_mats[[i]]. Only returned when conserve_memory = FALSE.
S_mats: A list containing all the S matrices returned from PCP throughout the gridsearch. Therefore, length(S_mats) == nrow(all_stats). Row i in all_stats corresponds to S_mats[[i]]. Only returned when conserve_memory = FALSE.
test_mats: A list of length runs containing all the corrupted test mats (and their masks) used throughout the gridsearch. Note: all_stats$run[i] corresponds to test_mats[[i]]. Only returned when conserve_memory = FALSE.
original_mat: The original data matrix mat after it was column scaled by scale_func. Only returned when conserve_memory = FALSE.
constant_params: A copy of the constant parameters that were originally passed to the gridsearch (for record keeping).
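
To make the relationship between per-run statistics and a per-setting summary concrete, here is a small base R sketch that averages rel_err over runs. The data frame is a made-up stand-in for all_stats containing only a few of the columns described above; the exact column layout of the real all_stats and summary_stats objects is an assumption here.

```r
# Toy stand-in for all_stats: two settings of lambda, two runs each.
# All numbers are illustrative only.
all_stats <- data.frame(
  lambda  = c(0.1, 0.1, 0.2, 0.2),
  mu      = c(1, 1, 1, 1),
  run     = c(1, 2, 1, 2),
  rel_err = c(0.10, 0.20, 0.30, 0.50)
)

# Mean relative error per parameter setting, averaged over runs.
mean_err <- aggregate(rel_err ~ lambda + mu, data = all_stats, FUN = mean)

# The setting with the lowest mean relative error.
best <- mean_err[which.min(mean_err$rel_err), ]
```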

Methods

Each hyperparameter setting is cross-validated by:

1. Randomly corrupting a fraction perc_test of the entries in mat as missing (i.e. NA values), yielding corrupted_mat. Done via the corrupt_mat_randomly function.
2. Running the PCP function (pcp_func) on corrupted_mat, giving L_hat and S_hat.
3. Recording the relative recovery error of L_hat compared with the input data matrix mat for only those values that were set to missing during the corruption step, i.e. ||P_OmegaComplement(mat - L_hat)||_F / ||P_OmegaComplement(mat)||_F.
4. Repeating steps 1-3 for a total of runs-many times.
5. Reporting the mean relative error over the runs-many runs for each parameter setting.
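
The procedure above can be sketched in base R for a single parameter setting. This is a minimal illustration, not the package's implementation: toy_pcp is a hypothetical stand-in that imputes column means instead of running PCP, cv_rel_err is a hypothetical helper name, the corruption is done inline rather than via corrupt_mat_randomly, and mat is assumed to contain no NA values to begin with.

```r
# Hypothetical stand-in for a PCP function: fills NAs with column means
# and returns a list with L and S, mimicking L_hat and S_hat.
toy_pcp <- function(D, ...) {
  L <- D
  for (j in seq_len(ncol(D))) {
    L[is.na(D[, j]), j] <- mean(D[, j], na.rm = TRUE)
  }
  list(L = L, S = matrix(0, nrow(D), ncol(D)))
}

# Hypothetical helper: cross-validated relative error for one setting.
cv_rel_err <- function(mat, pcp_func, perc_test = 0.15, runs = 1, ...) {
  errs <- vapply(seq_len(runs), function(run) {
    set.seed(run)  # the run number doubles as the seed
    # Step 1: hide a random fraction perc_test of the entries.
    n_test <- floor(perc_test * length(mat))
    test_idx <- sample(length(mat), n_test)
    corrupted_mat <- mat
    corrupted_mat[test_idx] <- NA
    # Step 2: run the (stand-in) PCP function on the corrupted matrix.
    fit <- pcp_func(corrupted_mat, ...)
    # Step 3: Frobenius-norm relative error on the held-out entries only.
    sqrt(sum((mat[test_idx] - fit$L[test_idx])^2)) /
      sqrt(sum(mat[test_idx]^2))
  }, numeric(1))
  # Steps 4-5: average over the repeated runs.
  mean(errs)
}

set.seed(42)
mat <- matrix(rnorm(50 * 10, mean = 5), nrow = 50, ncol = 10)
err <- cv_rel_err(mat, toy_pcp, perc_test = 0.15, runs = 3)
```

Because each run seeds the random number generator with its run number, repeating the call with the same inputs reproduces the same masks and therefore the same mean error.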

Examples


library(pcpr) # since we will be passing vanilla_search a PCP function

# simulate a data matrix:

n <- 50
p <- 10
data <- sim_data(sim_seed = 1, nrow = n, ncol = p, rank = 3, sigma=0, add_sparse = FALSE)
mat <- data$M

# pick parameter settings of lambda and mu to try:

lambdas <- c(1/sqrt(n), 1.25/sqrt(n), 1.5/sqrt(n))
mus <- c(sqrt(p/2), sqrt(p/1.5), sqrt(p/1.25))
param_grid <- expand.grid(lambda = lambdas, mu = mus)

# run the grid search:

# note: argument names follow the Usage section (grid, perc_test)
search_results <- vanilla_search(mat, pcp_func = root_pcp_na, grid = param_grid, cores = 4, perc_test = 0.2, runs = 20, verbose = TRUE, save_as = NULL)

# visualize the output:

print_gs2(search_results$summary_stats)

Columbia-PRIME/PCPhelpers documentation built on April 24, 2022, 7:57 p.m.
