random_search_cv: Conducts a cross-validated randomized search of the...
In Columbia-PRIME/PCPhelpers: Provides a bunch of functions to help with PCP experimentation.

random_search_cv

R Documentation

Conducts a cross-validated randomized search of the parameters for Principal Component Pursuit (PCP).

Description

random_search_cv conducts a cross-validated randomized search of the parameters for a given PCP function, given a data matrix mat and parameter settings to search through. See the Methods section below for more details.

Usage

random_search_cv(
  mat,
  pcp_func,
  grid_df,
  n_evals,
  cores = NULL,
  perc_b = 0.2,
  runs = 100,
  seed = NULL,
  progress_bar = TRUE,
  file = NULL,
  ...
)

Arguments

`mat`	The data matrix to conduct the grid search on.
`pcp_func`	The PCP function to use when grid searching. Note: the PCP function passed must be able to handle missing `NA` values. For example: `root_pcp_na`.
`grid_df`	A data frame of dimension N x P containing the N settings of P parameters to try. The columns of `grid_df` should be named exactly as they are in the function header of `pcp_func`. For example, if `pcp_func = root_pcp_noncvx_na`, then the columns of `grid_df` should be named "lambda", "mu", and "r" (assuming you want to search all 3 parameters; if one of those parameters is constant, instead of giving it its own column in `grid_df`, you can simply pass it as a free argument to this method. See `...` below). An optional additional column named "value" can be included that contains the mean relative error recorded for that row's parameter setting, with rows (settings) that have not yet been tried left as `NA`. In this way, you can run a search in which you already know the relative errors of some parameter settings but would like to explore more of the grid. For example: conduct a search examining 10 of 50 settings, then search again, looking at another 10 settings while including the information learned from the first run.
`n_evals`	The number of parameter settings in `grid_df` you would like to evaluate.
`cores`	The number of cores to use when parallelizing the search. If `cores = 1`, the search is conducted sequentially. If `cores > 1`, the search is parallelized. By default (`cores = NULL`), the maximum number of cores available on your machine is used. For optimal performance, `cores` should usually be set to half that.
`perc_b`	The fraction of entries of the matrix `mat` that will be randomly set to missing (`NA`) values. By default, `perc_b = 0.2`, i.e. 20% of the entries.
`runs`	The number of times to test a given parameter setting. By default, `runs = 100`.
`seed`	The seed used when randomly selecting parameter settings to evaluate. By default, `seed = NULL` to simulate randomness. For reproducible results, set `seed` to some whole number.
`progress_bar`	An optional logical indicating if you would like a progress bar displayed or not. By default, `progress_bar = TRUE`.
`file`	An optional character string giving the file path in which to save the output. Should end in "`.Rda`". When `file = NULL`, the output is not saved. By default, `file = NULL`.
`...`	Any parameters required by `pcp_func` that were not specified in `grid_df`, and therefore are kept constant (not involved in the grid search). An example could be the `LOD` parameter for those PCP functions that require the `LOD` argument.
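
As a rough illustration of the `grid_df`, `n_evals`, and `seed` arguments described above, the base-R sketch below builds a grid with the optional "value" column and then picks `n_evals` untried rows reproducibly. The selection logic is only an assumption about how such a search could choose settings, not the package's actual implementation, and the two recorded errors are made-up placeholder numbers.

```r
# Build a 3 x 3 grid of settings. Column names must match the argument
# names in the header of the PCP function you plan to pass as pcp_func.
n <- 50
p <- 10
grid_df <- expand.grid(
  lambda = c(1, 1.25, 1.5) / sqrt(n),
  mu     = sqrt(p / c(2, 1.5, 1.25))
)

# Optional "value" column: mean relative error for settings already tried,
# NA for settings not yet evaluated. These two errors are placeholders.
grid_df$value <- NA_real_
grid_df$value[c(2, 7)] <- c(0.31, 0.18)

# Hypothetical selection step (an assumption, not PCPhelpers code):
# reproducibly sample n_evals of the rows that have not been tried yet.
n_evals <- 4
seed <- 1
untried <- which(is.na(grid_df$value))
set.seed(seed)
chosen <- untried[sample.int(length(untried), size = min(n_evals, length(untried)))]

# The parameter settings that would be evaluated in this round.
grid_df[chosen, c("lambda", "mu")]
```

Re-running the snippet with the same `seed` gives the same `chosen` rows, which is the behaviour the `seed` argument is meant to provide.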

Value

A list containing the following:

raw: A data.frame containing the raw statistics of each run comprising the search. These statistics include the parameter settings for the run, the random seed used for the corruption step outlined in step 1 of the Methods section below, the relative error for the run, the rank of the recovered L matrix, the sparsity of the recovered S matrix, and the number of iterations PCP took to reach convergence (as of PCPhelpers v. 0.3.1, a value of 20,000 means PCP did not converge).
formatted: A data.frame containing the summary of the grid search. Made to easily pass on to print_gs.
constants: A list containing those arguments initially passed as constant values when calling random_search_cv.

Methods

Each hyperparameter setting is cross-validated by:

1. Randomly corrupting a perc_b fraction of the entries in mat as missing (i.e. NA values), yielding corrupted_mat. Done via the corrupt_mat_randomly function.
2. Running a PCP function (pcp_func) on corrupted_mat, giving L_hat and S_hat.
3. Recording the relative recovery error of L_hat + S_hat compared with the raw input data matrix, for only those values that were set to missing during the corruption step.
4. Repeating steps 1-3 for a total of runs many times.
5. Reporting the mean relative error over the runs-many runs for each parameter setting.
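
The procedure above can be sketched in base R for a single parameter setting. This is a minimal illustration under stated assumptions, not the PCPhelpers source: `toy_pcp` and `cv_one_setting` are hypothetical names, `toy_pcp` is a trivial stand-in for a real PCP function (it is assumed that PCP functions return a list with `L` and `S` components), the corruption is done inline rather than with corrupt_mat_randomly, and the relative error is assumed to be a ratio of Frobenius-style norms over the held-out entries.

```r
# Toy stand-in for a PCP function (NOT from pcpr): fills each NA with its
# column mean, calls the result L, and returns an all-zero S.
# lambda and mu are accepted but ignored.
toy_pcp <- function(D, lambda, mu) {
  L <- D
  col_means <- colMeans(D, na.rm = TRUE)
  na_idx <- which(is.na(D), arr.ind = TRUE)
  L[na_idx] <- col_means[na_idx[, "col"]]
  list(L = L, S = matrix(0, nrow(D), ncol(D)))
}

# Cross-validate one row of a parameter grid (hypothetical helper).
cv_one_setting <- function(mat, pcp_func, setting, perc_b = 0.2, runs = 5, ...) {
  errs <- numeric(runs)
  for (i in seq_len(runs)) {
    # Step 1: hide a random perc_b fraction of the entries.
    mask <- sample(length(mat), size = round(perc_b * length(mat)))
    corrupted_mat <- mat
    corrupted_mat[mask] <- NA
    # Step 2: run the PCP function with this row's parameters (matched by
    # column name) plus any constants supplied through `...`.
    fit <- do.call(pcp_func, c(list(corrupted_mat), as.list(setting), list(...)))
    # Step 3: relative error on the held-out entries only (assumed formula).
    resid <- (mat - fit$L - fit$S)[mask]
    errs[i] <- sqrt(sum(resid^2)) / sqrt(sum(mat[mask]^2))
  }
  # Steps 4-5: the loop repeats `runs` times; report the mean error.
  mean(errs)
}

# Usage on simulated data with one setting of lambda and mu.
set.seed(1)
mat <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)
setting <- data.frame(lambda = 1 / sqrt(50), mu = sqrt(10 / 2))
err <- cv_one_setting(mat, toy_pcp, setting, perc_b = 0.2, runs = 5)
```

Because the parameters are handed over with `do.call`, the sketch also shows why the columns of a grid need to carry the same names as the arguments of the PCP function: they are matched by name at call time.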

Examples


library(pcpr) # since we will be passing random_search_cv a PCP function

# simulate a data matrix:

n <- 50
p <- 10
data <- sim_data(sim_seed = 1, nrow = n, ncol = p, rank = 3, sigma=0, add_sparse = FALSE)
mat <- data$M

# pick parameter settings of lambda and mu to try:

lambdas <- c(1/sqrt(n), 1.25/sqrt(n), 1.5/sqrt(n))
mus <- c(sqrt(p/2), sqrt(p/1.5), sqrt(p/1.25))
param_grid <- expand.grid(lambda = lambdas, mu = mus)

# run the grid search:

param_grid.out <- random_search_cv(
  mat,
  pcp_func = root_pcp_na,
  grid_df = param_grid,
  n_evals = 4,
  cores = 4,
  perc_b = 0.2,
  runs = 20,
  seed = 1,
  progress_bar = TRUE,
  file = NULL
)

# visualize the output:

print_gs(param_grid.out$formatted)

Columbia-PRIME/PCPhelpers documentation built on April 24, 2022, 7:57 p.m.
