ffexp: Full factorial experiment

ffexp R Documentation

Full factorial experiment

Description

A class for easily creating and evaluating full factorial experiments.
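As orientation, a full factorial design crosses every level of every factor. The following is a minimal base-R sketch of that idea only; the use of expand.grid and the stand-in function eval_point are assumptions made for illustration, not a description of how ffexp builds or evaluates its grid.

```r
# Two hypothetical factors: a with two levels, b with two levels.
levels_list <- list(a = 1:2, b = c("A", "B"))

# expand.grid crosses every level of every factor: 2 x 2 = 4 design points.
design <- expand.grid(levels_list, stringsAsFactors = FALSE)

# Hypothetical stand-in for an evaluation function.
eval_point <- function(a, b) paste0(b, a)

# Evaluate the function once per design point (one row per combination).
design$result <- mapply(eval_point, design$a, design$b)

print(design)
```

Each row of design is one factor combination, which corresponds to what the rest of this page calls a row, run, or trial.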

Usage

e1 <- ffexp$new(eval_func=, ...)

e1$run_all()

e1$plot_run_times()

e1$save_self()

Arguments

eval_func The function called to evaluate each design point.

... Factors and the levels at which each is to be evaluated.

save_output Should the output be saved?

parallel If TRUE, function evaluations are done in parallel.

parallel_cores Number of cores to be used in parallel. If "detect", parallel::detectCores() is used to determine the number. "detect-1" may be used so that the computer isn't running at full capacity, since running at full capacity can slow down other tasks.
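The "detect" and "detect-1" settings can be pictured with a small helper like the one below. This is only a sketch: resolve_cores is a hypothetical function written for this illustration and is not part of ffexp, and the fallback to one worker is an assumption of the sketch.

```r
library(parallel)

# Hypothetical helper (not part of ffexp): turn a parallel_cores setting
# into a number of workers.
resolve_cores <- function(parallel_cores) {
  if (identical(parallel_cores, "detect")) {
    n <- parallel::detectCores()
  } else if (identical(parallel_cores, "detect-1")) {
    n <- parallel::detectCores() - 1
  } else {
    n <- as.integer(parallel_cores)
  }
  # detectCores() can return NA on some platforms, and "detect-1" gives 0
  # on a single-core machine; fall back to one worker in both cases.
  if (is.na(n) || n < 1) n <- 1L
  as.integer(n)
}

resolve_cores("detect")
resolve_cores("detect-1")
resolve_cores(3)
```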

Methods

$new() Initialize an experiment. The preprocessing is done, but no function evaluations are run.

$run_all() Run all factor combinations.

$run_one() Run a single factor combination.

$add_result_of_one() Used internally to add the result of an evaluation to the data set. Do not call it manually.

$plot_run_times() Plot the run times. Especially useful when they have been run in parallel.

$save_self() Save ffexp R6 object.

$recover_parallel_temp_save() If you ran the experiment using parallel with parallel_temp_save=TRUE and it crashes partway through, call this to recover the runs that were completed. Runs that were stopped mid-execution are not recoverable.

Public fields

outrawdf

Raw data frame of output.

outcleandf

Clean output in data frame.

rungrid

Matrix specifying which inputs will be run for each experiment.

nvars

Number of variables

allvars

All variables

varlist

Character vector of objects to pass to a parallel cluster.

arglist

List of values for each argument

number_runs

Total number of runs

completed_runs

Logical vector of whether each run has been completed.

eval_func

The function that is called for each experiment trial.

outlist

A list of the output from each run.

save_output

Logical of whether the output should be saved.

parallel

Logical indicating whether experiment runs should be run in parallel. This can give a large speedup when individual runs are slow.

parallel_cores

How many cores to use when running in parallel. Can be an integer, or 'detect' will detect how many cores are available, or 'detect-1' will do one less than that.

parallel_cluster

The parallel cluster being used.

folder_path

The path to the folder where output will be saved.

verbose

How much should be printed when running. 0 is none, 2 is average.

extract_output_to_df

A function to extract the raw output into a data frame. For example, use it when the output is a list but only a single item should show up in the output data frame.

hashvalue

A value used to make sure inputs match when reloading.

Methods

Public methods


Method new()

Create an 'ffexp' object.

Usage
ffexp$new(
  ...,
  eval_func,
  save_output = FALSE,
  parallel = FALSE,
  parallel_cores = "detect",
  folder_path,
  varlist = NULL,
  verbose = 2,
  extract_output_to_df = NULL
)
Arguments
...

Input arguments for the experiment

eval_func

The function to be run. It must take named arguments matching the names of ...

save_output

Should output be saved to file?

parallel

Should a parallel cluster be used?

parallel_cores

When running in parallel, how many cores should be used. Strictly, this is the number of cluster processes created rather than the number of cores used. It can be more than the computer has available, but that will hurt performance. Set to 'detect' to detect how many cores are available and use that number, or 'detect-1' to use one fewer.

folder_path

Where the data and files should be stored. If not given, a folder will be created in the current directory.

varlist

Character vector of names of objects that need to be passed to the parallel environment.

verbose

How much should be printed when running. 0 is none, 2 is average.

extract_output_to_df

A function to extract the raw output into a data frame. For example, use it when the output is a list but only a single item should show up in the output data frame.


Method run_all()

Run an experiment. The user can choose whether to run all rows or only specified ones, whether to run in parallel, and which files should be saved.

Usage
ffexp$run_all(
  to_run = NULL,
  random_n = NULL,
  redo = FALSE,
  run_order,
  save_output = self$save_output,
  parallel = self$parallel,
  parallel_cores = self$parallel_cores,
  parallel_temp_save = save_output,
  write_start_files = save_output,
  write_error_files = save_output,
  delete_parallel_temp_save_after = FALSE,
  varlist = self$varlist,
  verbose = self$verbose,
  outfile,
  warn_repeat = TRUE
)
Arguments
to_run

Which rows should be run? If NULL, then all that haven't been run yet.

random_n

Randomly selects n trials among those not yet completed and runs them.

redo

Should already completed rows be run again?

run_order

In what order should the rows be run? Options: random, in_order, and reverse.

save_output

Should the output be saved?

parallel

Should it be run in parallel?

parallel_cores

When running in parallel, how many cores should be used. Strictly, this is the number of cluster processes created rather than the number of cores used. It can be more than the computer has available, but that will hurt performance. Set to 'detect' to detect how many cores are available and use that number, or 'detect-1' to use one fewer.

parallel_temp_save

Should temp files be written when running in parallel? Prevents losing results if it crashes partway through.

write_start_files

Should start files be written?

write_error_files

Should error files be written for rows that fail?

delete_parallel_temp_save_after

If using parallel temp save files, should they be deleted afterwards?

varlist

A character vector of names of variables to be passed to the parallel cluster.

verbose

How much should be printed when running. 0 is none, 2 is average.

outfile

Where should the master output file be saved when running in parallel?

warn_repeat

Should warnings be given when repeating already completed rows?
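The three run_order options listed above (random, in_order, and reverse) can be pictured as follows. This is a sketch in base R; order_rows is a hypothetical helper written for this illustration and is not a function of the class.

```r
# Hypothetical helper (not part of ffexp): arrange the rows to be run
# according to a run_order setting.
order_rows <- function(rows, run_order = c("in_order", "reverse", "random")) {
  run_order <- match.arg(run_order)
  switch(run_order,
    in_order = rows,
    reverse  = rev(rows),
    # Index with sample.int() so a single row number is not treated as 1:n.
    random   = rows[sample.int(length(rows))]
  )
}

order_rows(1:4, "in_order")
order_rows(1:4, "reverse")
order_rows(1:4, "random")
```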


Method run_for_time()

Run the experiment for a given amount of time rather than for a specified number of trials. 'batch_size' trials are run between checks of the time elapsed; it only needs to be more than 1 when running in parallel. The current batch is always completed before stopping: the run does not quit in the middle of a batch when the time limit is reached, so it will go over the time limit given.
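The time-limited behaviour described here, including the overshoot, can be sketched in base R as below. Both run_batch and run_for_time_sketch are hypothetical stand-ins written for this illustration; they are not the package's implementation.

```r
# Hypothetical stand-in for evaluating one batch of trials.
run_batch <- function(rows) {
  Sys.sleep(0.05)  # pretend each batch takes about 0.05 seconds
  rows
}

# Sketch: keep starting batches until the time limit has passed.
run_for_time_sketch <- function(sec, batch_size, n_trials) {
  start <- Sys.time()
  remaining <- seq_len(n_trials)
  done <- integer(0)
  # The clock is only checked between batches, so a batch that has
  # started always finishes, and the total time can exceed `sec`.
  while (length(remaining) > 0 &&
         as.numeric(difftime(Sys.time(), start, units = "secs")) < sec) {
    batch <- head(remaining, batch_size)
    done <- c(done, run_batch(batch))
    remaining <- setdiff(remaining, batch)
  }
  done
}

completed <- run_for_time_sketch(sec = 0.12, batch_size = 2, n_trials = 20)
length(completed)
```

Because trials are only counted in whole batches, the number completed is always a multiple of batch_size here (or the full set of trials if they all finish in time).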

Usage
ffexp$run_for_time(
  sec,
  batch_size,
  show_time_in_bar = FALSE,
  save_output = self$save_output,
  parallel = self$parallel,
  parallel_cores = self$parallel_cores,
  parallel_temp_save = save_output,
  write_start_files = save_output,
  write_error_files = save_output,
  delete_parallel_temp_save_after = FALSE,
  varlist = self$varlist,
  verbose = self$verbose,
  warn_repeat = TRUE
)
Arguments
sec

Number of seconds to run for

batch_size

Number of trials to run between checking the time elapsed.

show_time_in_bar

The progress bar can show either the number of runs completed or the time elapsed.

save_output

Should the output be saved?

parallel

Should it be run in parallel?

parallel_cores

When running in parallel, how many cores should be used. Strictly, this is the number of cluster processes created rather than the number of cores used. It can be more than the computer has available, but that will hurt performance. Set to 'detect' to detect how many cores are available and use that number, or 'detect-1' to use one fewer.

parallel_temp_save

Should temp files be written when running in parallel? Prevents losing results if it crashes partway through.

write_start_files

Should start files be written?

write_error_files

Should error files be written for rows that fail?

delete_parallel_temp_save_after

If using parallel temp save files, should they be deleted afterwards?

varlist

A character vector of names of variables to be passed to the parallel cluster.

verbose

How much should be printed when running. 0 is none, 2 is average.

warn_repeat

Should warnings be given when repeating already completed rows?


Method run_superbatch()

Run the experiment in super batches. This allows for better progress visualization and saving when running in parallel.
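One way to picture super batches is as a split of the rows to be run into 'nsb' consecutive groups, with progress reported and results saved after each group. The sketch below uses base R only; how the class actually divides the rows is not stated on this page, so the splitting rule is an assumption.

```r
rows <- 1:10  # hypothetical rows still to be run
nsb  <- 3     # number of super batches

# Assign each row to one of nsb roughly equal, consecutive groups.
group <- ceiling(seq_along(rows) * nsb / length(rows))
superbatches <- split(rows, group)

for (sb in superbatches) {
  # A real run would evaluate these rows (possibly in parallel) and then
  # save and report progress before moving to the next super batch.
  cat("Super batch with rows:", sb, "\n")
}
```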

Usage
ffexp$run_superbatch(
  nsb,
  redo = FALSE,
  run_order,
  save_output = self$save_output,
  parallel = self$parallel,
  parallel_cores = self$parallel_cores,
  parallel_temp_save = save_output,
  write_start_files = save_output,
  write_error_files = save_output,
  delete_parallel_temp_save_after = FALSE,
  varlist = self$varlist,
  verbose = self$verbose,
  warn_repeat = TRUE
)
Arguments
nsb

Number of super batches

redo

Should already completed rows be run again?

run_order

In what order should the rows be run? Options: random, in_order, and reverse.

save_output

Should the output be saved?

parallel

Should it be run in parallel?

parallel_cores

When running in parallel, how many cores should be used. Strictly, this is the number of cluster processes created rather than the number of cores used. It can be more than the computer has available, but that will hurt performance. Set to 'detect' to detect how many cores are available and use that number, or 'detect-1' to use one fewer.

parallel_temp_save

Should temp files be written when running in parallel? Prevents losing results if it crashes partway through.

write_start_files

Should start files be written?

write_error_files

Should error files be written for rows that fail?

delete_parallel_temp_save_after

If using parallel temp save files, should they be deleted afterwards?

varlist

A character vector of names of variables to be passed to the parallel cluster.

verbose

How much should be printed when running. 0 is none, 2 is average.

warn_repeat

Should warnings be given when repeating already completed rows?

outfile

Where should the master output file be saved when running in parallel?


Method run_one()

Run a single row of the experiment. You can specify which one to run. Users should generally not call this directly; use 'run_all' instead.

Usage
ffexp$run_one(
  irow = NULL,
  save_output = self$save_output,
  write_start_files = save_output,
  write_error_files = save_output,
  warn_repeat = TRUE,
  is_parallel = FALSE,
  return_list_result_of_one = FALSE,
  verbose = self$verbose,
  force_this_as_output
)
Arguments
irow

Which row should be run?

save_output

Should the output be saved?

write_start_files

Should a file be written when starting the experiment?

write_error_files

Should a file be written if there is an error?

warn_repeat

Should a warning be given if repeating a row?

is_parallel

Is this being run in parallel?

return_list_result_of_one

Should the list containing the result of this run be returned?

verbose

How much should be printed when running. 0 is none, 2 is average.

force_this_as_output

Value to use instead of evaluating function.


Method add_result_of_one()

Add the result of a single experiment to the object. This shouldn't be used by users.

Usage
ffexp$add_result_of_one(
  output,
  systime,
  irow,
  row_grid,
  row_df,
  start_time,
  end_time,
  save_output,
  hashvalue
)
Arguments
output

The output of the experiment.

systime

The time it took to run

irow

The row of inputs used.

row_grid

The corresponding row in the run grid.

row_df

The corresponding row data frame.

start_time

The start time of the experiment.

end_time

The end time of the experiment.

save_output

Should the output be saved?

hashvalue

Not used.


Method plot_run_times()

Plot the run times of each trial.

Usage
ffexp$plot_run_times()

Method plot_pairs()

Plot pairs of inputs and outputs. Helps see correlations and distributions.

Usage
ffexp$plot_pairs()

Method plot()

Calling 'plot' on an 'ffexp' object calls 'plot_pairs()'

Usage
ffexp$plot()

Method calculate_effects()

Calculate the effects of each variable as if this was an experiment using a linear model.
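To illustrate what estimating effects with a linear model means, the sketch below fits lm to a small two-factor design in base R. The data, the coding of levels as -1 and 1, and the model formula are all assumptions made for this illustration; the exact model fitted by calculate_effects() is not specified on this page.

```r
# Hypothetical 2 x 2 full factorial with levels coded as -1 and 1.
design <- expand.grid(a = c(-1, 1), b = c(-1, 1))

# Hypothetical noiseless response: intercept 3, slope 2 for a, slope -1 for b.
design$y <- 3 + 2 * design$a - 1 * design$b

# Fit a main-effects linear model and read off the coefficients.
fit <- lm(y ~ a + b, data = design)
coef(fit)
```

With this coding, each coefficient is the change in the response per unit change in the factor, so moving a factor from -1 to 1 changes the fitted response by twice its coefficient.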

Usage
ffexp$calculate_effects()

Method calculate_effects2()

Calculate the effects of each variable as if this was an experiment using a linear model.

Usage
ffexp$calculate_effects2()

Method save_self()

Save this R6 object

Usage
ffexp$save_self(verbose = self$verbose)
Arguments
verbose

How much should be printed when running. 0 is none, 2 is average.


Method create_save_folder_if_nonexistent()

Create the save folder if it doesn't already exist.

Usage
ffexp$create_save_folder_if_nonexistent()

Method rename_save_folder()

Rename the save folder

Usage
ffexp$rename_save_folder(new_folder_path, new_folder_name)
Arguments
new_folder_path

New path for the save folder

new_folder_name

If you want the new save folder to be in the current directory, you can use this instead of 'new_folder_path' and just give the folder name.


Method delete_save_folder_if_empty()

Delete the save folder if it is empty. Used to prevent leaving behind empty folders.

Usage
ffexp$delete_save_folder_if_empty(verbose = self$verbose)
Arguments
verbose

How much should be printed when running. 0 is none, 2 is average.


Method recover_parallel_temp_save()

Running this loads the information saved to files if 'parallel_temp_save=TRUE' was used when running. Useful when running long jobs in parallel so that you don't lose all results if it crashes before finishing.
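The general idea of per-run temporary files and later recovery can be sketched in base R as below. The folder name, file naming scheme, and list layout are hypothetical choices for this illustration; they are not the files that ffexp writes.

```r
# Hypothetical temp folder for this sketch.
tmp_dir <- file.path(tempdir(), "ffexp_temp_sketch")
unlink(tmp_dir, recursive = TRUE)
dir.create(tmp_dir)

# Simulate a job of 5 runs that crashed after runs 1 to 3 had been saved.
for (i in 1:3) {
  saveRDS(list(irow = i, output = i^2),
          file.path(tmp_dir, paste0("run_", i, ".rds")))
}

# Recovery: read back every temp file and mark those runs as completed.
files <- list.files(tmp_dir, pattern = "^run_[0-9]+\\.rds$", full.names = TRUE)
recovered <- lapply(files, readRDS)
completed <- rep(FALSE, 5)
for (r in recovered) completed[r$irow] <- TRUE

completed

# Clean up the temp files once the results are safely in memory.
unlink(tmp_dir, recursive = TRUE)
```

Runs 4 and 5 never wrote a file, so they remain marked as not completed and would need to be run again.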

Usage
ffexp$recover_parallel_temp_save(delete_after = FALSE, only_reload_new = FALSE)
Arguments
delete_after

Should the temp files be deleted after they are recovered? If TRUE, make sure you save the ffexp object after running this function so you don't lose the data.

only_reload_new

Will only reload output from runs that don't show as completed yet. Can make it much faster if there are many saved files, but most have already been loaded to this object.


Method rungrid2()

Display the input rows of the experiment. 'rungrid' gives only integer indexes, whereas this gives the actual values.
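The difference between an integer index grid and the actual input values can be sketched in base R as follows. The objects arglist and rungrid here are built by hand for the illustration and only mimic the fields of the same names; their real internal layout may differ.

```r
# Hypothetical factor levels.
arglist <- list(a = c(10, 20), b = c("A", "B"))

# Index grid: each entry is the position of a level, not the level itself.
rungrid <- as.matrix(expand.grid(a = seq_along(arglist$a),
                                 b = seq_along(arglist$b)))

# Look up the actual values for a subset of rows only, which avoids
# expanding the whole grid when the experiment is large.
rows <- 2:3
values <- data.frame(
  a = arglist$a[rungrid[rows, "a"]],
  b = arglist$b[rungrid[rows, "b"]],
  stringsAsFactors = FALSE
)

rungrid[rows, ]
values
```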

Usage
ffexp$rungrid2(rows = 1:nrow(self$rungrid))
Arguments
rows

Which rows to display the inputs for? On big experiments, specifying the rows can be much faster.


Method add_variable()

Add a variable to the experiment. You must specify the value of the variable for all existing rows, and then also the values of the variable which haven't been run yet.

Usage
ffexp$add_variable(name, existing_value, new_values, suppressMessage = FALSE)
Arguments
name

Name of the variable being added.

existing_value

The value of the new variable for all existing rows.

new_values

The values of the new variable which have not been run. This should not include 'existing_value', the value of the new variable for the existing rows.

suppressMessage

Should the message be suppressed? The message tells the user a new variable was added and it is being returned in a new object. Default FALSE.


Method add_level()

Add a level to one of the arguments. This returns a new object. The existing object is not changed.

Usage
ffexp$add_level(arg_name, new_values, suppressMessage = FALSE)
Arguments
arg_name

Which existing argument is a level being added to?

new_values

The values of the new levels to be added to 'arg_name'.

suppressMessage

Should the message be suppressed? The message tells the user a new level was added and it is being returned in a new object. Default FALSE.


Method remove_results()

Remove results of completed trials. They will be rerun next time $run_all() is called.

Usage
ffexp$remove_results(to_remove)
Arguments
to_remove

Indexes of trials to remove


Method print()

Printing the object shows some summary information.

Usage
ffexp$print()

Method set_parallel_cores()

Set the number of parallel cores to be used when running in parallel. Needed in case the user sets "detect".

Usage
ffexp$set_parallel_cores(parallel_cores)
Arguments
parallel_cores

When running in parallel, how many cores should be used. Strictly, this is the number of cluster processes created rather than the number of cores used. It can be more than the computer has available, but that will hurt performance. Set to 'detect' to detect how many cores are available and use that number, or 'detect-1' to use one fewer.


Method stop_cluster()

Stop the parallel cluster.

Usage
ffexp$stop_cluster()

Method finalize()

Cleanup after deleting object.

Usage
ffexp$finalize()

Method clone()

The objects of this class are cloneable with this method.

Usage
ffexp$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

# Two factors, both with two levels.
#   The evaluation function simply returns the combination it was given.
cc <- ffexp$new(a=1:2,b=c("A","B"),
                eval_func=function(...) {c(...)})
# View the factor settings it will run (each row).
cc$rungrid
# Evaluate all four settings
cc$run_all()


# Factors can also be supplied as a data frame (here cd),
#   alongside the vector factors a and b.
cc <- ffexp$new(a=1:3,b=2, cd=data.frame(c=3:4,d=5:6),
                eval_func=function(...) {list(...)})

CollinErickson/comparer documentation built on Feb. 25, 2023, 6:36 p.m.