set_new_or_forced_rand_seed_if_necessary: Set a new or forced random seed if caller specifies that

set_new_or_forced_rand_seed_if_necessary R Documentation

Set a new or forced random seed if caller specifies that

Description

The seed for the random number generator sometimes needs to be set so that objects and runs can be reproduced. This function controls when that is done and what value is used to set the seed. It lets you reset the seed at various checkpoints during a bdpg run. If you don't reset the seed at a given checkpoint, the generator continues the sequence of values started the last time the seed was set, until it reaches another checkpoint where you do choose to reset the seed. You never have to reset it if you don't want to.

Usage

set_new_or_forced_rand_seed_if_necessary(
  is_rsrun,
  is_rsprob,
  parameters,
  cor_or_app_str,
  basic_or_wrapped_or_comb_str,
  location_string
)

Arguments

is_rsrun

boolean indicating that the seed creation test is being done just before the creation of a reserve selection RUN object

is_rsprob

boolean indicating that the seed creation test is being done just before the creation of a reserve selection PROBLEM object

parameters

parameters list for the run, usually derived from project.yaml; the number and set of elements can vary from run to run

cor_or_app_str

string indicating correct or apparent (i.e., "COR" or "APP")

basic_or_wrapped_or_comb_str

character string containing "BASE" or "WRAP" or "COMB"

location_string

character string, usually used to indicate where this function was called (e.g., at the start of the creation of an rsproblem); however, the string can be whatever you want and doesn't have to be about location

Value

Returns a 2-element list. The element named "new_seed" contains the new integer seed value or NA. The element named "R_internal_seed_array" contains the array stored in R's global variable ".Random.seed" at the end of this function (i.e., the internal state of R's random number generator).
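The documentation does not say how callers use the saved "R_internal_seed_array" value. As a minimal sketch of one possible use, base R lets you restore a previously captured ".Random.seed" array to replay the same random draws. The variable names below are illustrative only and are not part of bdpg.

```r
# Capture the generator state, draw some numbers, then restore the state.
set.seed(123)
saved_state <- get(".Random.seed", envir = globalenv())

first_draw <- runif(2)

# Restoring the saved array puts the generator back where it was,
# so the next draw repeats the first one.
assign(".Random.seed", saved_state, envir = globalenv())
second_draw <- runif(2)

identical(first_draw, second_draw)   # TRUE
```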

Rules for seed creation and setting

seed names

There are nine different locations in the code where a seed can be set. These locations are specified through the use of variable names in the parameters list (usually derived from project.yaml). The nine associated names are:

  • bdpg_run_init_rand_seed

  • cor_base_rsprob_rand_seed

  • cor_base_rsrun_rand_seed

  • app_base_rsprob_rand_seed

  • app_base_rsrun_rand_seed

  • cor_wrap_rsprob_rand_seed

  • cor_wrap_rsrun_rand_seed

  • app_wrap_rsprob_rand_seed

  • app_wrap_rsrun_rand_seed

The first name corresponds to the seed for the bdpg run as a whole. The remaining names correspond to the seeds for correct and apparent versions of basic and wrapped problems and reserve selector runs over each of those problems.
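The eight object-level names follow a regular pattern built from the correct/apparent string, the base/wrap string, and the kind of object being created. The helper below is a hypothetical sketch of that pattern, not bdpg's actual code; in particular, it assumes that exactly one of is_rsrun and is_rsprob is TRUE, and it makes no claim about how "COMB" is handled, since no "comb" name appears in the list above.

```r
# Hypothetical helper (not part of bdpg): build the parameter name
# that would hold the seed for a given checkpoint.
build_seed_name <- function(cor_or_app_str,
                            basic_or_wrapped_or_comb_str,
                            is_rsrun,
                            is_rsprob) {
    # Assumption: the checkpoint is for a run OR a problem, never both.
    stopifnot(xor(is_rsrun, is_rsprob))

    obj_str <- if (is_rsrun) "rsrun" else "rsprob"

    paste0(tolower(cor_or_app_str), "_",
           tolower(basic_or_wrapped_or_comb_str), "_",
           obj_str, "_rand_seed")
}

build_seed_name("APP", "WRAP", is_rsrun = FALSE, is_rsprob = TRUE)
# "app_wrap_rsprob_rand_seed"
```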

forced seed

If a particular named seed is specified in the parameters list (e.g., in project.yaml), then the seed corresponding to that named seed will be set to the value provided with the name. In other words, providing a seed overrides all other rules given below. For example, "app_wrap_rsprob_rand_seed: 555" in project.yaml will force the seed for the creation of the apparent version of the wrapped problem to be set to 555.

bdpg initialization seed

A run of bdpg will always set a seed at the start of the run.

  • If the bdpg_run_init_rand_seed variable exists in the parameters list and has an integer value, then set.seed() will be called with that value at the start of the entire bdpg run.

  • If the variable exists but has a value that is not a legal argument for set.seed(), the program will probably crash.

  • If the bdpg_run_init_rand_seed variable is not in the list or has a NULL value, then a seed will be generated based on the current time, and then set.seed() will be called using that value.
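The initialization rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the function name init_run_seed is hypothetical, and the way a seed is derived from the clock is only one plausible choice.

```r
# Hypothetical sketch of the initialization rule (not bdpg's code).
init_run_seed <- function(parameters) {
    # [[ ]] avoids the partial name matching that $ does on lists.
    seed <- parameters[["bdpg_run_init_rand_seed"]]

    if (is.null(seed)) {
        # Assumption: derive an integer seed from the current time.
        seed <- as.integer(as.numeric(Sys.time()) %% 1e6)
    }

    set.seed(seed)
    cat("bdpg initialization seed =", seed, "\n")
    invisible(seed)
}

init_run_seed(list(bdpg_run_init_rand_seed = 42L))   # forced seed
init_run_seed(list())                                # time-based seed
```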

set_rand_seed_at_creation_of_all_new_major_objects

All other seeds besides the bdpg initialization seed are controlled by a few other things.

  • As stated earlier, any named seed that has a value assigned to it in the parameters list will have the seed set to that value just before the creation of the associated object.

  • If the seed name does not appear in the parameters list or has a NULL value, then the seed will not be set at the start of the creation of that object UNLESS the parameters list contains a variable called set_rand_seed_at_creation_of_all_new_major_objects and it is set to TRUE.

  • If set_rand_seed_at_creation_of_all_new_major_objects exists and is set to TRUE and the named seed does not appear in the parameters list or has a NULL value, then the seed WILL be set at the start of the creation of that object. In that case, it will be set to a seed derived from the current time. This is useful when you want to be able to know the value of the seed when an object is created so that you can recreate it, but you don't care what that value is and so you haven't set it specifically yourself.
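Taken together, the three rules above amount to a small decision procedure. The sketch below is one way to express it; choose_checkpoint_seed is a hypothetical name, the time-based seed formula is an assumption, and the real function takes different arguments (see Usage). The return value mimics the two-element list described under Value.

```r
# Hypothetical sketch of the checkpoint rules (not bdpg's code).
choose_checkpoint_seed <- function(parameters, seed_name) {
    forced_seed <- parameters[[seed_name]]
    set_all <- isTRUE(
        parameters[["set_rand_seed_at_creation_of_all_new_major_objects"]])

    if (!is.null(forced_seed)) {
        new_seed <- forced_seed          # a forced seed overrides everything
    } else if (set_all) {
        # Assumption: derive an integer seed from the current time.
        new_seed <- as.integer(as.numeric(Sys.time()) %% 1e6)
    } else {
        new_seed <- NA                   # seed is left alone at this checkpoint
    }

    if (!is.na(new_seed)) set.seed(new_seed)

    # get0() returns NULL if the generator has not been used yet.
    list(new_seed = new_seed,
         R_internal_seed_array = get0(".Random.seed", envir = globalenv()))
}

forced <- choose_checkpoint_seed(list(app_wrap_rsprob_rand_seed = 555),
                                 "app_wrap_rsprob_rand_seed")
forced$new_seed   # 555
```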

Recovering seed values for reproducibility

Every time a seed is set, its value is written to the console with a label indicating where it was set (e.g., at bdpg initialization).

  • At all times other than the bdpg initialization, the value is saved in the object whose creation it immediately precedes.

  • If the seed is NOT set at object creation, NA is saved in the object as the seed value.

So, if you want to regenerate a particular object or run, the easiest way to do it is to find the seed value(s) inside the console output and redo the entire bdpg run with the seed value(s) set in project.yaml to those found in the previous bdpg run's output. Note that this is only designed to be done with single runs. See Caveats below.

Caveats

While the intent of this routine is to give lots of flexibility in setting seeds for reproducibility, it doesn't solve the whole problem, and there are a few things to be careful about when trying to reproduce results.

Seeds in runs with multiple repetitions

If you do a tzar run with multiple repetitions, this function will make the same seed assignments in every repetition. It might be possible to extend this routine in the future by allowing arrays of seed values, but at the moment, it's doing what's needed for most development cases.

Beware of cascading seed effects

If you set the named seed for an object, you may expect it to produce the same object as a different run that used the same seed. However, if the object interacts with another object generated earlier in the same run and that earlier object's seed is not reset, you can get a different result for the later object in the second run. For example:

  • Suppose that you generate a correct base object and then a wrapped object in the same run and you have set_rand_seed_at_creation_of_all_new_major_objects set to TRUE so that you can see the seed that is set for each object as it is created.

  • Suppose that you run bdpg and note that the seed generated for the bdpg initialization in the output was 123 and the seed set for the correct problem was 456 and the seed set for the wrap problem was 789.

  • Then, suppose that you want to test the re-creation of the wrap problem and redo the run with "cor_wrap_rsprob_rand_seed: 789" but none of the other seeds specified.

  • The problem is that the correct base problem that your wrap depends on will be generated using a different seed than it was the first time and will therefore be a different problem. When the wrap is built around that base problem, the result is a different wrap problem, even though it uses the same seed as in the previous run.
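The cascading effect can be reproduced with a toy example in plain R. The two functions below are hypothetical stand-ins for problem generation, not bdpg code; the seed values 456 and 789 are the ones from the scenario above, and 999 stands for whatever different seed the base problem happens to get on the second run.

```r
# Toy stand-ins for bdpg objects (hypothetical, for illustration only).
make_base <- function() runif(3)                # "correct base problem"
make_wrap <- function(base) base + runif(3)     # "wrap problem" built on the base

# First run: seeds 456 (base) and 789 (wrap).
set.seed(456); base_1 <- make_base()
set.seed(789); wrap_1 <- make_wrap(base_1)

# Second run: only the wrap seed is forced, so the base gets a different seed.
set.seed(999); base_2 <- make_base()
set.seed(789); wrap_2 <- make_wrap(base_2)
identical(wrap_1, wrap_2)   # FALSE: same wrap seed, different base

# Third run: both seeds are forced, so the wrap is reproduced.
set.seed(456); base_3 <- make_base()
set.seed(789); wrap_3 <- make_wrap(base_3)
identical(wrap_1, wrap_3)   # TRUE
```

In bdpg terms, the third run corresponds to putting both the base problem's seed and the wrap problem's seed in project.yaml rather than the wrap seed alone.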

Examples

## Not run: 
    #  Force the seed that is set just before creating the correct base problem.
parameters = list (cor_base_rsprob_rand_seed = 123)
seed_info = set_new_or_forced_rand_seed_if_necessary (
               is_rsrun                     = FALSE,
               is_rsprob                    = TRUE,
               parameters                   = parameters,
               cor_or_app_str               = "COR",
               basic_or_wrapped_or_comb_str = "BASE",
               location_string              = "for basic CORRECT problem")
## End(Not run)

langfob/bdpg documentation built on Dec. 8, 2022, 5:33 a.m.