critVal: Critical values

View source: R/critVal.R

critValR Documentation

Critical values

Description

Computes the vector of critical values or the global quantile. This function offers two ways of computation, either at significance level alpha from a Monte-Carlo simulation, see also section 3.2 in the vignette for more details, or from the global quantile / critical values given in the argument q. For more details on these two options see Section Computation of critical values / global quantile.
Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package saves them by default in the workspace and on the file system such that a second call requiring the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see Section Storing of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages in ..., explained in monteCarloSimulation, and the saving can be controlled by the argument option.

Usage

critVal(n, q = NULL, alpha = NULL, nq = 2L^(as.integer(log2(n) + 1e-12) + 1L) - 1L,
        family = NULL, intervalSystem = NULL, lengths = NULL, penalty = NULL,
        weights = NULL, stat = NULL, r = 1e4, output = c("vector", "value"),
        options = NULL, ...)

Arguments

n

a positive integer giving the number of observations

q

either NULL, then the vector of critical values at level alpha will be computed from a Monte-Carlo simulation, or a numeric giving the global quantile or a numeric vector giving the vector of critical values. For more detailed information, in particular of which length the numeric vector should be, see Section Computation of critical values / global quantile. Either q or alpha must be given. Otherwise, alpha == 0.5 is chosen with a warning. Please note that by default the Monte-Carlo simulation will be saved in the workspace and on the file system, for more details see Section Simulating, saving and loading of Monte-Carlo simulations below

alpha

a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing change-points and detecting additional artefacts. For more details on this choice see (Frick et al., 2014, section 4) and (Pein et al., 2017, section 3.4). Either q or alpha must be given. Otherwise, alpha == 0.5 is chosen with a warning

nq

a positive integer larger than or equal to n giving the (increased) number of observations for the Monte-Carlo simulation. See Section Simulating, saving and loading of Monte-Carlo simulations for more details

family

a string specifying the assumed parametric family, for more details see parametricFamily, currently "gauss", "hsmuce" and "mDependentPS" are supported. By default (NULL) "gauss" is assumed

intervalSystem

a string giving the used interval system, either "all" for all intervals, "dyaLen" for all intervals of dyadic length or "dyaPar" for the dyadic partition, for more details see intervalSystem. By default (NULL) the default interval system of the specified parametric family will be used, which one this will be is described in parametricFamily

lengths

an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. Note that not all lengths are possible for all interval systems and for all parametric families, see intervalSystem and parametricFamily, respectively, to see which ones are allowed. By default (NULL) all lengths that are possible for the specified intervalSystem and for the specified parametric family will be used

penalty

a string specifying how different scales will be balanced, either "sqrt", "weights", "log" or "none", see penalty and section 3.2 in the vignette for more details. By default (NULL) the default penalty of the specified parametric family will be used, which one this will be is described in parametricFamily

weights

a numeric vector of length length(lengths) with only positive entries giving the weights that will be used for penalty "weights", see penalty and section 3.2.2 in the vignette for more details. By default (NULL) equal weights will be used, i.e.

weights == rep(1 / length(lengths), length(lengths))
stat

an object of class "MCSimulationVector" or "MCSimulationMaximum" giving a Monte-Carlo simulations, usually computed by monteCarloSimulation. If penalty == "weights" only "MCSimulationVector" is allowed. Has to be simulated for at least the given number of observations n and for the given family, intervalSystem and if "MCSimulationMaximum" for the given lengths and penalty. By default (NULL) the required simulation will be made available automatically accordingly to the given options. For more details see Section Simulating, saving and loading of Monte-Carlo simulations and section 3.4 in the vignette

r

a positive integer giving the required number of Monte-Carlo simulations if they will be simulated or loaded from the workspace or the file system

output

a string specifying the return value, if output == "vector" the vector of critical values will be computed and if output == "value" the global quantile will be computed. For penalty == "weights" the output must be "vector", since no global quantile can be determined for this penalty

options

a list specifying how Monte-Carlo simulations will be simulated, saved and loaded. For more details see Section Simulating, saving and loading of Monte-Carlo simulations and section 3.4 in the vignette

...

there are two groups of further arguments:

  • further parameters of the parametric family. Depending on the argument family some might be required, but others might be optional, please see parametricFamily for more details

  • further arguments (seed, rand.gen and messages) that will be passed to monteCarloSimulation. monteCarloSimulation will be called automatically and most of the arguments will be set accordingly to the arguments of critVal, no user interaction is required and possible for these parameters. In addition, seed, rand.gen and messages can be passed by the user

Value

If output == "vector" a numeric vector giving the vector of critical values, i.e. a vector of length length(lengths), giving for each length the corresponding critical value. If output == "value" a single numeric giving the global quantile. In both cases, additionally, an attribute "n" gives the number of observations for which the Monte-Carlo simulation was performed.

Computation of critical values / global quantile

This function offers two ways to compute the resulting value:

  • If q == NULL it will be computed at significance level alpha from a Monte-Carlo simulation. For penalties "sqrt", "log" and "none" the global quantile will be the (1-alpha)-quantile of the penalised multiscale statistic, see section 3.2.1 in the vignette. And if required the vector of critical values will be derived from it. For penalty "weights" the vector of critical values will be calculated accordingly to the given weights. The Monte-Carlo simulation can either be given in stat or will be attempted to load or will be simulated. How Monte-Carlo simulations are simulated, saved and loaded can be controlled by the argument option, for more details see the Section Simulating, saving and loading of Monte-Carlo simulations.

  • If q is given it will be derived from it. For the argument q either a single finite numeric giving the global quantile or a vector of finite numerics giving the vector of critical values (not allowed for output == "value") is possible:

    • A single numeric giving the global quantile. If output == "vector" the vector of critical values will be computed from it for the given lengths and penalty (penalty "weights" is not allowed). Note that the global quantile is specific to the arguments family, intervalSystem, lengths and penalty.

    • A vector of length length(lengths), giving for each length the corresponding critical value. This vector is identical to the vector of critical values.

    • A vector of length n giving for each length 1:n the corresponding critical value.

    • A vector of length equal to the number of all possible lengths for the given interval system and the given parametric family giving for each possible length the corresponding critical value.

    Additionally, an attribute "n" giving the number of observations for which q was computed is allowed. This attribute must be a single integer and equal to or larger than the argument n which means that q must have been computed for at least n observations. This allows additionally:

    • A vector of length attr(q, "n") giving for each length 1:attr(q, "n") the corresponding critical value.

    • A vector of length equal to the number of all possible lengths for the given interval system and the given parametric family if the number of observations is attr(q, "n") giving for each possible length the corresponding critical value.

    The attribute "n" will be kept or set to n if missing.

Simulating, saving and loading of Monte-Carlo simulations

Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this function offers multiple possibilities for saving and loading the simulations. The simulation, saving and loading can be controlled by the argument option. This argument has to be a list or NULL and the following named entries are allowed: "simulation", "save", "load", "envir" and "dirs". All missing entries will be set to their default option.
Objects of class "MCSimulationVector", containing simulations of the multiscale vector of statistics, and objects of class "MCSimulationMaximum", containing simulations of the penalised multiscale statistic (for penalties "sqrt", "log" and "none"), can be simulated, saved and loaded. Each Monte-Carlo simulation is specific to the number of observations, the parametric family and the interval system, for "MCSimulationMaximum" additionally to the set of lengths and the used penalty. Both types will lead to the same result, however, an object of class "MCSimulationVector" is more flexible, since critical values for all penalties and all set of lengths can be derived from it, but requires much more storage space and has slightly larger saving and loading times. Note that Monte-Carlo simulations can only be saved and loaded if they are generated with the default function for generating random observations, i.e. when rand.gen (in ...) is NULL. For a given simulation this is signalled by the attribute "save" which is TRUE if a simulation can be saved and FALSE otherwise.
Monte-Carlo simulations can also be performed for a (slightly) larger number of observations n_q given in the argument nq, which avoids extensive resimulations for only a little bit varying number of observations. The overestimation control is still satisfied but the detection power is (slightly) smaller. But note that the default lengths might change when the number of observations is increased and, hence, for type "vectorIncreased" still a different simulation might be required.
We refer to the different types as follow:

  • "vector": an object of class "MCSimulationMaximum", i.e. simulations of the penalized multiscale statistic, for n observations

  • "vectorIncreased": an object of class "MCSimulationMaximum", i.e. simulations of the penalized multiscale statistic, for nq observations

  • "matrix": an object of class "MCSimulationVector", i.e. simulations of the multiscale vector of statistics, for n observations

  • "matrixIncreased": an object of class "MCSimulationVector", i.e. simulations of the multiscale vector of statistics, for nq observations

The simulations can either be saved in the workspace in the variable critValStepRTab or persistently on the file system for which the package R.cache is used. Loading from the workspace is faster, but either the user has to store the workspace manually or in a new session simulations have to be performed again. Moreover, storing in and loading from variables and RDS files is supported. Finally, a pre-computed collection of simulations of type "matrixIncreased" for parametric families "gauss" and "hsmuce" can be accessed by installing the package stepRdata available from http://www.stochastik.math.uni-goettingen.de/stepRdata_1.0-0.tar.gz.

options$envir and options$dirs

For loading from / saving in the workspace the variable critValStepRTab in the environment options$envir will be looked for and if missing in case of saving also created there. Moreover, the variable(s) specified in options$save$variable (explained in the Subsection Saving: options$save) will be assigned to this environment. options$envir will be passed to the arguments pos and where in the functions assign, get, and exists, respectively. By default, a local enviroment in the package is used.
For loading from / saving on the file system loadCache(key = keyList, dirs = options$dirs) and saveCache(stat, key = attr(stat, "keyList"), dirs = options$dirs) are called, respectively. In other words, options$dirs has to be a character vector constituting the path to the cache subdirectory relative to the cache root directory as returned by getCacheRootPath(). If options$dirs == "" the path will be the cache root path. By default the subdirectory "stepR" is used, i.e. options$dirs == "stepR". Missing directories will be created.

Simulation: options$simulation

Whenever Monte-Carlo simulations have to be performed, i.e. when stat == NULL and the required Monte-Carlo simulation could not be loaded, the type specified in options$simulation will be simulated by monteCarloSimulation. In other words, options$simulation must be a single string of the following: "vector", "vectorIncreased", "matrix" or "matrixIncreased". By default (options$simulation == NULL), an object of class "MCSimulationVector" for nq observations will be simulated, i.e. options$simulation == "matrixIncreased". For this choice please recall the explanations regarding computation time and flexibility at the beginning of this section.

Loading: options$load

Loading of the simulations can be controlled by the entry options$load which itself has to be a list with possible entries: "RDSfile", "workspace", "package" and "fileSystem". Missing entries disable the loading from this option. Whenever a Monte-Carlo simulation is required, i.e. when the variable q is not given, it will be searched for at the following places in the given order until found:

  1. in the variable stat,

  2. in options$load$RDSfile as an RDS file, i.e. the simulation will be loaded by

    readRDS(options$load$RDSfile).

    In other words, options$load$RDSfile has to be a connection or the name of the file where the R object is read from,

  3. in the workspace or on the file system in the following order: "vector", "matrix", "vectorIncreased" and finally of "matrixIncreased". For penalty == "weights" it will only be looked for "matrix" and "matrixIncreased". For each options it will first be looked in the workspace and then on the file system. All searches can be disabled by not specifying the corresponding string in options$load$workspace and options$load$fileSystem. In other words, options$load$workspace and options$load$fileSystem have to be vectors of strings containing none, some or all of "vector", "matrix", "vectorIncreased" and "matrixIncreased",

  4. in the package stepRdata (if installed) and if options$load$package == TRUE. In other words, options$load$package must be a single logical or NULL,

  5. if all other options fail a Monte-Carlo simulation will be performed.

By default (if options$load is missing / NULL) no RDS file is specified and all other options are enabled, i.e.

options$load <- list(workspace = c("vector", "vectorIncreased",
                                   "matrix", "matrixIncreased"),
                     fileSystem = c("vector", "vectorIncreased",
                                    "matrix", "matrixIncreased"),
                     package = TRUE, RDSfile = NULL).

Saving: options$save

Saving of the simulations can be controlled by the entry options$save which itself has to be a list with possible entries: "workspace", "fileSystem", "RDSfile" and "variable". Missing entries disable the saving in this option.
All available simulations, no matter whether they are given by stat, loaded, simulated or in case of "vector" and "vectorIncreased" computed from "matrix" and "matrixIncreased", respectively, will be saved in all options for which the corresponding type is specified. Here we say a simulation is of type "vectorIncreased" or "matrixIncreased" if the simulation is not performed for n observations. More specifically, a simulation will be saved:

  1. in the workspace or on the file system if the corresponding string is contained in options$save$workspace and options$save$fileSystem, respectively. In other words, options$save$workspace and options$save$fileSystem have to be vectors of strings containing none, some or all of "vector", "matrix", "vectorIncreased" and "matrixIncreased",

  2. in an RDS file specified by options$save$RDSfile which has to be a vector of one or two connections or names of files where the R object is saved to. If options$save$RDSfile is of length two a simulation of type "vector" or "vectorIncreased" (only one can occur at one function call) will be saved in options$save$RDSfile[1] by

    saveRDS(stat, file = options$save$RDSfile[1])

    and "matrix" or "matrixIncreased" (only one can occur at one function call) will be saved in options$save$RDSfile[2]. If options$save$RDSfile is of length one both will be saved in options$save$RDSfile which means if both occur at the same call only "vector" or "vectorIncreased" will be saved. Each saving can be disabled by not specifying options$save$RDSfile or by passing an empty string to the corresponding entry of options$save$RDSfile.

  3. in a variable named by options$save$variable in the environment options$envir. Hence, options$save$variable has to be a vector of one or two containing variable names (character vectors). If options$save$variable is of length two a simulation of type "vector" or "vectorIncreased" (only one can occur at one function call) will be saved in options$save$variable[1] and "matrix" or "matrixIncreased" (only one can occur at one function call) will be saved in options$save$variable[2]. If options$save$variable is of length one both will be saved in options$save$variable which means if both occur at the same call only "vector" or "vectorIncreased" will be saved. Each saving can be disabled by not specifying options$save$variable or by passing "" to the corresponding entry of options$save$variable.

By default (if options$save is missing) "vector" and "vectorIncreased" will be saved in the workspace and "matrix" and "matrixIncreased" on the file system, i.e.

options$save <- list(workspace = c("vector", "vectorIncreased"),
                     fileSystem = c("matrix", "matrixIncreased"),
                     RDSfile = NULL, variable = NULL).

Simulations can be removed from the workspace by removing the variable critValStepRTab, i.e. by calling remove(critValStepRTab, envir = envir), with envir the used environment, and from the file system by deleting the corresponding subfolder, i.e. by calling

unlink(file.path(R.cache::getCacheRootPath(), dirs), recursive = TRUE),

with dirs the corresponding subdirectory.

References

Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.

Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.

See Also

monteCarloSimulation, penalty, parametricFamily, intervalSystem, stepFit, computeBounds

Examples



# vector of critical values
qVector <- critVal(100L, alpha = 0.5)
# global quantile
qValue <- critVal(100L, alpha = 0.5, output = "value")

# vector can be computed from the global quantile
identical(critVal(100L, q = qValue), qVector)

# for a conservative significance level, stronger confidence statements
critVal(100L, alpha = 0.05)
critVal(100L, alpha = 0.05, output = "value")

# higher significance level for larger detection power, but less confidence
critVal(100L, alpha = 0.99)
critVal(100L, alpha = 0.99, output = "value")

# different parametric family, different intervalSystem, a subset of lengths,
# different penalty and given weights
q <- critVal(100L, alpha = 0.05, family = "hsmuce", intervalSystem = "dyaLen",
             lengths = c(2L, 4L, 16L, 32L), penalty = "weights",
             weights = c(0.4, 0.3, 0.2, 0.1))

# vector of critical values can be given by a vector of length n
vec <- 1:100
vec[c(2L, 4L, 16L, 32L)] <- q
attr(vec, "n") <- 128L
identical(critVal(100L, q = vec, family = "hsmuce", intervalSystem = "dyaLen",
                  lengths = c(2L, 4L, 16L, 32L)), q)

# with a given monte-Carlo simulation for nq = 128 observations
stat <- monteCarloSimulation(128)
critVal(n = 100L, alpha = 0.05, stat = stat)

# the above calls saved and (attempted to) load Monte-Carlo simulations and
# simulated them for nq = 128 observations
# in the following call no saving, no loading and simulation for n = 100
# observations is required, progress of the simulation will be reported
critVal(n = 100L, alpha = 0.05, messages = 1000L,
        options = list(simulation = "vector", load = list(), save = list()))

# only type "vector" will be saved and loaded in the workspace
critVal(n = 100L, alpha = 0.05, messages = 1000L,
        options = list(simulation = "vector", load = list(workspace = "vector"),
                       save = list(workspace = "vector")))

# simulation of type "matrix" will be saved in a RDS file
# saving of type "vector" is disabled by passing "",
# different seed is set and number of simulations is reduced to r = 1e3
# to allow faster computation at the price of a less precise result
file <- tempfile(pattern = "file", tmpdir = tempdir(), fileext = ".RDS")
critVal(n = 100L, alpha = 0.05, seed = 1, r = 1e3,
        options = list(simulation = "matrix", load = list(),
                       save = list(RDSfile = c("", file))))
identical(readRDS(file), monteCarloSimulation(100L, seed = 1, r = 1e3))




stepR documentation built on Nov. 14, 2023, 1:09 a.m.