getCritVal: Critical values

Description Usage Arguments Value Simulating, saving and loading of Monte-Carlo simulations References See Also Examples

View source: R/getCritVal.R

Description

Computes critical values for the functions jsmurf, jules, hilde, stepDetection and improveSmallScales. getCritVal is usually automatically called, but can be called explicitly, for instance outside of a for loop to save run time. Computation requires Monte-Carlo simulations. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, the simulations are by default saved in the workspace and on the file system such that a second call that require the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see Section Simulating, saving and loading of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages and the saving can be controlled by the argument option.

Usage

1
2
3
4
getCritVal(n, filter, family = c("jules", "jsmurf", "jsmurfPS", "jsmurfLR",
                                "hjsmurf", "hjsmurfSPS", "hjsmurfLR", "LR", "2Param"), 
           alpha = 0.05, r = NULL, nq = n, options = NULL,
           stat = NULL, messages = NULL, ...)

Arguments

n

a positive integer giving the number of observations

filter

an object of class lowpassFilter giving the used analogue lowpass filter

family

the parametric family for which critical values should be computed, select "jules" for a critical value that will be passed to jules or stepDetection, the families "jsmurf", "jsmurfPS", "jsmurfLR", "hjsmurf", "hjsmurfSPS" and "hjsmurfLR" according to the argument family in jsmurf and hilde, and "LR" and "2Param" according to the argument method in hilde and improveSmallScales

alpha

a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing conductance changes and detecting additional artefacts. For more details on this choice see the accompanying vignette or (Frick et al., 2014, Section 4) and (Pein et al., 2017, Section 3.4)

r

a positive integer giving the required number of Monte-Carlo simulations if they will be simulated or loaded from the workspace or the file system, a larger number improves accuracy but simulations last longer; by default 1e4 is used except for families "LR" and "2Param", where 1e3 is used since their simulations are rather slow

nq

a positive integer larger than or equal to n giving the (increased) number of observations for the Monte-Carlo simulation. See Section Simulating, saving and loading of Monte-Carlo simulations for more details

options

a list specifying how Monte-Carlo simulations will be simulated, saved and loaded. For more details see Section Simulating, saving and loading of Monte-Carlo simulations

stat

an object of class "MCSimulationVector" or "MCSimulationMaximum", usually computed by monteCarloSimulation. Has to be simulated for at least the given number of observations n and for the given filter. If missing it will automatically be loaded and if not found simulated accordingly to the given options. For more details see Section Simulating, saving and loading of Monte-Carlo simulations

messages

a positive integer or NULL, in each messages iteration a message will be given in order to show the progress of the simulation, if NULL no message will be given

...

additional arguments of the parametric families "LR" and "2Param", i.e. thresholdLongSegment, localValue, localVar, regularization, suppressWarningNoDeconvolution, localList, please see their documentation in improveSmallScales to understand their meaning, parameters have to coincide in the call of getCritVal and hilde or improveSmallScales, argument localVar is only allowed for family "2Param", for other families additional arguments are ignored with a warning

Value

For families "jules", "jsmurf", "jsmurfPS", "jsmurfLR" a single numeric giving the critical value and for families "hjsmurf", "hjsmurfSPS", "hjsmurfLR", "LR" and "2Param" a numeric vector giving scale dependent critical values. Additionally, an attribute n with a single integer giving the number of data points for which the values are computed.

Simulating, saving and loading of Monte-Carlo simulations

Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this function offers multiple possibilities to save and load the simulations. By default, simulations are stored and loaded with suitable default values and no user choices are required. If desired, the simulation, saving and loading can be controlled by the argument option. This argument has to be a list or NULL. For the list the following named entries are allowed: "simulation", "save", "load", "envir" and "dirs". All missing entries will be set to their default option.
Each Monte-Carlo simulation is specific to the parametric family, their parameters in case of families "LR" or "2Param", the number of observations and the used filter. Monte-Carlo simulations can also be performed for a (slightly) larger number of observations nq given in the argument nq, which avoids extensive resimulations for only a little bit varying number of observations at price of a (slightly) smaller detection power. We recommend to not use a nq more than two times larger than the number of observations n.
Objects of the following types can be simulated, saved and loaded:

Computation of scale depend critical values, i.e. calucatlions for the families "hjsmurf", "hjsmurfSPS", "hjsmurfLR", "LR" and "2Param" require an object of class "MCSimulationVector". Otherwise, objects of class "MCSimulationVector" and objects of class "MCSimulationMaximum" lead to the same result (if the number of observations is the same), but an object of class "MCSimulationVector" requires much more storage space and has slightly larger saving and loading times. However, simulations of type "vectorIncreased", i.e. objects of class "MCSimulationMaximum" with nq observations, have to be resimulated if as.integer(log2(n1)) != as.integer(log2(n2)) when the saved simulation was computed with n == n1 and the simulation now is required for n == n2 and nq >= n1 and nq >= n2. All in all, if all data sets in the analysis have the same number of observations simulations of type "vector" for families "jules", "jsmurf", "jsmurfPS", "jsmurfLR" and "matrix" for families "hjsmurf", "hjsmurfSPS", "hjsmurfLR", "LR" and "2Param" are recommended. If they have a slightly different number of observations it is recommend to set nq to the largest number and to use simulations for an increased number of observations. For families "jules", "jsmurf", "jsmurfPS", "jsmurfLR" one should also consider the following: If as.integer(log2(n)) is the same for all data sets type "vectorIncreased" is recommend , if they differ type "matrixIncreased" avoids a resimulation at the price of a larger object to be stored and loaded.
The simulations can either be saved in the workspace in the variable critValStepRTab or persistently on the file system for which the package R.cache is used. Loading from the workspace is faster, but either the user has to save the workspace manually or in a new session simulations have to be performed again. Moreover, storing in and loading from variables and RDS files is supported.

options$envir and options$dirs

For loading from / saving in the workspace the variable critValStepRTab in the environment options$envir will be looked for and if missing in case of saving also created there. Moreover, the variable(s) specified in options$save$variable (explained in the Subsection Saving: options$save) will be assigned to this environment. By default the global environment .GlobalEnv is used, i.e. options$envir == .GlobalEnv.
For loading from / saving on the file system loadCache(key = keyList, dirs = options$dirs) and saveCache(stat, key = attr(stat, "keyList"), dirs = options$dirs) are called, respectively. In other words, options$dirs has to be a character vector constituting the path to the cache subdirectory relative to the cache root directory as returned by getCacheRootPath(). If options$dirs == "", the path will be the cache root path. By default the subdirectory "stepR" is used, i.e. options$dirs == "stepR". Missing directories will be created.

Simulation: options$simulation

Whenever Monte-Carlo simulations have to be performed, i.e. when stat == NULL and the required Monte-Carlo simulation could not be loaded, the type specified in options$simulation will be simulated by monteCarloSimulation. In other words, options$simulation must be a single string of the following: "vector", "vectorIncreased", "matrix" or "matrixIncreased". By default (options$simulation == NULL), an object of class "MCSimulationVector" for nq observations will be simulated, i.e. options$simulation == "matrixIncreased". For this choice please recall the explanations regarding computation time and flexibility at the beginning of this section.

Loading: options$load

Loading of the simulations can be controlled by the entry options$load which itself has to be a list with possible entries: "RDSfile", "workspace", "package" and "fileSystem". Missing entries disable the loading from this option. Whenever a Monte-Carlo simulation is required, i.e. when the variable q is not given, it will be searched for at the following places in the given order until found:

  1. in the variable stat,

  2. in options$load$RDSfile as an RDS file, i.e. the simulation will be loaded by

    readRDS(options$load$RDSfile).

    In other words, options$load$RDSfile has to be a connection or the name of the file where the R object is read from,

  3. in the workspace or on the file system in the following order: "vector", "matrix", "vectorIncreased" and finally of "matrixIncreased". For each option it will first be looked in the workspace and then on the file system. All searches can be disabled by not specifying the corresponding string in options$load$workspace and options$load$fileSystem. In other words, options$load$workspace and options$load$fileSystem have to be vectors of strings containing none, some or all of "vector", "matrix", "vectorIncreased" and "matrixIncreased",

  4. if all other options fail a Monte-Carlo simulation will be performed.

By default (if options$load is missing / NULL) no RDS file is specified and all other options are enabled, i.e.

1
2
3
4
5
options$load <- list(workspace = c("vector", "vectorIncreased",
                                   "matrix", "matrixIncreased"),
                     fileSystem = c("vector", "vectorIncreased",
                                    "matrix", "matrixIncreased"),
                     RDSfile = NULL).

Saving: options$save

Saving of the simulations can be controlled by the entry options$save which itself has to be a list with possible entries: "workspace", "fileSystem", "RDSfile" and "variable". Missing entries disable the saving in this option.
All available simulations, no matter whether they are given by stat, loaded, simulated or in case of "vector" and "vectorIncreased" computed from "matrix" and "matrixIncreased", respectively, will be saved in all options for which the corresponding type is specified. Here we say a simulation is of type "vectorIncreased" or "matrixIncreased" if the simulation is not performed for n observations. More specifically, a simulation will be saved:

  1. in the workspace or on the file system if the corresponding string is contained in options$save$workspace and options$save$fileSystem, respectively. In other words, options$save$workspace and options$save$fileSystem have to be vectors of strings containing none, some or all of "vector", "matrix", "vectorIncreased" and "matrixIncreased",

  2. in a variable named by options$save$variable in the environment options$envir. Hence, options$save$variable has to be a vector of one or two containing variable names (character vectors). If options$save$variable is of length two a simulation of type "vector" or "vectorIncreased" (only one can occur at one function call) will be saved in options$save$variable[1] and "matrix" or "matrixIncreased" (only one can occur at one function call) will be saved in options$save$variable[2]. If options$save$variable is of length one both will be saved in options$save$variable which means if both occur at the same call only "vector" or "vectorIncreased" will be saved. Each saving can be disabled by not specifying options$save$variable or by passing "" to the corresponding entry of options$save$variable.

By default (if options$save is missing) "vector" and "vectorIncreased" will be saved in the workspace and "matrixIncreased" on the file system, i.e.

1
2
3
options$save <- list(workspace = c("vector", "vectorIncreased"),
                     fileSystem = c("matrix", "matrixIncreased"),
                     RDSfile = NULL, variable = NULL).

Simulations can be removed from the workspace by removing the variable critValStepRTab, i.e. by calling remove(critValStepRTab, envir = envir), with envir the used environment, and from the file system by deleting the corresponding subfolder, i.e. by calling

1
unlink(file.path(R.cache::getCacheRootPath(), dirs), recursive = TRUE),

with dirs the corresponding subdirectory.

References

Pein, F., Bartsch, A., Steinem, C., Munk, A. (2020) Heterogeneous Idealization of Ion Channel Recordings - Open Channel Noise. arXiv:2008.02658.

Pein, F., Tecuapetla-Gómez, I., Schütte, O., Steinem, C., Munk, A. (2018) Fully-automatic multiresolution idealization for filtered ion channel recordings: flickering event detection. IEEE Transactions on NanoBioscience 17(3), 300–320.

Hotz, T., Schütte, O., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013) Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on NanoBioscience 12(4), 376–386.

Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.

Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.

See Also

jsmurf, jules, hilde, lowpassFilter, stepDetection, improveSmallScales

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
# the for the recording of the gramA data set used filter
filter <- lowpassFilter(type = "bessel", param = list(pole = 4L, cutoff = 1e3 / 1e4),
                        sr = 1e4)

# critical value for jules or stepDetection
# this call requires a Monte-Carlo simulation at the first time
# and therefore might last a few minutes,
# progress of the Monte-Carlo simulation is reported
q <- getCritVal(length(gramA), filter = filter, messages = 100)

# this second call should be much faster
# as the previous Monte-Carlo simulation will be loaded
getCritVal(length(gramA), filter = filter)

# critical value for jsmurf, 
# Monte-Carlo simulations are specific to the parametric family,
# hence a new Monte-Carlo simulation is required
getCritVal(length(gramA), family = "jsmurfPS", filter = filter, messages = 100)

# scale dependent critical value for jsmurf allowing for heterogeneous noise,
# return value is a vector
getCritVal(length(gramA), family = "hjsmurf", filter = filter, messages = 100)

# scale dependent critical value for "LR" as used by improveSmallScales and hilde,
# return value is a vector
getCritVal(length(gramA), family = "LR", filter = filter, messages = 100)

# families "LR" and "2Param" allows to specify additional parameters in ...
# Monte-Carlo simulations are also specific to those values
getCritVal(length(gramA), family = "LR", filter = filter, messages = 100,
           localValue = mean, thresholdLongSegment = 15L)  

# much larger significance level alpha for a larger detection power,
# but also with the risk of detecting additional artefacts
getCritVal(length(gramA), filter = filter, alpha = 0.9)

# medium significance level alpha for a tradeoff between detection power
# and the risk to detect additional artefacts
getCritVal(length(gramA), filter = filter, alpha = 0.5)

# critical values depend on the number of observations and on the filter
# also a new Monte-Carlo simulation is required
getCritVal(100, filter = filter, messages = 500)

otherFilter <- lowpassFilter(type = "bessel",
                             param = list(pole = 6L, cutoff = 0.2),
                             sr = 1e4)
getCritVal(100, filter = otherFilter, messages = 500)

# simulation for a larger number of oberservations can be used (nq = 100)
# does not require a new simulation as the simulation from above will be used
# (if the previous call was executed first)
getCritVal(90, filter = filter, nq = 100)

# simulation of type "vectorIncreased" for n1 observations can only be reused 
# for n2 observations if as.integer(log2(n1)) == as.integer(log2(n2))
# no simulation is required, since a simulation of type "matrixIncreased"
# will be loaded from the fileSystem
# this call also saved a simulation of type "vectorIncreased" in the workspace
getCritVal(30, filter = filter, nq = 100)
# here a new simulation is required
# (if no appropriate simulation is saved from a call outside of this file)
getCritVal(10, filter = filter, nq = 100, messages = 500,
           options = list(load = list(workspace = c("vector", "vectorIncreased")))) 

# the above calls saved and (attempted to) load Monte-Carlo simulations
# in the following call the simulations will neither be saved nor loaded
# to save some time the number of iterations is reduced to r = 1e3
# hence the critical value is computed with less precision
# In general, r = 1e3 is enough for a first impression
# for a detailed analysis r = 1e4 is suggested
getCritVal(100, filter = filter, messages = 100, r = 1e3,
           options = list(load = list(), save = list()))

# simulations will only be saved in and loaded from the workspace,
# but not on the file system
getCritVal(100, filter = filter, messages = 100, r = 1e3,
           options = list(load = list(workspace = c("vector", "vectorIncreased")),
                          save = list(workspace = c("vector", "vectorIncreased"))))

# explicit Monte-Carlo simulations, not recommended
stat <- stepR::monteCarloSimulation(n = 100, , family = "mDependentPS",
                                    filter = filter, output = "maximum",
                                    r = 1e3, messages = 100)
getCritVal(100, filter = filter, stat = stat)

clampSeg documentation built on Aug. 25, 2020, 5:07 p.m.