benchmark.data.slow: Automatic, but slow benchmarking of a data set

View source: R/benchmark.data.slow.R

benchmark.data.slow    R Documentation

Automatic, but slow benchmarking of a data set

Description

This function provides an automated mechanism for identifying the most likely cluster and largest test statistic for a simulated data set within the benchmark2003 or benchmark2006 data sets. It uses a loop and message to print progress instead of the pbapply function. The advantage is that incremental progress is easily seen, allowing the user to identify any problematic rows of the data set. The results for each row of the data set are saved in a file named paste("t", data.name, "_", test.name, "_", i, ".rds", sep = ""), where i is the row of the data set.
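
As a small illustration of the file naming pattern described above, the following sketch builds the names for the first three rows. The values of data.name and test.name are hypothetical and chosen only for the example.

```r
# Hypothetical values, used only for illustration
data.name <- "fakedata1"
test.name <- "scan_test"
idx <- 1:3

# One file name per row i, following the pattern given in the description
file.names <- paste("t", data.name, "_", test.name, "_", idx, ".rds", sep = "")
print(file.names)
```

Because paste is vectorized over idx, the same expression yields one name per row without an explicit loop.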

Usage

benchmark.data.slow(
  TESTFUN,
  test.name,
  data.name,
  idx = seq_len(10000),
  ...,
  units = "auto"
)

clean.benchmark(
  test.name,
  data.name,
  idx = seq_len(99999),
  SAVE = FALSE,
  unlist = FALSE
)

Arguments

TESTFUN

A function that returns a list containing the maximum test statistic and the indices of the most likely cluster. The first argument MUST take a vector of cases.

test.name

The name of the test being applied. Must be a character vector.

data.name

The name for the benchmark2003 or benchmark2006 data set to benchmark. This can only be a single data set.

idx

A vector with the row indices of the data set to be benchmarked.

...

Additional arguments passed on to the TESTFUN.

units

The units of time for printing the iterative evaluation time. The default is "auto". See difftime for additional options.

SAVE

A logical value indicating whether the results should be saved as an rda file. If TRUE, then the file is saved as paste("t", data.name, "_", test.name, ".rda", sep = "") to the current working directory. If FALSE, a list of results is returned. Default is FALSE.

unlist

A logical indicating whether the unlist function should be applied to the collected results. The default is FALSE.
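
The effect of the units argument can be sketched with difftime directly. This is only an illustration under the assumption that units is forwarded to difftime unchanged; the two time stamps are made up so that the result is reproducible.

```r
# Fixed, made-up time stamps 90 seconds apart
start <- as.POSIXct("2023-01-01 00:00:00", tz = "UTC")
end <- start + 90

# units = "auto" lets difftime choose the unit from the size of the difference
elapsed.auto <- difftime(end, start, units = "auto")

# A fixed unit can be requested instead
elapsed.secs <- difftime(end, start, units = "secs")

message("auto:  ", format(elapsed.auto))
message("fixed: ", format(elapsed.secs))
```

A fixed unit such as "secs" may be easier to compare across rows, since "auto" can switch units from one row to the next.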

Details

TESTFUN is applied to each row of the specified data set that is selected by idx.
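
The row-by-row evaluation can be pictured with the following minimal sketch. It is not the package's implementation: toy.data, toy.test, and the names "toydata" and "toy_test" are hypothetical stand-ins, and results are written to a temporary directory rather than the working directory.

```r
# Toy stand-ins (assumptions): a 3-row "data set" and a trivial test function
toy.data <- matrix(c(1, 5, 2,
                     4, 0, 3,
                     2, 2, 9), nrow = 3, byrow = TRUE)
toy.test <- function(cases) {
  # return the largest value and its position, mimicking tmax and mlc
  list(tmax = max(cases), mlc = which.max(cases))
}

out.dir <- tempdir()
idx <- seq_len(nrow(toy.data))
for (i in idx) {
  start <- Sys.time()
  # apply the test function to row i
  res <- toy.test(toy.data[i, ])
  # save the result for row i in its own rds file
  fname <- paste("t", "toydata", "_", "toy_test", "_", i, ".rds", sep = "")
  saveRDS(res, file = file.path(out.dir, fname))
  # report progress for this row
  elapsed <- difftime(Sys.time(), start, units = "auto")
  message("row ", i, " finished in ", format(elapsed))
}
```

Saving each row separately means that a failure on one row does not discard the rows already completed.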

Value

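benchmark.data.slow returns NULL; the results for each row are saved in an rds file. clean.benchmark returns a list of results when SAVE = FALSE; when SAVE = TRUE, the results are saved in an rda file.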
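
Per-row rds files of this kind can be read back with readRDS. The sketch below first writes three toy results and then collects them into a list; it only illustrates the idea and is not a description of how clean.benchmark is implemented. The names "toydata" and "collect_demo" and the stored values are hypothetical.

```r
out.dir <- tempdir()
idx <- 1:3
files <- file.path(out.dir,
                   paste("t", "toydata", "_", "collect_demo", "_", idx, ".rds", sep = ""))

# Write three toy per-row results (stand-ins for real benchmark output)
for (i in idx) {
  saveRDS(list(tmax = i * 10, mlc = seq_len(i)), file = files[i])
}

# Read back every file that exists and collect the results in a list
results <- lapply(files[file.exists(files)], readRDS)

# Extract the maximum test statistic from each result
tmax <- sapply(results, function(r) r$tmax)
print(tmax)
```

Filtering with file.exists skips rows whose files are missing, for example rows that failed during benchmarking.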

Examples

# load required data
data(neastdata)
# construct zone information
coords = neastdata[, c("x", "y")]
ubpop = 0.5
pop = neastdata$population

# all distinct zones subject to population constraints
zones = smerc::scan.zones(coords, pop, ubpop)
# expected number of cases in each region
e = 600/sum(pop)*pop

# expected number of cases in each zone
ein = sapply(zones, function(x) sum(e[x]))
# expected number of cases outside of each zone
eout = 600 - ein

# takes a set of cases and determines the largest
# test statistic across all zones using required
# information
mlc.scan.test = function(cases, zones, ein, eout, ty) {
  # compute yin for each zone
  yin = sapply(zones, function(zone) sum(cases[zone]))
  # take max over statistics of all zones
  tobs = smerc::scan.stat(yin, ein, eout, ty)
  wmax = which.max(tobs)
  return(list(tmax = tobs[wmax],
              mlc = zones[[wmax]]))
}

## Not run: 
benchmark.data.slow(TESTFUN = mlc.scan.test,
                    test.name = "scan_test",
                    data.name = "fakedata1",
                    idx = seq_len(10),
                    zones = zones,
                    ein = ein,
                    eout = eout,
                    ty = 600)
clean.benchmark(test.name = "scan_test",
                data.name = "fakedata1",
                idx = seq_len(10))

## End(Not run)

jpfrench81/neastbenchmark documentation built on July 26, 2023, 3:07 p.m.