benchmark.data: Automatic benchmarking of data sets

benchmark.data    R Documentation

Automatic benchmarking of data sets

Description

This function provides an automated mechanism for identifying the most likely cluster and the largest test statistic for each simulated data set in one of the benchmark2003 or benchmark2006 data sets.

Usage

benchmark.data(
  TESTFUN,
  test.name,
  data.name,
  SAVE = FALSE,
  loop = FALSE,
  pfreq = 1,
  ...
)

Arguments

TESTFUN

A function that returns a list containing the maximum test statistic and the indices of the most likely cluster. The first argument MUST take a vector of cases.

test.name

The name of the test being applied. Must be a character vector.

data.name

A vector of names for the benchmark2003 or benchmark2006 data sets.

SAVE

A logical value indicating whether the results should be saved as an rda file. If TRUE, the results are saved to the current working directory as paste("t", data.name, "_", test.name, ".rda", sep = ""). If FALSE, a list of results is returned. Default is FALSE.

loop

A logical value indicating whether a loop should be used to run the benchmark instead of pbapply. The default is FALSE.

pfreq

The frequency with which messages are reported from the loop. The default is pfreq = 1, meaning a message is reported for each index of the loop. This default is chosen on the assumption that the loop will only be used when the method is quite slow.

...

Additional arguments passed on to TESTFUN and pbapply.

Details

TESTFUN is applied to each row of each specified data set.
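
The row-wise application described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: apply_rows, toy_test, and the simulated matrix sims are hypothetical names, the benchmark data are assumed to hold one simulated data set per row, and base lapply stands in for pbapply so the sketch has no dependencies.

```r
# Hypothetical stand-in for a benchmark data set:
# each row is one simulated vector of case counts.
set.seed(1)
sims = matrix(rpois(20, lambda = 5), nrow = 4)

# Toy TESTFUN (hypothetical): the first argument is the vector of
# cases; it returns the maximum statistic and the index attaining it.
toy_test = function(cases, scale = 1) {
  wmax = which.max(cases)
  list(tmax = cases[wmax] * scale, mlc = wmax)
}

# Hypothetical helper: apply TESTFUN to each row of x, forwarding
# extra arguments through `...`. With loop = TRUE, a progress
# message is printed every pfreq iterations.
apply_rows = function(TESTFUN, x, loop = FALSE, pfreq = 1, ...) {
  if (!loop) {
    return(lapply(seq_len(nrow(x)), function(i) TESTFUN(x[i, ], ...)))
  }
  out = vector("list", nrow(x))
  for (i in seq_len(nrow(x))) {
    out[[i]] = TESTFUN(x[i, ], ...)
    if (i %% pfreq == 0) message("finished row ", i)
  }
  out
}

res = apply_rows(toy_test, sims, loop = TRUE, pfreq = 2, scale = 2)
length(res)  # one result per row of sims
```

Each element of res is a list with components tmax and mlc, mirroring the return value that the TESTFUN argument is documented to produce.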

If the results are saved, they are saved in the current working directory with the name paste("t", data.name, "_", test.name, ".rda", sep = "").
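
As a quick check of the naming rule, the paste call can be evaluated directly. The data and test names below are the ones used in the Examples section. Because paste is vectorized, a vector of data names yields one file name per data set; whether benchmark.data writes exactly one file per data set is an assumption here.

```r
# Names taken from the Examples section of this page
data.name = c("fakedata1", "fakedata2")
test.name = "scan_test"

# File names produced by the documented naming rule
fnames = paste("t", data.name, "_", test.name, ".rda", sep = "")
fnames
# "tfakedata1_scan_test.rda" "tfakedata2_scan_test.rda"

# Full paths relative to the current working directory
fpaths = file.path(getwd(), fnames)
```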

Value

If SAVE = FALSE, a list of results is returned. If SAVE = TRUE, the results are written to an rda file instead.

Examples

# load required data
data(neastdata)
# construct zone information
coords = neastdata[, c("x", "y")]
ubpop = 0.5
pop = neastdata$population

# all distinct zones subject to population constraints
zones = smerc::scan.zones(coords, pop, ubpop)
# expected number of cases in each region
e = 600/sum(pop)*pop

# expected number of cases in each zone
ein = sapply(zones, function(x) sum(e[x]))
# expected number of cases outside of each zone
eout = 600 - ein

# takes a set of cases and determines the largest
# test statistic across all zones using required
# information
mlc.scan.test = function(cases, zones, ein, eout, ty) {
  # compute yin for each zone
  yin = sapply(zones, function(zone) sum(cases[zone]))
  # take max over statistics of all zones
  tobs = smerc::scan.stat(yin, ein, eout, ty)
  wmax = which.max(tobs)
  return(list(tmax = tobs[wmax],
              mlc = zones[[wmax]]))
}

out = benchmark.data(TESTFUN = mlc.scan.test,
                     test.name = "scan_test",
                     data.name = c("fakedata1", "fakedata2"),
                     SAVE = FALSE,
                     zones = zones,
                     ein = ein,
                     eout = eout,
                     ty = 600)

jpfrench81/neastbenchmark documentation built on July 26, 2023, 3:07 p.m.