simBenchmark: Benchmark for simulated examples

View source: R/postProcess.R

simBenchmarkR Documentation

Benchmark for simulated examples

Description

This function checks the cluster allocation of profile regression against the generating clusters for a selection of the simulated dataset provided within the package. This can be used to compute confusion matrices for simulated examples, as shown in the example below.

Usage

simBenchmark(whichModel = "clusSummaryBernoulliDiscrete",
  nSweeps = 1000, nBurn = 1000, seedProfRegr = 123)

Arguments

whichModel

Which simulated dataset this benchmark should be carried out for. At the moment this function works only for these datasets structures provided in the package: "clusSummaryBernoulliNormal","clusSummaryBernoulliDiscreteSmall","clusSummaryCategoricalDiscrete", "clusSummaryNormalDiscrete","clusSummaryNormalNormal", "clusSummaryNormalNormalSpatial", "clusSummaryVarSelectBernoulliDiscrete", "clusSummaryBernoulliMixed". These dataset structures can be used by the function generateSampleDataFile to create simulated datasets.

nSweeps

The number of sweeps of the profile regression algorithm for this benchmarking.

nBurn

The number of sweeps in the burn in of the profile regression algorithm for this benchmarking.

seedProfRegr

Sets the seed for the random number generation in profile regression (ie. sets the seed for the portion of the MCMC code in C++). Note that setting this seed does not mean that the function simBenchmark will give the same answer. This is because the first step of this function generates a random sample, which will vary in each run unless a global seed is set in R using the function set.seed.

Value

This function creates a data.frame. Each row corresponds to each observation in the generated dataset. The columns are:

clusterAllocation

Cluster allocation carried out by profile regression. These values are integers, corresponding to cluster numbers.

outcome

Value of the outcome (y) in the dataset.

generatingCluster

Cluster allocation in the data generating mechanism. These values are characters which include the word 'Known' and then the original numbering of the cluster. The word 'Known' is included to avoid confusion with the cluster allocations identified by profile regression.

Authors

Silvia Liverani, Queen Mary University of London, UK

Austin Gratton, University of North Carolina Wilmington, USA

Maintainer: Silvia Liverani <liveranis@gmail.com>

References

Silvia Liverani, David I. Hastie, Lamiae Azizi, Michail Papathomas, Sylvia Richardson (2015). PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes. Journal of Statistical Software, 64(7), 1-30. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v064.i07")}.

Examples

## Not run: 
# vector of all test datasets allowed by this benchmarking function
testDatasets<-c("clusSummaryBernoulliNormal",
  "clusSummaryBernoulliDiscreteSmall","clusSummaryCategoricalDiscrete",
  "clusSummaryNormalDiscrete","clusSummaryNormalNormal", 
  "clusSummaryNormalNormalSpatial","clusSummaryVarSelectBernoulliDiscrete", 
  "clusSummaryBernoulliMixed")

# runs profile regression on all datasets and 
# computes confusion matrix for each one
for (i in 1:length(testDatasets)){
  tester<-simBenchmark(testDatasets[i])
  print(table(tester[,c(1,3)]))
}



## End(Not run)

PReMiuM documentation built on Nov. 14, 2023, 1:08 a.m.