generate_emulators_and_ensemble: Generate a set of emulators and combine into an ensemble


View source: R/ensemble_generation.R

Description

This function generates all requested emulators, then combines them into one ensemble. It takes as input a list of the emulator types to create (any of random forest, support vector machine, neural network, general linear model, and Gaussian process model), the simulation parameters and output response labels, an object created by the function partition_dataset (training, testing, and validation datasets), and an object created by the function emulation_algorithm_settings. The latter sets key arguments used in emulator creation, as detailed in the description accompanying that function.

Usage

generate_emulators_and_ensemble(model_list, parameters, measures,
  partitioned_data, algorithm_settings = NULL, timepoint = NULL,
  normalised = FALSE, output_formats = c("pdf"))

Arguments

model_list

Vector of the types of emulation model to create. Accepted abbreviations are: SVM (Support-Vector Machine), GP (Gaussian Process Model), NNET (Neural Network), RF (Random Forest), GLM (General Linear Model)

parameters

Vector containing the names of the simulation parameters in the dataset on which the emulator is being trained

measures

Vector containing the simulation outputs that the emulators should be able to predict

partitioned_data

Object returned by the function partition_dataset, containing the training, testing, and validation data

algorithm_settings

Object returned by the function emulation_algorithm_settings, containing the settings of the machine learning algorithms to use in emulator creation. If no settings need changing and no neural network is being generated, this can be omitted (hence the default of NULL) and will be generated by generate_requested_emulations. If you are changing any settings or generating a neural network, you must create this object before calling generate_requested_emulations.

timepoint

If using multiple timepoints, the timepoint for which emulators are being created

normalised

Whether the emulator data has been normalised or not. Affects how training and test output predictions are displayed

output_formats

File formats in which result graphs should be produced

Value

A list containing the ensemble, the time taken to generate it, and the sampling minimums and maximums used in its creation, so that unseen data passed to the ensemble and the predictions it generates can be scaled and rescaled correctly
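To illustrate why the sampling minimums and maximums are returned, the sketch below shows one way unseen parameter values could be scaled before prediction and then mapped back. It assumes plain min-max scaling to the range [0, 1]. The helper names scale_to_unit and rescale_from_unit are hypothetical and are not part of spartan, and the package's own internal scaling may differ in detail.

```r
# Hypothetical helpers (not spartan functions): min-max scaling to [0, 1]
# and the inverse mapping back to the original sampling range.
scale_to_unit <- function(x, mins, maxes) (x - mins) / (maxes - mins)
rescale_from_unit <- function(z, mins, maxes) z * (maxes - mins) + mins

# Sampling ranges for six parameters, matching the values used in the Examples
sample_mins  <- c(0, 0.1, 0.1, 0.015, 0.1, 0.25)
sample_maxes <- c(100, 0.9, 0.5, 0.08, 1, 5)

# An illustrative unseen parameter set (made-up values within the ranges)
unseen <- c(50, 0.5, 0.3, 0.04, 0.55, 2.5)

# Scale before passing to an emulator trained on normalised data
scaled <- scale_to_unit(unseen, sample_mins, sample_maxes)

# Rescaling recovers the original values
restored <- rescale_from_unit(scaled, sample_mins, sample_maxes)
```

The same pair of operations applies to predicted outputs, provided the minimums and maximums of the output measures are used in place of the parameter ranges.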

Examples

library(spartan)
# Sampling ranges for the six simulation parameters
sampleMaxes <- cbind(100, 0.9, 0.5, 0.08, 1, 5)
sampleMins <- cbind(0, 0.1, 0.1, 0.015, 0.1, 0.25)
# Emulator types to build, and the output and inputs they work with
modelList <- c("RF", "GLM")
measures <- c("Velocity")
parameters <- c("stableBindProbability", "chemokineExpressionThreshold",
  "initialChemokineExpressionValue", "maxChemokineExpressionValue",
  "maxProbabilityOfAdhesion", "adhesionFactorExpressionSlope")
data("sim_data_for_emulation")
# Split into training (75%), testing (15%), and validation (10%) sets
partitionedData <- partition_dataset(sim_data_for_emulation[, 1:7], parameters,
  measures, percent_train = 75, percent_test = 15, percent_validation = 10,
  normalise = TRUE, sample_mins = sampleMins, sample_maxes = sampleMaxes)
# Build the requested emulators and combine them into one ensemble
generated_ensemble <- generate_emulators_and_ensemble(modelList, parameters,
  measures, partitionedData, normalised = TRUE)

spartan documentation built on May 2, 2019, 9:39 a.m.