In carlopacioni/HexSimR: An R Package For Post HexSim Simulation Analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This second tutorial introduces a few new functionalities that have been implemented in HexSimR >= 0.3.3. Most of these functionalities aim to facilitate scenario creation and manipulation. If you only have to make a few one-off changes in the scenario(s) that you have created, it is probably easier to just manually do those, but if you are planning to generate a large number of scenarios from a template scenario where you modify some parameters, or you realised that you set things up wrong and want to apply the same change to a (relatively) large number of scenarios, then you may find it useful to use HexSimR to speed up and automate this process. HexSim stores the scenario settings in a xml (eXtensible Markup Language) file, so, unless specified, the functions presented here actually modify these files and some basic understanding of the xml formatting is required. The good thing about xml is that it is quite straight forward and intuitive, so you'll see that getting a good handle of it won't require much effort.

Xml file structure

There are fantastic and free tutorials available on the internet nowadays that describe xml formatting in details. In this short section I aim to only point out the critical elements that are needed to correctly execute HemSimR's xml utility functions.

The easiest way to understand how HexSim codes the simulation settings is to open a scenario file and have a look at its structure. Using a source code editor like Notepad++ (that colour-codes different elements of the text based on the language used) can help to better visualise the structure.

The core element in a xml file is a node. A node is opened by wrapping its name with the symbols < and >. For example: \. A node is then closed with a forward slash in front of its name: \. Sometimes, it can be opened and closed in one go, like this: \. We call attribute elements of the node that define its parameters' values and are defined by some text followed by the "=" symbols. For example:

<MyNode myAttribute="a value"/>

Nodes can have a hierarchical structure, meaning that some are nested into others. These are sometimes called child nodes.

<MyNode>
  <myChildNode/>
</MyNode>

Lastly, nodes can have element values, which can be text or numbers and are contained between the node's opening and closing, like this:

<MyNode> "This is a text element" </MyNode>

Scenario settings and parameters in HexSim are set by a node in the xml file. Some nodes have a unique identifier, which can be an attribute (generally name="attribute_value"), or a text element value (e.g. \ node_name \). To this end, in this tutorial, identifiers are considered elements that make the node unique. For example, accumulate events all have the same structure, but they can be identified by the value given to the node \. From a practical point of view, the easiest way to identify which nodes are modified when some changes are applied to a scenario is to create a reference scenario, then manually modify the events/parameters and save it with a different name. If you then compare the two xml files using, for example, the plug-in "compare" in Notepad++, or the program Winmerge (on Windows), the lines that are different between the two files will be highlighted and it should be straightforward to identify the relevant nodes.

knitr::include_graphics("CompareXml2.tif")

We will now step through a few examples of these new functions. HexSimR >= 0.3.3 comes with a few example files. Let's first create a temporary directory called testFolder:

testFolder <- tempdir(check = TRUE)

We will be using this directory to run the examples. If you are replicating this tutorial on your machine, it may be a good idea to browse to this temporary directory to see what changes while we go through the examples. Just type testFolder to see the path where the temporary directory is.

testFolder

Batch modification of scenarios

If several scenarios have been developed in HexSim and the user later realises that there are modifications that need to be applied to several or all of them, this function can help automate this task. It involves generating a scenario with the correct/new events and parameters. This scenario is then used as a template and the nodes that need to be added, replaced or deleted are identified (mapped) using a .csv file. Some setting up is required and it may not be worthwhile to use this function for a handful of scenarios.

The easiest is to step through an example. With the following code, we load the package, locate the example files, and store them in testFolder. All this assumes that you have already installed HexSimR.

library(HexSimR, quietly = TRUE)
new <- system.file("extdata", "MRVC_4BaitYr_ThreeCells.xml", package="HexSimR")
old <- system.file("extdata", "MRVC_4BaitYr_ThreeCells_old.xml", package="HexSimR")
csv.in <- system.file("extdata", "test_csv.csv", package="HexSimR")
file.copy(c(new, old, csv.in), testFolder)

The .csv file essentially provides the information on how to locate the nodes that need to be modified, and what type of actions need to be carried out.The .csv must be located in the scenario folder (i.e. where your scenario files are), and the name should be passed with csv.in as a character vector. The file must have the following headings: nodes, identifier, attribute, mode, ref, ref_identifier, and ref_attribute. The function scenarios.batch.modifier can then be ued to carry out the intended modifications. scenarios.batch.modifier will parse the file and use the csv.in's columns as arguments. You can explore the "test_csv.csv" that you just copied to see an example:

csv <- read.csv(csv.in)
knitr::kable(csv)

In this instance, I had decided that I wanted to add two census events to count the individuals (dingoes in this specific example) that were in different sections of the landscape. This would require not only the creation of a census event (e.g. /scenario/event/censusEvent), but also the machinery needed behind the scenes: a trait to codify the individuals' positions (e.g. /scenario/population/traits/accumulatedTrait), an accumulator to count the individuals in each section of the landscape (e.g. /scenario/population/accumulators/accumulator), an event to update the accumulator (e.g. /scenario/event/accumulateEvent/) and the map to provide the spatial data to identify the various landscape sections (e.g. /scenario/spatialDataSeries). Because I was monitoring two different spatial approaches (what is indentified as ThreeCells and EightCells in the table), the whole process needed to be repeated twice. At the time I had >20 scenarios that I wanted to apply these modifications to, a very long process if I were to carry it out manually...

Let's step through the various elements of the table above so that you will be able to build your own if needed. The column with heading nodes is the path to the node to be searched in xml.template, which is the template file (that is, the file that contains the new nodes/parameter values), except when the mode is "delete", in which case the nodes are searched in the scenario file to be modified and are then deleted. The path starts at the root of the xml file(s) (scenarios) and progresses until the node's name or identifier is found. The path must start with a "/", and must not have a "/" at the end. For example, for an accumulator, the path in the column "nodes" would be: "/scenario/population/accumulators/accumulator/name".

identifier is used to indicate whether the node has an identifier. If it does (e.g. name="attribute_value"), the name (or the value if an attribute) of the identifier needs to be in this column (e.g. attribute_value), otherwise FALSE must be used.

attribute indicates whether the identifier is an attribute (TRUE) or not (FALSE).

mode indicates the type of action that needs to be performed. Possible options are "add", "before", "after", "replace" or "delete". If "add" the node is added as the last child of the parent node. If the node needs to be added in a specific position (events generally do), "before" or "after" should be used to indicate the position in relation to the ref node (i.e. whether the new node goes before or after the node ref. Note that the ref node is searched in the scenarios" file(s), not the xml.template (pay attention here: xml.template is the file that contains the new nodes/parameter values that are used as template for the correct simulation settings. The scenarios file(s) are the ones that are modified). If the option "delete" is used, the node is searched and deleted from the scenarios file. When "replace" is used, the ref node must be provided, even when the node is the same. This is because there might be situations where the name of the node is being changed. In these cases, scenarios.batch.modifier would not find the original node in the scenarios file.

ref needs to be passed when the options "before", "after" or "replace" are used in mode. When not relevant, NA is used. When ref is used, then a search is performed in the scenarios xml file and the fields ref_identifier and ref_attribute must also be passed when relevant. ref_identifier and ref_attribute have the same meaning as identifier and attribute, but they refer to the ref node. Use FALSE when these are not relevant.

An example of how to appply this function is:

XMLTest <- scenarios.batch.modifier(
    path.scenarios=testFolder,
    scenarios="MRVC_4BaitYr_ThreeCells_old.xml", 
    xml.template="MRVC_4BaitYr_ThreeCells.xml",
    csv.in="test_csv.csv")

When you excecute scenarios.batch.modifier, a back up of the folder path.scenarios is copied in the folder "Scenarios_bkup", one level up from path.scenarios. You can use the backed up file to verify that the changes have been applied and identify the differences. You will see that a bunch of accumulators and events and their relative spatial data have been added.

If scenarios="all" is used (default), all scenarios in path.scenarios are processed (wihth the exclusion of xml.template), otherwise it is possible to select a subset of scenarios using a character vector, e.g. scenarios=c("scen1", "scen2").

Additional details on this function can be found using ?scenarios.batch.modifier.

Once you are done exploring the output files, you can delete them and the folder with:

  files <- list.files(testFolder, full.names=TRUE)
  file.remove(files)
  unlink(file.path(dirname(testFolder), '/Scenarios_bkup'), 
           recursive=TRUE)

Latin Hypercube Sampling

The function LHS.scenarios is used to generate scenarios whose parameter combinations follow a Latin Hypercube Sampling design. In a nushell, what we want to achieve is to consider a range of values for a number of parameters (e.g. survival rates) and randomly sample those parameters within the established range to evaluate whether the results change. A Latin Hypercube Sampling (LHS) design is a sort of stratified random design that ensures that all the parameter combinations are considered. If enough samples are taken in a (generic) random sampling, there is probably no need to use a LHS, but it can help when fewer samples are taken. In my opinion, because using LHS can only improve inference compared to (generic) random sampling, or make no difference, I don't see a reason why it shouldn't be used.

Manually setting scenarios with different parameter values, becomes quickly annoying. For example, in the simple case where you want to test two parameters, each with 5 categorical values, you'll have to set up 25 scenarios... using LHS.scenarios, you can automate this process.

Parameter values can be drawn from normal, lognormal, binomial, beta or uniform distributions or have a set of fixed values. You have to have a template scenario that constitutes the backbone of your simulation framework (identified with the argument xml.template).

A .csv file needs to be created, must be located in the scenario folder and the name should be passed with csv.in as a character vector (see system.file("extdata", "test_csv_LHS.csv", package="HexSimR") for an example). The .csv must have the following headings: nodes, identifier, attribute, param_node, param_node_identifier, param_node_attribute, param_identifier, param_name, type, value, distribution. The meaning of nodes, identifier, and attribute is the same as in scenarios.batch.modifier (see above or ?scenarios.batch.modifier). When generate=FALSE (see below) only the last four arguments are mandatory.

An example of such a file could be:

csv.LHS.in <- system.file("extdata", "test_csv_LHS.csv", package="HexSimR")
csv <- read.csv(csv.LHS.in)
knitr::kable(csv)

One thing you may have noticed is that we are changing the values of a spatialDataSeries. In other words, we are changing the spatial data (e.g. a map) of the simulations. It is important to undestand that we are only changing the text string in the scenario file that indicates to HexSim where to find the spatial data. If the new value used doesn't exist (in our examples, if there is no HexMap "Shooting_5" and "Shooting_5_10") in the "project/Spatial Data/Hexagons" directory of the HexSim's project, HexSim won't be able to locate the spatial data and will stop, throwing an error. The other thing that it is important to consider is that there might be events that depend on that spatial data, and there might be events' parameters that might change depending on the spatial data loaded for that scenario with spatialDataSeries, so these have to match. We get back to this later on (see "Conditional replacement of xml elements") to find out how we can quickly cater for this situation. Last hint before moving on: if your xml modification also require the creations of lots of maps, the function make.map can help to automate this step too (if some conditions are met, see below).

There might be situations where the parameter values to be changed are in an internal node respective to the node identifier. In order to uniquely identify this parameter, the identifier of the parameter's node needs to be indicated. This is best explained with an example. An accumulateTrait is identified by the name attribute (i.e. \), however the parameter values are contained in the \ node within the accumulateTrait. The node \ is itself identified by a name attribute, but the parameters are stored under a threshold attribute.

<accumulatedTrait name="BaitRates" accumulator="BaitRate">
    <value name="NoBaits" threshold="-INF"/>
    <value name="Bait3km" threshold="33"/>
    <value name="Bait10km" threshold="66"/>
    <description/>
  </accumulatedTrait>

A search for the node accumulateTrait with the attribute name="BaitRates" would return three child nodes. To avoid multiple hits, nodes, identifier and attribute are used to uniquely identify the parent node where the parameter is contained (i.e. the node accumulateTrait with the attribute name="BaitRates").

param_node, param_node_identifier, and param_node_attribute (in this case, for example: /value/name and "Bait3km", respectively) are used to identify the node where the parameter values are contained (if not necessary, use NA for param_node and FALSE for the other columns), and param_identifier (here: threshold) is used to identify the actual parameter values that need to be changed, when this is stored as an attribute. Use FALSE when the latter is not relevant (e.g. if the node identifier is the parameter that needs to be changed).

type refers to the type of parameter. It can take one of the following: "integer", "numeric" or "character". value refers to the parameters of the distribution from which values are drawn (separated by a comma) if one is used: mean and sd for normal, meanlog and sdlog for lognormal, shape1 and shape2 for beta, prob for binomial, and min and max for uniform, otherwise a collection of values if distribution --> fixed. When distribution --> fixed or type --> character the elements in value have equal probability to be selected.

param_name is the name of the parameter that is being changed. These are used as labels (headers) in the hypercube matrix that is saved to disk and returned as an object from the function.

An example of how to use this function follows:

# Locate the template file
template <- system.file("extdata", "MRVC_4BaitYr_ThreeCells.xml", package="HexSimR")

# Locate the .csv file
csv.LHS.in <- system.file("extdata", "test_csv_LHS.csv", package="HexSimR")
# Create a temp dir
testFolder <- tempdir(check = TRUE)
# copy example files
file.copy(c(template, csv.LHS.in), testFolder)

# Execute the function
  LHS <- LHS.scenarios(
    path.scenarios=testFolder,
    xml.template="MRVC_4BaitYr_ThreeCells.xml",
    samples=9, 
    csv.in ="test_csv_LHS.csv",
    generate=TRUE)

The function returns a list and saves a few files depending on the user's choice. If generate is FALSE, then the function quits after generating the hypercube matrix (that is, a matrix where the parameter values combinations are saved).

When generate=TRUE, the second element of the list returned contains the nodes found in the template. It is probably a good idea to scan through these to check whether these were the expected ones and most importantly that none are empty! If the latter, something went wrong in identifying the nodes!

Conditional replacement of xml elements

There might be situtations where we want to apply a change to a scenario in relation to the values used in that scenario in other parameters. As mentioned above, an example is when some nodes refer to spatial data that may have changed due to our modification of the xml files (with either LHS.scenarios or scenarios.batch.modifier). xml.cond.replacement exists for this purpose. This function replaces values in specific element nodes of xml scenario files if a condition is satisfied (conditional replacement).

Similarly as for LHS.scenarios a .csv file identifies element nodes whose value needs to be satisfied. That is, a search is performed to identify a node and its values, which is then compared to the value reported in the column value in the .csv file. If found, then the scenario changes are carried out.

csv.LHS.condChangesin <- 
    system.file("extdata", "LHS_condChanges.csv", package="HexSimR")
csv <- read.csv(csv.LHS.condChangesin)
knitr::kable(csv)

For each row in the .csv file, a second .csv file needs to be present, which is passed with the argument lookup. This is basically a list of nodes that need to be modified and their new values. An example of such a file is:

csv.LHS.condChanges_lookup01in <- 
    system.file("extdata", "Test_LHS_condChangesLookup_01.csv", package="HexSimR")
csv <- read.csv(csv.LHS.condChanges_lookup01in)
knitr::kable(csv)

In this case, HexSimR searches for the spatialDataSeries node named "Shooting_5", checks whether the parameter value is "Shooting_5" and in these scenario files where these conditions are met, it will replace the original values with the (fake) values "ThisIsATest", in all the nodes reported in the lookup file.

Let's assume we want to modify the scenarios created with LHS.scenarios function run before. We could do this with the (made up) example files in HexSimR:

# locate the .csv file
csv.LHS.condChangesin <- 
    system.file("extdata", "LHS_condChanges.csv", package="HexSimR")

# Locate the second .csv for the 'lookup' argument
csv.LHS.condChanges_lookup01in <- 
    system.file("extdata", "Test_LHS_condChangesLookup_01.csv", package="HexSimR")

# Copy them in the temp folder
file.copy(c(csv.LHS.condChangesin, csv.LHS.condChanges_lookup01in), testFolder)

# Locate the LHS scenario files
xml_files <-list.files(testFolder, pattern="LHS[0-9].xml$", full.names=TRUE)

# execute the changes  
xml.cond.replacement(path.scenarios = testFolder, 
                       scenarios = basename(xml_files), 
                       csv.in = "LHS_condChanges.csv", 
                       lookup = "Test_LHS_condChangesLookup_01.csv")

Map and Barrier manipulation

The functions presented above can be very useful to speed up the creation/modification of a large number of scenarios. However, they do not deal with the spatial data, which are an important component of HexSim models. To (partially) automate the map creation/manipulation, since HexSimR v 0.4.4, there are a couple of functions that help with these tasks.

make.map

This function searches for a value in a map and replaces it with new.values. This function works with .csv files and assumes that the user has a template map to use, so some intermediate steps are still required. To put this in context, consider the following example. Assume you have a fenced area (cell) and you want to test different sizes of your cell (say, five possible sizes) and you also want to simulate eight different carrying capacity values (K), which are usually set in HexSim indirectly with the resource map. If you want to test all the K values for each of the cell sizes (that is, 40 possible combinations), one way you could quickly set this up is by creating a template file for each cell size with one K value and then generate one map for each additional K value from this template using make.map. Once you have generated your template files, you can save them as .csv from HexSim (selecting Display Spatial Data --> Display HexMap from HexSim menu and then File --> Save as --> CSV file). In the example files distributed with HexSimR there is a zipped file with a few examples of such files. In each of these files, the hexagon with value=40 was used to set the resource value of the hexagons inside the cell, while the surrounding were set to 48. In the example below we will modify the resource values for the cell's hexagons according to a sequence of values starting with 16 and increasing in incfrements of 8. Since each of these modifications will be saved in a different file, we will also use the argument sufs to append a suffix to the file name (which is instead passsed with file.name) so that it is clearer what each file is. Note that if length(new.values) > 1, you must use the sufs argument, otherwise all the files will have the same name and will be overwritten over and over again.

Once the new .csv files have been created, these can be imported in the model using HexSim machinery. The function w.csvmap.batch writes a .csv map batch file that can be run with the command prompt (where HexMapConverter.exe is, which is distributed with HexSim, see HexSim's manual).

# Locate and unzip example files
example_files <- system.file("extdata", "Spatial_templates.zip", package="HexSimR")
unzip(example_files, exdir = testFolder)
testSpatial <- file.path(testFolder, "Spatial_templates")

# Retrieve the list of file names (all the ones that start with "Sqkm")
lf <- list.files(testSpatial,   pattern = "^Sqkm", full.names = TRUE)
lf

# Create a sequence of k values
k <- seq(from= 16, to = 72, by = 8)
# Remove k=40 because it is the template
k <- k[-4]
k

# A for loop to work through each map template file and apply the changes
for(f in lf) {
  make.map(template=f, new.value=k, old.value=40, 
           # Rather than typing in each file name, I remove the numeric digits 
           # and '.csv. at the end of the template file name using a regular expression
           # 'basename(f) removes the path and leaves only the actual file name
           file.name=sub("*.[0-9].csv$", "", basename(f)), 
           # Because in the scenarios I was working on, k/4 express the density in 100 sqkm,
           # I use k/4 as suffix so that the names of the files have a more intuitive meaning
           sufs=paste0(k/4, ".csv"), dir.out=testSpatial)
}

# update the list of files, you should have 40 files starting with 'Sqkm' now
lf2 <- list.files(testSpatial,   pattern = "^Sqkm", full.names = TRUE)

# Write the bat file to generate HexMap
w.csvmap.batch(lf2, rows = 50, cols = 50, dir.out = testSpatial)

make.barrier

A similar approach can be used if you'd like to modify barrier parameters. In the example below I use 20 different values to modify the deflection of a fence, across five barrier files that contain data for five fences of different sizes. In this case, to present a different way you can do the same thing in R, I use mapply rather than a for loop. The difference is that the mapply will call each element of the arguments after the function name (make.barrier) and before the argument MoreArgs one at the time so that I can basically match the template file name with file.name. See ?make.barrier for more information.

# Locate example files
lf4 <- list.files(path = testSpatial, pattern = "^Fence", full.names = TRUE)

# Set mortality and deflection values
m <- rep(0, 2 * 10)
def <- rep(seq(0.998, 0.98, by=-0.002), each=2)
def

# I prefer using the fence permeability in % as suffix for file names
suffx <- round((1 - seq(0.998, 0.98, by=-0.002)) * 100, digits = 1)
suffx

mapply(make.barrier, lf4, file.name=sub("0.1.hbf$", "", basename(lf4)), 
       MoreArgs = list(mortality=m, deflection=def, npairs=1, sufs=suffx, 
                                                    dir.out=testSpatial))