Introduction

HexSimR is an R package designed to help process post-simulation data generated with HexSim. I developed this package for my own needs and it closely reflects what I needed to do. Because of this, it does not provide an exhaustive set of tools, but hopefully it offers some assistance with common tasks.

My general approach to population modelling has been, so far, to develop a baseline scenario/model. Then I change something in 'alternative scenarios' (these changes may be representative of management actions or environmental changes). Once I am happy with these scenarios, I run lots of replicates for each of them and then evaluate how these changes have affected the population trajectories and/or other population parameters (e.g. genetic distance). Depending on what I am interested in, I may 'evaluate' these changes by conducting pairwise comparisons. This means that I need some tools to collate results, calculate descriptive statistics across replicates, and conduct statistical tests for the pairs of scenarios I am interested in. If you are working on a project with a similar approach, HexSimR may be useful to you too.

NOTE that an important consequence of my approach is that each scenario will have a consistent structure. For example, if in my baseline scenario there are three requests for a census, the first stratified by age, the second by sex and the third by a custom trait, then in all alternative scenarios there will be three census requests identical to the baseline and in the same order. This way, HexSimR can average across data in the first census file, and then compare them across scenarios. If the scenarios do not follow a symmetric structure, you are actually comparing apples with oranges. In other words, there is no way for HexSimR to know what's in each file; it just locates the files based on HexSim's output file naming conventions. You basically say to HexSimR "compare the first census file across scenarios", with the underlying assumption that these files have the same layout and type of data. If they don't contain the same information, you will get the wrong answer (if any at all). It is up to you to make sure that you built the scenarios appropriately. This is critical: if you didn't understand this point, go back and re-read the last few sentences until they make sense to you! If you have a different set-up between scenarios, or you ran only one replicate, HexSimR may still be valuable to process the data, but not all functions may be meaningful in your context.

Lastly, because I am an extremely lazy person, if I find myself typing, or copying and pasting, the same things more than twice, I normally get annoyed and find a way to avoid that too. Hence, HexSimR also has a few functions to help prepare input or batch files, and to back up results, without the need to type or click the same things over and over again. For the same reason, I very often assume that file and folder names (as well as the sub-directory structure) were not changed from the defaults, so that I don't have to re-type them every time. I tried to be flexible, and file and folder names can often be passed as arguments, so if you happen to have changed them you can still pass their names using the functions' arguments, but keep in mind that

  1. I may not have been 100% consistent with this (which means that if you changed file/folder names and the functions no longer work for you, don't complain to me, but you are welcome to let me know and I'll see if I can make the function more flexible)
  2. If you follow my approach you will find yourself typing very little, which I think is a very positive thing!

I will demonstrate below some of HexSimR's functionality by repeating some of the steps I took while working on the paper "Spatially-explicit model for assessing wild dog control strategies in Western Australia". The main purpose of this quick rundown is to help you decide whether HexSimR can be useful for you, without you having to dedicate much time to figuring out what each function does. If you have already decided that you want to use it, it may still make sense to quickly scan through this document so that you familiarise yourself with what HexSimR can do; it should give you a head start when you move on to your own data. Since HexSimR v 1.0, a second tutorial describes the new functionality dedicated to modifying and generating scenarios, including setting up a Latin Hypercube Sampling scheme.

Just to make sure no one gets confused: when I use "HexSim" (no "R"), I am referring to the fantastic piece of software developed by Nathan Schumaker (Schumaker N, HexSim (Version 3 & 4). US Environmental Protection Agency, Environmental Research Laboratory, Corvallis, Oregon. https://www.hexsim.net/). When I use "HexSimR", I am referring to this R package.

Example data

The data that I use as an example are a reduced set of the simulations used for the paper (just two scenarios, and two replicates for each scenario) and are zipped. This is to keep the amount of data that you have to download when you install HexSimR to a minimum. To give you an idea of what the paper is about so that you can follow the example, here is a very brief summary:

We simulated wild dog population dynamics to forecast the effect of the fence upgrades and plausible control scenarios on wild dog populations on both sides of the State Barrier Fence in Western Australia. If you are wondering what the State Barrier Fence is, in a few words, it is a fence that cuts across the south west of Western Australia with the intent to protect the most productive agricultural areas (Figure 1). From now on, I use 'inside' when I refer to the region west of the fence. Simulation parameters were drawn from published research on the biology and ecology of dingo populations in the WA northern rangelands. We assumed that non-pure dingoes (i.e. domestic dogs and hybrids) would also comply with the same parameters. In the example here, I only kept one scenario where no control is performed, and a second where control is applied inside and outside the fence at the highest regime we believe is possible in the region, by means of baiting, trapping and shooting. Both scenarios simulate a fully dog-proof fence (i.e. no trespassing is allowed). If you don't have access to the paper, but are curious to have a look at it, feel free to email me and I'll send you a copy.

knitr::include_graphics("Map.tif")

To find out where the (zipped) example data are, we run the following code, which loads the package and stores the location of the example data file in example_file, assuming you have already installed HexSimR (if you haven't, instructions on how to do so are here).

library(HexSimR, quietly = TRUE)
example_file <- system.file("extdata", "Results.zip", package="HexSimR")

Then, we create a temporary directory where we can unzip the files (and sub-directories) so that, at the end of this tutorial, you can delete all the files we have been playing with, leaving your HexSimR installation alone:

duplicate <- tempdir(check = TRUE)
unzip(example_file, exdir = duplicate)

If you are replicating this tutorial on your machine, it may be a good idea to browse to this temporary directory to see what happens while we go through the examples. Just type duplicate to see the path where the temp directory is.

If you decide to use HexSimR, remember that for each function there is a help file, which you can access with the command ?function.name where function.name is the name of the function you want information for. You should consult these if you want more specific information for any given function. If this doesn't work for you, simply use a double question mark "??" and the function help files should open in a web browser.
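For example, to read about collate.census (a function used later in this tutorial):

# Open the help file of a specific function
?collate.census

# Search across all installed help files when you don't remember the exact name
??census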

Core analyses and data plotting

Calculations and plots with census and report files

Imagine this: you have developed your baseline scenario, have finalised your additional ~40 alternative scenarios to test several combinations of management options (as we did in our paper), and, finally, you have proudly run 200 replicates for each of them. Each scenario generates, say, three census files that contain data you want to look at (in our example there are actually four census files). Now you find yourself with 3 census files X 200 replicates X ~40 scenarios = ~24,000 post-simulation files to process (and that is just dealing with the census files, leaving alone all the rest). If you want to manually open and paste them together, or import them one by one into a statistical package to analyse them, be my guest... Alternatively, you can get HexSimR to do this for you.

In our model, I created a map that divided the study area into two populations: inside and outside the fence. By doing this I could ask HexSim to generate a census file where each individual would obtain a value for a specific trait ("PopID") based on its location. 'Trait Index 2' and 'Trait Index 3' are the number of loners and pack members inside the fence, and 'Trait Index 4' and 'Trait Index 5' are the number of loners and pack members outside the fence. All I have to do now, for each census file, for each scenario, is to sum the values of these pairs of variables to obtain the total population size inside and outside the fence, and then average these across the 200 replicates to know what the average trend is for each scenario. This is what the first block of code below does.

You only have to provide the path to the "Results" folder (if you have run HexSim, you may have noticed that it stores all the results in a folder called "Results"), indicate which census event you are interested in (in this case it is the number '2'), the names of the traits you want to consider with the argument headers, what you want to call your outcome variables ('surprisingly', I'm calling them 'Inside' and 'Outside'), and finally what sort of operation you want to perform on these variables ("+" should be self-explanatory... but just to be explicit, the first call translates into: "Trait Index 2" + "Trait Index 3"). Note that the outcome variables are effectively added as new columns in the census files. With scenarios='all', I indicate that I want to do this for all scenarios that HexSimR finds in the 'Results' folder (but I could have provided a character vector with the names of a subset of scenarios).

The second block of code below calculates the sex ratios inside and outside the fence. I have included a census event stratified by age and sex in the model. I won't explain what each trait is (I'm sure you trust me on this), but the basic idea is that, with the first two function calls, I sum together all males inside and outside the fence, respectively. Then I do the same for the females, and lastly I calculate the ratios between these sums. Once all these calculations are done, with the function collate.census, HexSimR calculates the mean and SD across all 200 iterations. Note that collate.census does this for all census files that it finds, so I don't have to specify the census files I want to include. I'll obtain the averages for each of them in one go, regardless of whether I have performed any calculations on them.

From HexSimR version 0.3.4, collate.census has two additional arguments: start and end. These set the interval of time steps to be considered. end is particularly important because, if the population goes extinct, the results may be spurious if it is not set properly: HexSim may stop recording data after the time step in which the population went extinct, so the last recorded time step may differ across iterations. Different time steps between replicates will affect the calculation of the mean, as some time steps may be missing (while they should be zeros) if the population went extinct earlier (see the example in the section 'Probability of extinction' below). end is typically the last time step simulated (30 in this case), but it can be less if the user wants to discard the final tail of time steps in the simulations. end will also accept max as an option, but its use is discouraged (it only exists for programmatic reasons, as explained in the function documentation).

An additional argument gained by this function since v 1.0 is keep.zeros. If TRUE, collate.census includes the zeros in the mean and SD calculations; if FALSE, zeros are stripped out before the calculations. This is equivalent to considering either only 'extant' populations (or traits), i.e. those that are still >0, or all of them (i.e. including the zeros in the calculation of the means).

# HexSimR is expecting to be pointed to the Results folder
path.results <- paste(duplicate, "Results", sep="/")

# total pop size inside and outside
temp <- census.calc(path.results, ncensus=2, 
                    headers=c("Trait Index  2", "Trait Index  3"), 
                    var.name = "Inside", bin.f = "+", scenarios = "all")
temp <- census.calc(path.results, ncensus=2, 
                    headers=c("Trait Index  4", "Trait Index  5"), 
                    var.name = "Outside", bin.f = "+", scenarios = "all")

# Sum by gender
temp<-census.calc(path.results, ncensus=1, 
            headers=c("Trait Index  6", "Trait Index  7", 
                      "Trait Index 18", "Trait Index 19", 
                      "Trait Index 30", "Trait Index 31"), 
            var.name = "Males_In", bin.f = "+", scenarios = "all")

temp<-census.calc(path.results, ncensus=1, 
            headers=c("Trait Index 10", "Trait Index 11", 
                      "Trait Index 22", "Trait Index 23", 
                      "Trait Index 34", "Trait Index 35"), 
            var.name = "Males_Out", bin.f = "+", scenarios = "all")

temp<-census.calc(path.results, ncensus=1, 
            headers=c("Trait Index  4", "Trait Index  5", 
                      "Trait Index 16", "Trait Index 17", 
                      "Trait Index 28", "Trait Index 29"), 
            var.name = "Females_In", bin.f = "+", scenarios = "all")

temp<-census.calc(path.results, ncensus=1, 
            headers=c("Trait Index  8", "Trait Index  9", 
                      "Trait Index 20", "Trait Index 21", 
                      "Trait Index 32", "Trait Index 33"), 
            var.name = "Females_Out", bin.f = "+", scenarios = "all")

# Sex-ratios
temp<-census.calc(path.results, ncensus=1, 
            headers=c("Males_In", "Females_In"), 
            var.name = "SexRatio_In", bin.f = "/", scenarios = "all")

temp<-census.calc(path.results, ncensus=1, 
            headers=c("Males_Out", "Females_Out"), 
            var.name = "SexRatio_Out", bin.f = "/", scenarios = "all")

# Calculate the means and SDs
coll.census <- collate.census(path.results, scenarios="all", end=30)

Finally, one thing you may want to do is to plot these means with SD bars to inspect the results. The following will create a list with only one element: the plot. Remember to use the argument keep.zeros to mirror what you used in collate.census. You can omit this argument if you used the default settings.

# Plot census
census.p <- census.plot(path.results, scenarios="all", traits=c("Inside", "Outside"), 
                        ncensus=2, ngroups=1, keep.zeros = TRUE)
census.p[[1]]

You may want to replace the plot titles with something more understandable, and remove the legend because, for example, you are planning to have this information in the figure caption. If you are preparing this plot for a publication, you may also have to remove the background colour and the grid. The output of census.plot is just a ggplot object, so you can make these changes with normal ggplot commands:

library(ggplot2, quietly=TRUE)
# Create a vector to subset the scenario names
base <- census.p[[1]]$data$Scenario == "DingoBaseSBF_SelDist"

# Replace the scenario name "DingoBaseSBF_SelDist" for "Baseline"
census.p[[1]]$data$Scenario[base] <- "Baseline"

# Replace the second scenario
census.p[[1]]$data$Scenario[!base] <- "Baiting & Shooting"

# This sets the order of the scenarios and ensures that the Baseline plot is 
# on the left hand side 
census.p[[1]]$data$Scenario <- factor(census.p[[1]]$data$Scenario, 
                                 levels=c("Baseline", "Baiting & Shooting"))
new.plot <- census.p[[1]] + 
            theme_classic() + # set a white background with no grid
            ylab("Mean population size") + # Replace the y-axis title 
            theme(legend.position="none") # Remove legend
new.plot

Clearly, when baiting and shooting is implemented the population size is lower than the baseline scenario.

You may also want to calculate descriptive statistics of the HexSim-generated reports. At the moment, HexSimR can process the movement and ranges reports. For example, below HexSimR calculates the descriptive statistics for the ranges reports (after these were generated; see 'Prepare batch files' under 'Minor utilities' if you want to have a look at how to quickly generate them).

# After reports are generated...
m.range.rep <- multi.reports(path.results, scenarios="all", pop.name="Dingoes", 
                               type="ranges", all=TRUE, hx=1122.4, 
                               events=c("Lonersexplore", "Adjustterritories2"), 
                               start="min", end="max")

You may like to explore the results of multi.reports. These are saved to disk for each scenario and are returned as a list to R. The list has an element for each processed scenario. Each of these elements is also a list with three elements: the first is the mean for each year.

# The following prints the mean group size and resources calculated for each  
# year of the first scenario (but only the first and last 5 lines of the data 
# are printed)
m.range.rep[[1]][[1]][, 1:4, with=FALSE]

The second and the third are, respectively, the mean and the standard deviation across years, between start and end, for each event. The events are the ones you inserted in the HexSim sequence. If you leave the default NULL, then all events are considered (see the help file of the function ranges for more details on the output).

m.range.rep[[1]][[2]]
m.range.rep[[1]][[3]]

Now that you have collated these data and have formed an idea about what could possibly be going on, you may want to test whether the difference you have seen is statistically significant. HexSimR offers the possibility to compare scenarios with the Strictly Standardised Mean Difference (SSMD, Zhang 2007). The main reason why SSMD is a good choice here is that it is a statistic that is not inflated by large sample sizes, which are typically very large when running population dynamics models (in our example we only have 2 replicates, but remember that in the paper I ran 200, and I would have normally run 1,000 if it wasn't for computational limits!).

The SSMD is calculated as:

$$SSMD_{i}=\frac{V_i-V_B}{\sqrt{s^{2}_i+s^{2}_B}}$$

where $V_{i}$ is the mean value of the variable of interest in the $i$ scenario being compared, $V_{B}$ is the mean value in the baseline scenario, and $s_i$ and $s_B$ are their respective standard deviations.
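If you want to see the formula in action, here is a minimal sketch in plain R (ssmd.pair is a hypothetical helper of mine, not a HexSimR function; the SSMD.* functions used below apply the same logic across whole result files):

# Hypothetical helper illustrating the SSMD formula above (not part of HexSimR)
ssmd.pair <- function(v.i, s.i, v.b, s.b) {
  ssmd <- (v.i - v.b) / sqrt(s.i^2 + s.b^2)
  # Under a normal approximation the p-value is the lower-tail probability,
  # so identical means give SSMD = 0 and a p-value of 0.5
  p.value <- pnorm(ssmd)
  c(SSMD = ssmd, p.value = p.value)
}

# e.g. a mean population size of 80 (SD 12) against a baseline of 100 (SD 15)
ssmd.pair(v.i = 80, s.i = 12, v.b = 100, s.b = 15)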

In the example below, I run a comparison for two census files and for the descriptive statistics of the ranges report (remember to change the value of the argument keep.zeros if you changed it from the default when you ran collate.census). The names of the functions should be self-explanatory as to which one is which; if you get lost, just call the help file with the name of the function. I only use a very simple loop here, but if you have something more complex and want to loop SSMD.census to carry out multiple comparisons across several scenarios, have a look at the function apply.ssmd in the supplementary material of the paper for an example of how to do that.

for (i in 1:2) {
  ssmd.cen <- SSMD.census(path.results, base="DingoBaseSBF_SelDist", ncensus=i)
}

ssmd.ran <- SSMD.ranges(path.results, scenarios="all", base="DingoBaseSBF_SelDist", 
                       sum.ranges="summary_ranges.xlsx")

Each SSMD function returns a list with two elements, the first reports the SSMD value, and the second the p values. I only print out here the test for the ranges report because it is shorter:

ssmd.ran

Probability of extinction

HexSimR also offers the possibility to calculate the probability of extinction. This estimate doesn't make much sense in my example because no population goes extinct, so I'll demonstrate here how Pext can be used with some made-up data. Pext calculates the probability of extinction for each time step, the cumulative probability (which basically answers the question: what is the probability that a population goes extinct in at least one time step?) and (since HexSimR v 1.0) the mean probability of extinction across all time steps. Any of these statistics can be requested within a specific interval. Pext will also conduct a statistical comparison of the cumulative probability of extinction and of the mean probability of extinction across all time steps if more than one scenario is requested and base is not NULL.

Pext calculates the probability of extinction starting from the census outputs and you can select any trait that is included in the census event you indicated with the argument headers. This makes this function quite powerful as it can be used to calculate the probability that any trait(s) you included in a census file is (are) equal to zero (or non-zero by calculating 1 - Pext).

Please read the function help file for more details. One important point worth noting is that in this function the formula to calculate SSMD becomes:

$$SSMD_{i}=\frac{V_B-V_i}{\sqrt{s^{2}_{i}+s^{2}_{B}}}$$

Note that the direction of the change is inverted. This is done to provide a more intuitive result, under the assumption that, more often than not, an increase in the probability of extinction compared with the baseline scenario is a negative outcome.

You may have already noted that SSMD can't be computed if both $s$ are zero. This is likely to happen when the populations always, or never, go extinct, as in this example. In this case, the cells in the returned xls file are left blank. At the other extreme, I'm sure you have already noted that if the means are the same, then SSMD=0 and the p-value=0.5.

In the example below, I calculate the probability of extinction for the overall population (that is, inside and outside the fence, which corresponds to the column named "Population.Size") and inside the fence, across all simulated years (to this end, because some traits may take no value at Time Step=0, if you use the default for start, then Time Step will be set to 1). I first have to copy and decompress the fake results that are shipped with HexSimR and collate these with collate.census; I then calculate the probability of extinction and lastly print out the results for the cumulative probability of extinction:

# Extract made up results
test.file <- system.file("extdata", "PextTest.zip", package="HexSimR")
file.copy(test.file, duplicate)
unzip(test.file, exdir=duplicate)
fake.simulations <- file.path(duplicate, "PextTest", "PextTest")
collall<-collate.census(fake.simulations, end=30)

# Calculate the probability of extinction
peall <- Pext(collall, path.results=fake.simulations, ncensus=0, 
                headers=c("Population.Size", "Inside"), base="Sqkm5_LHS9")

peall$cumul.Pext.means

You can see that in these fake simulations the population inside the fence goes extinct 100% of the time (for at least one time step), but the overall population never does. I found that this statistic doesn't tell us much, because a population that goes extinct for only one time step, before new dogs recolonise the inside of the fence, is clearly a different situation from a population that is extinct at multiple time steps. In this sense, the mean probability of extinction across time steps is possibly more informative, because its value will reflect whether the population is more likely to go extinct over multiple years, in multiple replicates.

peall$means.time.step.Pext

In the example above, it is quite straightforward to see that there is not much difference between the two scenarios in the probability that, in any given year, the population goes extinct inside the fence, but the probability of extinction of the overall population is almost half of that inside the fence in the scenario 'Sqkm5'.
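To make the relationship between these statistics concrete, here is a small illustration in plain R with made-up data (this is my reading of the statistics described above, not HexSimR's internal code):

# Extinction indicators for 3 replicates (rows) over 5 time steps (columns):
# 1 = extinct at that time step, 0 = extant
ext <- matrix(c(0, 1, 0, 0, 0,
                0, 0, 0, 0, 0,
                0, 1, 1, 0, 0), nrow = 3, byrow = TRUE)

# Probability of extinction at each time step
colMeans(ext)

# Cumulative probability: proportion of replicates that went extinct in at
# least one time step
mean(apply(ext, 1, max))

# Mean probability of extinction across all time steps
mean(ext)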

Given the many time steps in which the population has size zero, this fake dataset is also suited to demonstrate the difference in using the option keep.zeros=FALSE as opposed to the default keep.zeros=TRUE, when collating the results. Below, I collate the census files with keep.zeros=FALSE and print out the mean values of the overall population for the last five time steps and compare those with the ones calculated previously.

collext <- collate.census(fake.simulations, end=30, keep.zeros=FALSE)

tail(collext$means$Sqkm5_LHS9$Sqkm5_LHS9.0$Population.Size)
tail(collall$means$Sqkm5_LHS9$Sqkm5_LHS9.0$Population.Size)

Note how the values are higher for 'collext': this is because with keep.zeros=FALSE we only retained the extant populations in the calculations, while with keep.zeros=TRUE the averages include as many zeros as there were extinction events. Which one to use will depend on the purpose of the analysis. If the user wants to compute the mean number of individuals when a population exists, then they will have to use only the extant populations (possibly in conjunction with Pext to find out how often the population doesn't exist).
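The difference boils down to how the mean is computed. In plain R terms (made-up values):

# Population sizes across five replicates at one time step
pop <- c(12, 0, 9, 0, 15)
mean(pop)           # keep.zeros = TRUE: zeros count towards the mean (7.2)
mean(pop[pop > 0])  # keep.zeros = FALSE: only extant populations (12)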

Before abandoning this dataset, I will also use it to show how the number of rows differs when the default option end="max" is used. This is because one line is completely dropped, as the population goes extinct in the last year of the simulations in all replicates. Hence my earlier remark about not using the default option (and why Pext will throw an error if it detects a different number of rows in the collated data)!

collmax <- suppressWarnings(collate.census(fake.simulations))
nrow(collall$means$Sqkm5_LHS9$Sqkm5_LHS9.0)
nrow(collmax$means$Sqkm5_LHS9$Sqkm5_LHS9.0)

Invasion front

In the paper, we wanted to monitor the progress of the 'invasion' of wild dogs from east to west in the agricultural area in the south west of Western Australia, and I have developed a function to help with this. An acknowledgment is due to Nathan Schumaker, who suggested using an array of hexagons rather than pixels from an ASCII file, as I had initially thought of doing. The advantage of Nathan's approach is that this function is now quite flexible, as you can arrange the array in any way that is meaningful to your case. Because there is some setting up you have to do before you run your model, I'll provide here some indications of how to do that.

Firstly, in HexSim, you have to create an array of hexagons. As mentioned before, I have created a linear array.

knitr::include_graphics("Array.tif")

Another option (also suggested by Nathan) would have been to have radial patches. The bottom line is that it has to be something that does what you need it to do. Then, you have to give a value to each patch. I have used patches of four hexagons, and given them values from 1 to 11, east to west (Figure 2). Then you have to create a trait (which I called 'invasion_front') where each trait value represents a location. Potentially, using the trait builder named "Sequenced Trait" would make it easy to construct an accumulated trait with many trait values (Figure 3).

knitr::include_graphics("SequencedTrait.tif")

Make sure you insert in your model an accumulate event that uses the 'individual location' updater and targets the accumulator relevant to your trait (in my example, 'invasion_front'); this will update the location of the animals.

Lastly, you have to insert a census event that uses the trait you just set up, so that you will have a census file that reports the number of animals in each patch (which, remember, are named 'Trait' in the census file). Once you have done all this and have run the simulations, you can call invasion.front, which will save an xls file with the mean & SD for each time step, as well as an overall mean. As usual, with ncensus you tell HexSimR which census file it should consider. The argument value is used to set a minimum threshold of animals: in my case I set this to 1, which basically means that it is enough to have one animal in a patch to consider it occupied. You may want to use a different value depending on your needs. The patch.width argument is used to set the unit. In my case, each hexagon is 3.6 km in width, and each patch is 4 hexagons, so the width is 3.6 * 4 = 14.4 km. I could have used patch.width=4, in which case the distance traveled would have been expressed in number of hexagons.

inv <- invasion.front(path.results, ncensus=3, value = 1, patch.width=3.6*4,
               scenarios = "all")

Once done with this, you may want to plot these data to visually compare the mean distance traveled in each scenario (here I change only the names of the scenarios):

inv.plot <- invasion.plot(paste(path.results, "Invasion.front.xlsx", sep="/"))
inv.plot$data$Scenario <- c("Baseline", "Baiting & Shooting")
inv.plot$data$Scenario <- factor(inv.plot$data$Scenario, levels=c("Baseline", "Baiting & Shooting"))
inv.plot

Genetic analysis

Let's say that you have set up all your scenarios in HexSim, have run the hundreds of simulations you wanted to, and now you want to calculate the mean genetic distance between two populations. HexSimR offers the possibility to do this, but it calculates the genetic distance between the two populations for each replicate first, and then averages that across all replicates. This means that in order to use HexSimR's functions you need a genepop input file for each replicate (generated with HexSim). See the section "Prepare batch files" for how to generate these. I have included the genepop files in the example data (as created by HexSim).

We will use multi.clean.genepop() first to make sure that the files respect the formatting required by mmod (Winter, 2012), which is used internally by HexSimR when you use gen.dist(). Here is an example. Note that we are calculating in one go the genetic distance for all scenarios with the function multi.gen.dist(), which is a wrapper that loops gen.dist() over several scenarios. multi.gen.dist() will process all genepop files it finds. That is, if we had requested the genepop files for several time steps, we would be calculating the genetic distance for all of them. The assumption here is that if you created all these files, you wanted to do something with them.

All the results of these functions will be stored in a "temp" R object. Note that downstream analyses read the data directly from the files saved to disk, which is why I'm overwriting the object in this example: I won't be using it later. Do not worry if you get a warning message about duplicate individual names; it can be safely ignored.

temp <- multi.clean.genepop(path.results, scenarios="all", pop.name="Dingoes")

# Get the mean and standard deviation of the genetic distance between populations
# for each population
temp <- multi.gen.dist(path.results, scenarios="all", pop.name="Dingoes")

# Finally calculate the mean
temp <- m.gen.dist(path.results, scenarios = "all", pop.name="Dingoes", traits="PopID")

Once we are done with all this, we can then plot the genetic distance to visualise what happened during the simulations:

# Plot the mean and SD for the time step 30
gp <- gen.plot(path.results = path.results, pop.name = "Dingoes", time.step = 30,
         traits = "PopID")
gp$data$Scenario <- c("Baseline", "Baiting & Shooting")
gp$data$Scenario <- factor(gp$data$Scenario, levels=c("Baseline", "Baiting & Shooting"))
gp

Minor utilities

HexSimR also has a few little functions to speed up the process that really do not do much other than avoiding repeated clicking. These are listed below.

Prepare batch files

If you ran several replicates of one scenario, you may want to combine their log files. HexSim provides a utility to do this, but if you use the graphical user interface (GUI) you have to click the relevant option for each scenario and wait for it to be done before you can move to the next. Of course, HexSim also gives you the option to generate a batch file and then run it from the command line. To set up the batch file you have to click on each scenario and add it to the batch file, but I think I already mentioned that I'm an extremely lazy person and this is already too much clicking for me. A similar situation applies to the range and movement reports (if you are just starting with HexSim and don't know what these are, suffice it to say that these are files where the data you are interested in are dumped for you to use). HexSimR can do all this in one line for each type of output (e.g. log files, movement report, ranges, etc.). If you want to generate a report, or combine the log files for all the scenarios you ran (quite likely, I'd assume), you can just use 'all' in the argument scenarios, or you can pass a character vector with the names of a subset of scenarios.

Another situation where you may need to add lots of file requests to a batch file is if you are working with genetics. Right now, obtaining an estimate of the genetic distance between two populations, as done in the genetic analysis section, involves manually adding a genepop report to a queue in a batch file for each iteration and scenario (and for each time step you want these statistics to be calculated for). In other words, lots of clicking. The alternative is to use w.genepop.batch(), which will generate the batch file to obtain the genepop input files from HexSim. In the example here I ask to create only one genepop file, for the time step '30', but I could have asked for several time steps at once using a numeric vector (e.g. c(15, 30)). Note that none of these batch files will work if run against the example data, because I didn't include HexSim log files in the example data (log files are huge!).

# Batch file to generate combined log files. Execute this in the command prompt
# Make sure you browsed to the location where HexSimCommandLine.exe is
w.combine.log.batch(path.results, scenarios="all", dir.out=path.results)

# Generate batch file for move and ranges reports. Pass this to OutputTransformer.exe
report.batch(path.results, scenarios="all", ranges=TRUE, move=TRUE)

# batch file to generate genepop files. Pass this to OutputTransformer.exe
w.genepop.batch (path.results, scenarios="all", time.steps=30, 
                 pop.name="Dingoes", traits="PopID")

Once OutputTransformer.exe is done generating the reports, the genepop input files need to be cleaned up with multi.clean.genepop() to make sure that the files respect the formatting required by mmod (Winter, 2012), before you can progress in your analysis as I demonstrate in the example in the 'genetic analysis' section.

Recover simulations

If you have many replicates of the same scenario, it sometimes happens that HexSim doesn't complete the simulations (e.g. after a power shutdown) and you find yourself with a job half done. HexSim will replace the results if you just restart the job, but sometimes it is a pain to start all over again, because simulations may take a long time to complete and you may not want to ditch the ones that were successfully completed. If you just move the output to a different location (or rename the folder so that it is not overwritten by HexSim), you will have the problem of multiple files with the same name when you try to merge the results together. If these files are in the order of hundreds, manually renaming them is out of the question. I found myself in this situation several times, so I created a simple function that automatically renames the output files so that it is only necessary to re-run the missing simulations and not the whole job. It is up to the user, though, to confirm that the simulations that were retained were actually completed. If you have a census event in your model, it is normally quite straightforward to confirm that by ensuring that all the simulated years were logged (unless the population went extinct before completion). The code chunk below is an example of how this function can be used (although it is not actually executed in this tutorial, to allow the examples below to work).

rename.replicates(path.results, scenario = "DingoBaseSBF_SelDist", suffix = 10:10)

Backing up results

Since HexSimR 0.4 the function copy.results() is deprecated. It was just too difficult to ensure that it kept track of all the little changes and additional outputs from the updated code. To save space, HexSimR now has a new function: compress.logs(). This function compresses the log files (which are mostly responsible for the huge size of HexSim results) and optionally removes the original files after compression. If you need to access the uncompressed log files again, these can be recovered with the function decompress.gz(). By selecting delete.log=TRUE all log files are removed. If you have a solid state disk, the speed is improved by selecting parallel=TRUE, in which case the number of processors to be used can be selected with ncores. For additional control over the compression ratio and parallel performance, see ?compress.logs and ?decompress.gz.
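A sketch of how these calls might look (the first argument is my assumption; delete.log, parallel and ncores are the arguments mentioned above, but check ?compress.logs for the actual signature):

# Compress all log files, remove the originals and use two cores in parallel
# (exact argument layout is an assumption; see ?compress.logs)
compress.logs(path.results, delete.log = TRUE, parallel = TRUE, ncores = 2)

# Recover the uncompressed log files later, if needed
decompress.gz(path.results)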

Summarise everything in a table

Well, everything in one table may not be appropriate, but you can use make.table to put together a table for each of the descriptive statistics you have been calculating, and for each SSMD comparison. I will start from the descriptive statistics of the demographic data. We pass the names of the files we want to process as a character vector (including the extension). As usual for HexSimR, make.table assumes that if you are processing data over several scenarios, the file names in each scenario are the same. There is one important exception: the names of the files generated by collate.census. You may have noticed that the output of this function is named using the pattern: scenarioName.ncensus.[extant|all].comb.xlsx. This means that the names will be different for each scenario, but the suffix won't. If you guessed that you then have to pass only the suffix, you would have been right! Things, of course, may not work as intended if you changed the names of the result files... The other pieces of information that you have to pass are shown in the call below:

table <- make.table(path.results, 
                    scenarios="all", 
                    fnames=paste0( 1:2, ".all.comb.xlsx"), 
                    SSMD=FALSE, 
                    colh=list(c("SexRatio_In", "SexRatio_Out"),
                              c("Inside", "Outside")),
                    vround=1, sdround=3, time.steps=NULL, table.name="Tables.xlsx", 
                    tab.name="Dem_descrip", save2disk=TRUE, 
                    dir.out=NULL)

The table will look something like the one below:

knitr::include_graphics("Table_dem_desc.tif")

Then I add a sheet named "Ranges_descrip" to the same xls file (Tables.xlsx), where I summarise the descriptive statistics obtained from the ranges report. Note that the default value of the colh argument is a subset of the headings of the ranges report, which happens to be exactly the ones I'm interested in, so here I omit this argument and use the default.

table_ranges <- make.table(path.results, 
                    scenarios="all", 
                    fnames="summary_ranges.xlsx", 
                    SSMD=FALSE, 
                    vround=1, sdround=3, time.steps=NULL, table.name="Tables.xlsx", 
                    tab.name="Ranges_descrip", save2disk=TRUE, 
                    dir.out=NULL)

I glossed over the selection of the events when I created m.range.rep earlier, but by selecting those two events I actually have data pre- and post-dispersal. Because I'm the only one who knows this at the moment, I'll make it a bit more explicit by adding a column to the table. I then re-order the columns and finally use knitr::kable to print the table in a more readable format in this document. Note that the object returned by make.table is a data.table, so I load this package before I edit it, but if you are not familiar with data.table you can work with these objects as if they were normal data.frames in R.

library(data.table, quietly = TRUE)
table_ranges[, Dispersal:=rep(c("Pre", "Post"), each=nrow(table_ranges)/length(unique(Scenario)))] 
setcolorder(table_ranges, c("Scenario", "Dispersal", "GroupSize", "Resources", 
                            "nGroups", "ha", "sqkm"))
knitr::kable(table_ranges)

If you want to save the updated table, you can do that with the following:

suppressPackageStartupMessages(library(XLConnect, quietly = TRUE))
wb <- loadWorkbook(paste(path.results, "Tables.xlsx", sep="/"))
createSheet(wb, name="Ranges_descrip")
writeWorksheet(wb, table_ranges, sheet="Ranges_descrip")
saveWorkbook(wb)

Be careful when updating the content of an existing xls file. The basic idea is that the data are pasted over the existing data, but if they cover a smaller area, there might be some mix-ups between the old and new data. See XLConnect's documentation for more control over where the data are saved. Note that if you had changed the name of the sheet, it would be saved as a new tab in the xls file, which may be a safer option if you are unsure.

As an example, I have also added the SSMD results to the table for the census data:

table_SSMD_dem <- make.table(path.results, 
                         scenarios="DingoBaseSBF_SelDist_Strat_Bait_shoot10", 
                         fnames=c("SSMD_census1.xlsx", "SSMD_census2.xlsx"), 
                         SSMD=TRUE, 
                         colh=list(c("SexRatio_In", "SexRatio_Out"),
                                   c("Inside", "Outside")),
                         vround=3, sdround=3, time.steps=NULL, table.name="Tables.xlsx", 
                         tab.name="SSMD_dem", save2disk=TRUE, 
                         dir.out=NULL)

Note that we can't use scenarios="all" when working with SSMD results. This is because the base scenario would be included in the internally created list of scenarios, but it is not present in the result files. When SSMD=TRUE, you have to pass a character vector to select the scenarios you want to include in your table. If you don't want to type in the names of all scenarios, you can obtain them with:

scenarios <- list.dirs(path=path.results, full.names=FALSE, recursive=FALSE)
scenarios

You can then paste/edit this list or modify it within R, for example:

scen_NoBase <- scenarios[2:length(scenarios)]
scen_NoBase

That's it, you made it to the end of this tutorial! I hope it helped you understand some of HexSimR's functionality and, while I can't promise a prompt reply, I will be grateful for any feedback, positive or negative.

Before you close down your R instance, you can delete the temp directory and all its content (as long as you don't have some files open) with:

unlink(duplicate, recursive = TRUE)

Good luck with the analysis of your own data!


References

Winter, D.J., 2012. mmod: an R library for the calculation of population differentiation statistics. Molecular Ecology Resources 12, 1158-1160.

Zhang, X.D., 2007. A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics 89, 552-561.


