In rscherrer/brachypoder: Reading Brachypode Simulations

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This is the vignette of brachypoder, an R package developed to help read and assemble the data simulated by the brachypode program. See the online help page for how to install the package.

To load the package, use:

library(brachypoder)

Some example data are provided with the package. To extract where they are saved, run:

root <- system.file("extdata", "sim-example", package = "brachypoder")
root

If we have a look at what is inside this folder,

list.files(root)

we see a few .dat files. These are the files that contain the data saved from one simulation. Different variables are saved in different files, e.g. popsize.dat contains the total population size at every time point saved. See the repository of \textcolor{blue}{brachypode} for details.

Each data file contains data with different resolutions. For example, individuals.dat contains five values per individual and covers all individuals of all the time points saved. The data files are saved in binary format, which is one-dimensional. This means that the arrays of saved values may have to be reshaped in order to be assembled into a data set that is workable within R. This is where the read_data function comes in.

The function read_data is the core function of the package. It takes the location of the simulation data as input, the variables to read, and rules that determine how each variable should be reshaped. For example,

read_data(root, "popsize")

reads the binary data in popsize.dat back into numbers and places them into a one-column tibble. If we now run:

read_data(root, c("time", "popsize"))

we get multiple variables (time.dat and popsize.dat) read and assembled together. Now, these two variables both have one value saved every saved time point, and so have the same number of values. When the variables to read have different dimensions (e.g. patchsizes.dat contains one value per patch, per site, per time point), we use the ncols argument:

read_data(root, c("time", "patchsizes"), ncols = c(1, 10))

which splits the patchsizes data into 10 columns, and so its number of rows becomes equal to the number of time points in time (the 1 tells the function not to split that column), with which they can be attached. The columns in the output data are named after their respective data file or origin, with numbers appended for each column.

In some cases some columns may need to be duplicated instead of split into multiple columns. Then, we use negative numbers in ncols. For example,

read_data(root, c("time", "patchsizes"), ncols = c(-10, 1))

reads the same data as the previous chunk, but in a "longer" format, where each number of individuals is taken as an observation and therefore as a row. And so here, patchsizes was kept in a single column but each value in time was duplicated as many times as there are patches and sites (so 10 times).

Some use-cases may be more intricate. For example, reading individuals.dat and assigning each individual its own time point requires to know how many individuals there are at each time point. For this we provide a wrapper around read_data that reads individuals.dat, time.dat and popsize.dat to perform this task:

read_individual_data(root)

This data set contains all the relevant information for each individual saved in the simulation. Because this function has only one use-case (i.e. it is not designed to be flexible, unlike read_data) it gives the output tibble specific column names.

Same, we provide a function that directly reads and formats the data from the traitmeans.dat data set, which contains the mean trait value for each trait in each patch of each site, at each saved generation.

read_trait_mean_data(root)

And, similar to that, there is a wrapper function to directly read and format the data from the patchsizes.dat file:

read_patch_size_data(root)

Besides the functions to read simulation data, one may want to read back the parameters that were used in a given simulation. To do this, use:

pars <- read_parameters(root)
pars[1:5] # just a few parameters

Once the data are properly loaded and reshaped, they can be plotted using the usual tools available in R. Using ggplot2, for example:

library(tidyverse)

# Read the proportion of good patches
pgood <- read_parameters(root)$pgood
pgood <- pgood[-1]

# Read trait means per patch per deme
data <- read_trait_mean_data(root)
data$patch <- as.logical(data$patch)

# Relabel
data <- data %>% mutate(patch_lab = if_else(patch, "Facilitated", "Unfacilitated"))

# Add information about the coverage in good patches
data <- data %>%
  group_by(deme) %>%
  nest() %>%
  ungroup() %>%
  mutate(pgood = pgood) %>%
  unnest(data) %>%
  mutate(patch_size = if_else(patch, pgood, 1 - pgood))

# Plot trait through time
data %>%
  ggplot(aes(x = time, y = x, group = interaction(deme, patch), color = patch_lab, alpha = patch_size)) +
  geom_line() +
  xlab("Time (generations)") +
  ylab("Mean trait value") +
  labs(color = NULL, alpha = "% cover") +
  scale_color_manual(values = c("gray10", "gray60"))