knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This is the vignette of brachypoder
, an R package developed to help read and assemble the data simulated by the brachypode
program. See the online help page for how to install the package.
To load the package, use:
library(brachypoder)
Some example data are provided with the package. To extract where they are saved, run:
root <- system.file("extdata", "sim-example", package = "brachypoder") root
If we have a look at what is inside this folder,
list.files(root)
we see a few .dat
files. These are the files that contain the data saved from one simulation. Different variables are saved in different files, e.g. popsize.dat
contains the total population size at every time point saved. See the repository of \textcolor{blue}{brachypode}
for details.
Each data file contains data with different resolutions. For example, individuals.dat
contains five values per individual and covers all individuals of all the time points saved. The data files are saved in binary format, which is one-dimensional. This means that the arrays of saved values may have to be reshaped in order to be assembled into a data set that is workable within R. This is where the read_data
function comes in.
The function read_data
is the core function of the package. It takes the location of the simulation data as input, the variables to read, and rules that determine how each variable should be reshaped. For example,
read_data(root, "popsize")
reads the binary data in popsize.dat
back into numbers and places them into a one-column tibble. If we now run:
read_data(root, c("time", "popsize"))
we get multiple variables (time.dat
and popsize.dat
) read and assembled together. Now, these two variables both have one value saved every saved time point, and so have the same number of values. When the variables to read have different dimensions (e.g. patchsizes.dat
contains one value per patch, per site, per time point), we use the ncols
argument:
read_data(root, c("time", "patchsizes"), ncols = c(1, 10))
which splits the patchsizes
data into 10 columns, and so its number of rows becomes equal to the number of time points in time
(the 1 tells the function not to split that column), with which they can be attached. The columns in the output data are named after their respective data file or origin, with numbers appended for each column.
In some cases some columns may need to be duplicated instead of split into multiple columns. Then, we use negative numbers in ncols
. For example,
read_data(root, c("time", "patchsizes"), ncols = c(-10, 1))
reads the same data as the previous chunk, but in a "longer" format, where each number of individuals is taken as an observation and therefore as a row. And so here, patchsizes
was kept in a single column but each value in time
was duplicated as many times as there are patches and sites (so 10 times).
Some use-cases may be more intricate. For example, reading individuals.dat
and assigning each individual its own time point requires to know how many individuals there are at each time point. For this we provide a wrapper around read_data
that reads individuals.dat
, time.dat
and popsize.dat
to perform this task:
read_individual_data(root)
This data set contains all the relevant information for each individual saved in the simulation. Because this function has only one use-case (i.e. it is not designed to be flexible, unlike read_data
) it gives the output tibble specific column names.
Same, we provide a function that directly reads and formats the data from the traitmeans.dat
data set, which contains the mean trait value for each trait in each patch of each site, at each saved generation.
read_trait_mean_data(root)
And, similar to that, there is a wrapper function to directly read and format the data from the patchsizes.dat
file:
read_patch_size_data(root)
Besides the functions to read simulation data, one may want to read back the parameters that were used in a given simulation. To do this, use:
pars <- read_parameters(root) pars[1:5] # just a few parameters
Once the data are properly loaded and reshaped, they can be plotted using the usual tools available in R. Using ggplot2
, for example:
library(tidyverse) # Read the proportion of good patches pgood <- read_parameters(root)$pgood pgood <- pgood[-1] # Read trait means per patch per deme data <- read_trait_mean_data(root) data$patch <- as.logical(data$patch) # Relabel data <- data %>% mutate(patch_lab = if_else(patch, "Facilitated", "Unfacilitated")) # Add information about the coverage in good patches data <- data %>% group_by(deme) %>% nest() %>% ungroup() %>% mutate(pgood = pgood) %>% unnest(data) %>% mutate(patch_size = if_else(patch, pgood, 1 - pgood)) # Plot trait through time data %>% ggplot(aes(x = time, y = x, group = interaction(deme, patch), color = patch_lab, alpha = patch_size)) + geom_line() + xlab("Time (generations)") + ylab("Mean trait value") + labs(color = NULL, alpha = "% cover") + scale_color_manual(values = c("gray10", "gray60"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.