knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This vignette shows the full pipeline of one single experiment, starting from its parameters to plotting the final distribution of nLTT statistics.

As these experiments are intended to run on a computer cluster, each step saves its result to a file. This is a bit cumbersome for a vignette, but the best that can be done.

First we load the libraries we'll need:

library(raket)

Here we check if all prerequisties are met:

can_run <- TRUE
if (!beastier::is_beast2_installed()) {
  print("Please install BEAST2")
  print("Tip: run 'beastier::install_beast2()'")
  can_run <- FALSE
} else if (!mauricer::is_beast2_ns_pkg_installed()) {
  print("Please install the NS BEAST2 package")
  print("Tip: run 'mauricer::install_beast2_pkg(\"NS\")'")
  can_run <- FALSE
}

To make this vignette as reproducible as possible, a random number generator seed is specified:

set.seed(42)

Create all parameter files:

all_input_filenames <- create_input_files_general(
  general_params_set = create_general_params_set(
    mcmc_chain_length = 16000,
    sequence_length = 16
  ),
  project_folder_name = project_folder_name
)

Note that the created files have some simplified parameters, so the simulations are faster to run.

Remove all parameter files, except for a randomly sampled one with speciation initiation rate of 0.1:

while (1) {
  input_filename <- sample(all_input_filenames, size = 1)
  lowest_sir <- min(rkt_get_spec_init_rates())
  if (utils::read.csv(input_filename)$sirg > lowest_sir) next
  if (utils::read.csv(input_filename)$siri > lowest_sir) next
  break
}
statuses <- file.remove(all_input_filenames[ all_input_filenames != input_filename] )
testit::assert(all(statuses == TRUE))
testit::assert(file.exists(input_filename))

The input file only contains parameters:

print(utils::read.csv(input_filename))

The input file is then run, its results save to an output file:

posterior_filesname <- tempfile("out.RDa")
raket::run_raket(
  input_filename = input_filename,
  posterior_filesname = posterior_filesname,
  verbose = TRUE
)

The output file contains many things, including the parameters used:

print(readRDS(posterior_filesname))

From this raw data, the nLTTs can be measured:

nltts_filename <- razzo::collect_nltt_stats(project_folder_name = project_folder_name)

The resulting file only contains the parameters and nLTT statistics:

print(names(readRDS(nltts_filename)))

We can already plot the nLTT statistics. Here I show a raw histogram and the density plot. The latter will be used in the final result as well. For few nLTT statistics (that is, short MCMC chain lengths), this picture will look clumsy:

ggplot2::ggplot(
  data = data.frame(nltts = readRDS(nltts_filename)$nltts),
  ggplot2::aes(x = nltts)
) + 
  ggplot2::geom_histogram(binwidth = 0.01) + 
  ggplot2::geom_density()

The next step in the experiment is to convert all nLTT files to one comma seperated file. As we only run one experiment, there is only one nLTT file to supply:

csv_filename <- tempfile("nltts.csv")
raket::nltt_files_to_csv(
  nltt_filenames = nltts_filename, 
  csv_filename = csv_filename
)

The comma-seperated file contains as many rows as there are experiments run. In this case, it will have only one row, looking like this:

knitr::kable(read.csv(file = csv_filename))

For Tidy Data, the comma-seperated file needs to be read and converted to the long form.

df <- raket::to_long(df = read.csv(csv_filename))

In this long form, each row contains one measurement (that is, one nLTT statistic):

knitr::kable(df)

Finally, we plot all (in this case, one) experiments:

raket::create_fig_1(df_long = df)


richelbilderbeek/raket documentation built on Dec. 31, 2019, 7:41 p.m.