README.md


simmiad

R package to simulate populations of wild Emmer wheat from the Kibbutz Ammiad

Table of contents

  1. Introduction
  2. Installation
  3. Dependencies
  4. How simulations work
  5. Usage
  6. Author and license information

Introduction

simmiad is an R package for simulating a population of wild Emmer wheat. It asks whether the amount of spatial clustering of unique genotypes, and the stability of that clustering through time, can be explained by purely neutral forces. The idea is to simulate a population of plants evolving under seed dispersal and limited, random outcrossing only, then to sample plants along a transect in the same way that the real population is sampled.

Installation

From GitHub

The easiest way to install simmiad is straight from GitHub, using the devtools package from within R. If you do not already have devtools, install it with

install.packages("devtools")

Then install simmiad with

devtools::install_github("ellisztamas/simmiad")

Dependencies

In most cases, simmiad uses base R functions only. One experimental feature uses the mvtnorm package to generate samples from a multivariate normal distribution, but you will probably not need it for ordinary use.

How simulations work

Initial generation:

Simulating through time:

Transect samples

Assumptions

This makes certain assumptions that it is good to be explicit about:

Usage

Simulate a single population

Functions in simmiad simulate populations given a set of input parameters:

To simulate a single population, use sim_population. For example, the code below runs a single simulation that starts from 10 genotypes and runs for 10 generations, with a mean dispersal distance of 0.5m, an outcrossing rate of 1% and a density of 1. The population is then sampled along a transect of 30 sampling points with a spacing of 5.

library('simmiad')
set.seed(124) # so you get the same answer as me

# Set input parameters
mean_dispersal_distance <- 0.5
outcrossing_rate <- 0.01
n_generations <- 10
n_starting_genotypes <- 10
density <- 1
how_far_back <- n_generations # defined here but not passed to sim_population() below
n_sample_points <- 30
sample_spacing <- 5

# Run one simulation with the parameters above
sm <- sim_population(
  mean_dispersal_distance = mean_dispersal_distance,
  outcrossing_rate = outcrossing_rate,
  n_generations = n_generations,
  n_starting_genotypes = n_starting_genotypes,
  density = density,
  n_sample_points = n_sample_points,
  sample_spacing = sample_spacing
)

This returns a list of genotypes in each generation. The final generation looks like this:

 [1] NA         NA         NA         "g2"       NA         "g8"       "g8"       NA         "g1_3.363"
[10] "g8"       "g5"       "g7"       "g1"       "g10"      "g1"       "g4"       "g7"       "g3"      
[19] NA         "g2_5.77"  "g4"       "g9"       "g4"       "g3"       "g6"       "g5"       "g1"      
[28] NA         "g3"       "g1"             
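
The printed transect can be summarised with base R alone. The sketch below copies the vector shown above by hand into a variable called final_gen (a name chosen here for illustration, not part of simmiad) and counts empty sampling points, distinct genotypes, and pairs of sampling points that share a genotype. It assumes that NA marks an empty sampling point and that two labels denote the same genotype only when the strings are identical, so "g1" and "g1_3.363" are counted separately. This is an illustration of the quantities, not the code simmiad runs internally.

```r
# Final generation as printed above (copied by hand for illustration).
final_gen <- c(
  NA, NA, NA, "g2", NA, "g8", "g8", NA, "g1_3.363",
  "g8", "g5", "g7", "g1", "g10", "g1", "g4", "g7", "g3",
  NA, "g2_5.77", "g4", "g9", "g4", "g3", "g6", "g5", "g1",
  NA, "g3", "g1"
)

# Number of empty sampling points.
n_empty <- sum(is.na(final_gen))

# Occurrences of each genotype label (table() drops NA by default).
genotype_counts <- as.vector(table(final_gen))

# Number of distinct genotypes in the transect.
n_unique <- length(genotype_counts)

# Pairs of sampling points sharing a genotype:
# a label seen k times contributes choose(k, 2) pairs.
n_matching_pairs <- sum(choose(genotype_counts, 2))

c(empty = n_empty, unique = n_unique, matching_pairs = n_matching_pairs)
```

For the vector above this gives 7 empty points, 12 distinct genotypes and 17 matching pairs.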

Replicate simulations

Most of the time you will want to simulate multiple replicate populations with a set of input parameters. This can be done with the function simmiad, which takes similar input parameters to those above.

rs <- simmiad(
  mean_dispersal_distance = 0.5,
  outcrossing_rate = 0.001,
  n_generations = 12,
  n_starting_genotypes = 10,
  density = 3,
  n_sample_points = 5,
  sample_spacing = 2,
  nsims = 3,
  how_far_back = 9
)

This function simulates multiple individual populations through time, and returns a list with the following elements:

  1. parameters: A data.frame giving input parameters.
  2. clustering: The covariance between distance along the transect and the frequency of identical genotypes.
  3. matching_pairs: The number of pairs of identical genotypes in the transect.
  4. count_NA: The number of empty sampling points.
  5. n_genotypes: The number of unique genotypes sampled in the transect (note that this will be different from what you gave as n_starting_genotypes, because the latter reflects genotypes in the whole population, not just in the transect).
  6. stability: How often individual sampling points are occupied by the same genotype in the final generation and 1, 2, ..., n generations back.
  7. distance_identity: Probabilities of finding identical genotypes in pairs of sampling points at all possible distances between sampling points. For example, if there are five evenly spaced sampling points as in the example above, there are four possible distances between sampling points. Rows indicate replicate simulations.

In points 2 to 6 above, rows show replicate simulations and columns show generations.
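
To make the distance-based summary more concrete, here is a minimal base R sketch of one way to compute the proportion of identical pairs at each distance for a single transect. The function name identity_by_distance and the example vector are hypothetical. The code illustrates the quantity described in the list rather than reproducing what simmiad does internally, and skipping pairs that involve an empty sampling point is an assumption.

```r
# Hypothetical helper: proportion of sampling-point pairs that share a
# genotype, grouped by the distance separating them along the transect.
identity_by_distance <- function(genotypes, sample_spacing) {
  # Positions of evenly spaced sampling points, starting at zero.
  positions <- (seq_along(genotypes) - 1) * sample_spacing
  # Every unordered pair of sampling points; each column is one pair.
  pairs <- combn(length(genotypes), 2)
  first <- genotypes[pairs[1, ]]
  second <- genotypes[pairs[2, ]]
  distance <- positions[pairs[2, ]] - positions[pairs[1, ]]
  # Assumption: pairs involving an empty point (NA) are ignored.
  keep <- !is.na(first) & !is.na(second)
  # Mean of a logical vector is the proportion of TRUE values.
  tapply(first[keep] == second[keep], distance[keep], mean)
}

# Five evenly spaced points with a spacing of 2, matching
# n_sample_points = 5 and sample_spacing = 2 in the simmiad() call.
transect <- c("g1", "g1", "g2", "g1", "g2")
identity <- identity_by_distance(transect, sample_spacing = 2)
identity
```

With five sampling points the result has four entries, one for each of the distances 2, 4, 6 and 8. In this toy transect one of the four adjacent pairs matches, so the entry for distance 2 is 0.25.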

Author and license information

Tom Ellis (thomas.ellis@gmi.oeaw.ac.at)

simmiad is available under the MIT license. See LICENSE for more information.



ellisztamas/simmiad documentation built on Dec. 12, 2023, 5:32 a.m.