knitr::opts_chunk$set(fig.width = 7)

First, the library is loaded:

```
library(malan)
```

For reproducibility, the seed for the (pseudo) random number generator is set:

set.seed(1)

A standard Wright-Fisher population can be simulated (hiding progress information) as follows:

sim_res <- sample_geneology(population_size = 10, generations = 10, progress = FALSE)

Until pedigrees are built/inferred, there is not much information available (e.g. about children). So let us infer the pedigrees:

pedigrees <- build_pedigrees(sim_res$population, progress = FALSE) pedigrees pedigrees_count(pedigrees) pedigrees_table(pedigrees) pedigree_size(pedigrees[[1]]) pedigree_size(pedigrees[[2]]) #pedigree_size(pedigrees[[3]]) # error as there are only 2 pedigrees

The pedigrees can be plotted all at once:

```
plot(pedigrees)
```

Or just one at a time:

```
plot(pedigrees[[1]])
```

```
plot(pedigrees[[2]])
```

Some information about the population can be obtained. For example, the individuals in the final generation can be saved:

str(sim_res, 1) live_individuals <- sim_res$end_generation_individuals

And a live individual is printed:

```
print_individual(live_individuals[[1]])
```

We can also print another individual (from the entire population):

indv <- get_individual(sim_res$population, 22) print_individual(indv)

set.seed(1) mutrts <- c(0.5, 0.5) pedigrees_all_populate_haplotypes(pedigrees = pedigrees, loci = length(mutrts), mutation_rates = mutrts, progress = FALSE)

Individual pedigrees can now be plotted with haplotype information:

plot(pedigrees[[1]], haplotypes = TRUE)

And the individual id can be removed to only display the haplotype:

plot(pedigrees[[1]], ids = FALSE, haplotypes = TRUE)

And one or more individuals can be marked/highlighted:

plot(pedigrees[[1]], ids = TRUE, haplotypes = TRUE, mark_pids = c(14, 16))

By standard, only the last generation contains $N$ individuals. If the 3 last generations should be full, this can be done by specifying `generations_full = 3`

as follows:

set.seed(1) sim_res <- sample_geneology(population_size = 10, generations = 5, generations_full = 3, progress = FALSE) pedigrees <- build_pedigrees(sim_res$population, progress = FALSE) plot(pedigrees)

And to obtain the complete history, `generations_full`

is set to `generations`

:

set.seed(1) sim_res <- sample_geneology(population_size = 10, generations = 5, generations_full = 5, progress = FALSE) pedigrees <- build_pedigrees(sim_res$population, progress = FALSE) plot(pedigrees)

Now, there are 10 individuals in all generations.

By standard, the number of generations are specified. Instead, it can be specified to continue simulating until one common founder is reached by specifying `generations = -1`

:

set.seed(1) sim_res <- sample_geneology(population_size = 10, generations = -1, progress = FALSE) pedigrees <- build_pedigrees(sim_res$population, progress = FALSE) plot(pedigrees)

The number of generations needed can be obtained as follows:

```
sim_res$generations
```

Let's try to simulate a larger population with 3 full generations (the additional `generations_return`

is to get all individuals in the last 3 generations returned in the `individuals_generations`

slot, cf. below):

set.seed(1) sim_res <- sample_geneology(population_size = 1e3, generations = 200, generations_full = 3, generations_return = 3, # default value progress = FALSE)

And build the pedigrees:

pedigrees <- build_pedigrees(sim_res$population, progress = FALSE) pedigrees_table(pedigrees) pedigrees_count(pedigrees)

So there are `r pedigrees_count(pedigrees)`

pedigrees. Let's try to plot the largest one:

ped_sizes <- sapply(1L:pedigrees_count(pedigrees), function(i) pedigree_size(pedigrees[[i]])) ped_sizes largest_i <- which.max(ped_sizes) plot(pedigrees[[largest_i]])

And the impose mutations from a 20 locus haplotype with mutation rate 0.001 per locus:

set.seed(10) mutrts <- rep(0.001, 20) pedigrees_all_populate_haplotypes(pedigrees = pedigrees, loci = length(mutrts), mutation_rates = mutrts, progress = FALSE)

The haplotypes at the live individuals (3 generations) can be inspected:

live_individuals <- sim_res$individuals_generations length(live_individuals)

haps <- get_haplotypes_individuals(individuals = live_individuals)

```
head(haps)
```

Lets look at the spectrum:

haps_str <- apply(haps, 1, paste0, collapse = ";") haps_tab <- table(haps_str) sort(haps_tab, decreasing = TRUE)[1:10] spectrum <- table(haps_tab) spectrum

set.seed(100) Q_index <- sample.int(n = length(live_individuals), size = 1) Q <- live_individuals[[Q_index]] Q_hap <- get_haplotype(Q) Q_hap

First, identify $Q$'s pedigree:

Q_ped <- get_pedigree_from_individual(Q)

Now, count matches in pedigree and in live population:

count_haplotype_occurrences_pedigree(pedigree = Q_ped, haplotype = Q_hap, generation_upper_bound_in_result = 2) count_haplotype_occurrences_individuals(individuals = live_individuals, haplotype = Q_hap)

We can also inspect pedigree matches information about number of meioses and $L_1$ distances:

path_details <- pedigree_haplotype_matches_in_pedigree_meiosis_L1_dists(suspect = Q, generation_upper_bound_in_result = 2)

nrow(path_details) head(path_details)

Look at the distribution of number of meioses between $Q$ and the matches (there are 0 meioses between Q and himself):

meioses <- path_details[, 1] hist(meioses)

On the path between $Q$ and the match, the maximum $L_1$ difference between $Q$'s haplotype and the haplotypes of the individuals on the path is recorded (0 means that that no mutations have occured on the path between $Q$ and the match):

L1_max <- path_details[, 2] table(L1_max) mean(L1_max == 0)

Draw true contributors:

set.seed(100) U_indices <- sample.int(n = length(live_individuals), size = 2, replace = FALSE) U1 <- live_individuals[[U_indices[1]]] U2 <- live_individuals[[U_indices[2]]] H1 <- get_haplotype(U1) H2 <- get_haplotype(U2)

View haplotypes:

```
rbind(H1, H2)
```

Now, find those haplotype in live individuals (those haplotypes are in `haps`

from before) that are included in (or compatible with) the mixture:

#mixres <- indices_in_mixture_by_haplotype_matrix(haplotypes = haps, H1 = H1, H2 = H2) mixres <- mixture_info_by_individuals_2pers(live_individuals, U1, U2) str(mixres, 1)

Compare with matches:

length(mixres$pids_matching_donor1) count_haplotype_occurrences_individuals(individuals = live_individuals, haplotype = H1) length(mixres$pids_matching_donor2) count_haplotype_occurrences_individuals(individuals = live_individuals, haplotype = H2)

In mixture that are not H1 and not H2:

length(mixres$pids_others_included)

Inspect these (only unique ones):

others_haps <- get_haplotypes_pids(sim_res$population, mixres$pids_others_included) others_haps <- others_haps[!duplicated(others_haps), ] others_haps

And get their population counts:

others_haps_counts <- unlist(lapply(seq_len(nrow(others_haps)), function(hap_i) { count_haplotype_occurrences_individuals(individuals = live_individuals, haplotype = others_haps[hap_i, ]) })) sum(others_haps_counts) length(mixres$pids_others_included)

Compare with profiles of true contributors:

```
rbind(H1, H2)
```

Let $\alpha$ be the parameter of a symmetric Dirichlet distribution specifying each man's probability to be the father of an arbitrary male in the next generation. When $\alpha=5$, a man's relative probability to be the father has 95\% probability to lie between 0.32 and 2.05, compared with a constant 1 under the standard Wright-Fisher model and the standard deviation in the number of male offspring per man is 1.10 (standard Wright-Fisher = 1).

This symmetric Dirichlet distribution is implemented by drawing father (unscaled) probabilities from a Gamma distribution with parameters shape and scale that are then normalised to sum to 1. To obtain a symmetric Dirichlet distribution with parameter $\alpha$, the shape must be $\alpha$ and scale $1/\alpha$. This is simulated as follows (note the `enable_gamma_variance_extension`

parameter):

dirichlet_alpha <- 5 set.seed(1) sim_res <- sample_geneology(population_size = 10, generations = 10, enable_gamma_variance_extension = TRUE, gamma_parameter_shape = dirichlet_alpha, gamma_parameter_scale = 1 / dirichlet_alpha, progress = FALSE) pedigrees <- build_pedigrees(sim_res$population, progress = FALSE)

```
plot(pedigrees)
```

Let us verify the claim that the standard deviation in the number of male offspring per man is 1.10. The easiest way is to get information about father id's, which is done by asking for a `verbose_result`

:

N <- 1000 set.seed(1) sim_res <- sample_geneology(population_size = N, generations = 2, enable_gamma_variance_extension = TRUE, gamma_parameter_shape = dirichlet_alpha, gamma_parameter_scale = 1/dirichlet_alpha, progress = FALSE, verbose_result = TRUE) tbl_fathers_with_children <- table(sim_res$father_pids[, 1]) tbl_fathers_no_children <- rep(0, N - length(tbl_fathers_with_children)) number_of_children <- c(tbl_fathers_with_children, tbl_fathers_no_children) number_of_children <- as.numeric(number_of_children) mean(number_of_children) sd(number_of_children)

Let os get estimates in parallel:

get_number_children <- function(N) { sim_res <- sample_geneology(population_size = N, generations = 2, enable_gamma_variance_extension = TRUE, gamma_parameter_shape = dirichlet_alpha, gamma_parameter_scale = 1 / dirichlet_alpha, progress = FALSE, verbose_result = TRUE) tbl_fathers_with_children <- table(sim_res$father_pids[, 1]) tbl_fathers_no_children <- rep(0, N - length(tbl_fathers_with_children)) number_of_children <- c(tbl_fathers_with_children, tbl_fathers_no_children) number_of_children <- as.numeric(number_of_children) return(number_of_children) } library(parallel) options(mc.cores = 2) RNGkind("L'Ecuyer-CMRG") # for mclapply set.seed(1) x <- mclapply(1:100, function(i) get_number_children(100)) sds <- unlist(lapply(x, sd)) mean(sds)

Population growth can be simulated by specifying the population size at each generation by the `population_sizes`

vector, where the length thus specifies the number of generations:

set.seed(1) sim_res_growth <- sample_geneology_varying_size(population_sizes = c(10, 20, 10), generations_full = 3, progress = FALSE)

Note how `generations_full`

was used to obtain all individuals in the generations (and not just those with descendants in the last two):

pedigrees_growth <- build_pedigrees(sim_res_growth$population, progress = FALSE)

```
plot(pedigrees_growth)
```

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.