knitr::opts_chunk$set(fig.width = 7)
First, the library is loaded:
library(malan)
For reproducibility, the seed for the (pseudo) random number generator is set:
set.seed(1)
First, the population sizes are determined:
population_sizes <- round(c(rep(100, 10), 100*1.02^(1:10))) plot(population_sizes, type = "b")
A population can be simulated (hiding progress information) as follows:
set.seed(1) # For reproducibility sim_res_growth <- sample_geneology_varying_size( population_sizes = population_sizes, # VRS = 0.2: enable_gamma_variance_extension = TRUE, gamma_parameter_shape = 5, gamma_parameter_scale = 1/5, # Live population: # 3 generations generations_full = 3, generations_return = 3, progress = FALSE)
Live population:
live_pop <- sim_res_growth$individuals_generations
Until pedigrees are build/infered, there is not much information available (e.g. about children). So let us infer the pedigrees:
pedigrees <- build_pedigrees(sim_res_growth$population, progress = FALSE) pedigrees pedigrees_count(pedigrees) pedigrees_table(pedigrees) pedigree_size(pedigrees[[1]])
We can look at the population as a (tidy)graph:
g <- as_tbl_graph(pedigrees) g
This can be plotted:
if (requireNamespace("ggraph", quietly = TRUE)) { library(ggraph) p <- ggraph(g, layout = 'tree') + geom_edge_link() + geom_node_point(size = 8) + geom_node_text(aes(label = name), color = "white") + facet_nodes(~ ped_id) + theme_graph() print(p) }
This is rather difficult to make any sense of. Let's instead plot only pedigree 1:
PED_ID <- 1 g_ped2 <- g %>% activate(nodes) %>% filter(ped_id == PED_ID) if (requireNamespace("ggraph", quietly = TRUE)) { library(ggraph) p <- ggraph(g_ped2, layout = 'tree') + geom_edge_link() + geom_node_point(size = 8) + geom_node_text(aes(label = name), color = "white") + theme_graph() print(p) }
Up until now, only the genealogy has been simulated. Now, we run a mutational process, i.e. assign haplotypes to founders and let haplotypes flow down the individuals.
We use realistic data. In the package, there is information about the individual markers:
ystr_markers
Note, that MutProb
is the point estimate given by MutProb = Mutations / Meioses
. Information about which markers that are in which kit is also provided:
ystr_kits ystr_kits %>% count(Kit)
Let us take all PowerPlex Y23 markers and assume that we only have a partial profile where DYS437
and DYS448
dropped out. At the same time, we also filter out the integer alleles (for generating random founder haplotypes in a minute):
partial_kit <- ystr_kits %>% filter(Kit == "PowerPlex Y23") %>% inner_join(ystr_markers, by = "Marker") %>% filter(!(Marker %in% c("DYS437", "DYS448"))) %>% rowwise() %>% # To work on each row mutate(IntegerAlleles = list(Alleles[Alleles == round(Alleles)]), MinIntAllele = min(IntegerAlleles), MaxIntAllele = max(IntegerAlleles)) %>% ungroup() %>% select(-Kit, -Alleles) partial_kit
This "partial kit" has the following mutation probabilities:
mu <- partial_kit %>% pull(MutProb) mu
We can make a founder haplotype generator as follows (sampling alleles randomly is not how Y-STR works, but it may work fine for founder haplotypes):
generate_random_haplotype <- function() { partial_kit %>% rowwise() %>% mutate(Allele = IntegerAlleles[sample.int(length(IntegerAlleles), 1)]) %>% pull(Allele) }
Now, a new haplotype is created everytime the function is called (with no arguments):
generate_random_haplotype() generate_random_haplotype()
Of course such generator can also be created for a reference database with Y-STR profiles.
Now, we are ready to assign haplotypes to the genealogy:
set.seed(1) pedigrees_all_populate_haplotypes_custom_founders( pedigrees = pedigrees, get_founder_haplotype = generate_random_haplotype, mutation_rates = mu, progress = FALSE)
We can now plot pedigrees with haplotype information (note that as_tbl_graph
needs to be called again):
g_ped2 <- as_tbl_graph(pedigrees) %>% activate(nodes) %>% filter(ped_id == PED_ID) %>% group_by(name) %>% mutate(haplotype_str = paste0(haplotype[[1]], collapse = ";")) #mutate(haplotype_str = map(haplotype, paste0, collapse = ";")[[1]]) if (requireNamespace("ggraph", quietly = TRUE)) { library(ggraph) p <- ggraph(g_ped2, layout = 'tree') + geom_edge_link() + geom_node_point(aes(color = haplotype_str), size = 8) + geom_node_text(aes(label = name), color = "white") + theme_graph() print(p) }
We have live_pop
from the population.
set.seed(5) Q_index <- sample.int(n = length(live_pop), size = 1) Q <- live_pop[[Q_index]] print_individual(Q) Q_hap <- get_haplotype(Q) Q_hap
Now, count matches in live part of pedigree and in live part of population:
Q_ped <- get_pedigree_from_individual(Q) ped_live_matches <- count_haplotype_occurrences_pedigree( pedigree = Q_ped, haplotype = Q_hap, generation_upper_bound_in_result = 2) # gen 0, 1, 2 pop_live_matches <- count_haplotype_occurrences_individuals( individuals = live_pop, haplotype = Q_hap) ped_live_matches pop_live_matches
We can also inspect pedigree matches information about number of meioses and $L_1$ distances:
path_details <- pedigree_haplotype_matches_in_pedigree_meiosis_L1_dists( suspect = Q, generation_upper_bound_in_result = 2)
nrow(path_details) head(path_details)
This can of course be repeated to many populations (genealogies, haplotype processes, suspects etc.). Also note that variability can be put on mutation rates, e.g. by a Bayesian approach.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.