View source: R/geno_last_gen_admix_recomb.R
geno_last_gen_admix_recomb | R Documentation |
This function in essence combines three steps: pop_recomb() to simulate founders of known ancestries with LD (following a Li-Stephens-like model), recomb_last_gen() to draw recombination breaks of focal last-generation descendants from the specified pedigree, and recomb_haplo_inds() to construct their genomes from the founders' variants.
However, since only a limited portion of the founder sequences is actually inherited, the simulation is made much more efficient by simulating only the subsequences that were inherited, which saves time, and by using sparse matrices, which saves memory.
See below for a more detailed algorithm.
geno_last_gen_admix_recomb(
  anc_haps,
  bim,
  map,
  G,
  fam,
  ids,
  founders_anc,
  indexes_chr_ends = NULL,
  loci_on_cols = FALSE,
  missing_vals = c("", 0)
)
anc_haps |
A named list that maps the code used for each ancestry to its haplotype matrix.
Each of the haplotype matrices the argument |
bim |
The table of variants of |
map |
The genetic map, a list of chromosomes each of which is a data.frame/tibble with columns |
G |
Number of generations since most recent common ancestor of population (to multiply standard recombination rate) |
fam |
The pedigree data.frame, in plink FAM format.
Only columns |
ids |
A list containing vectors of IDs for each generation.
All these IDs must be present in |
founders_anc |
A named vector that maps every founder haplotype (the names of this vector) to its ancestry code.
Ancestry codes must match the codes used in |
indexes_chr_ends |
Optional vector mapping each chromosome (index in vector) to the last index in the |
loci_on_cols |
If |
missing_vals |
The list of ID values treated as missing.
|
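Because the ancestry codes in founders_anc have to match the names of anc_haps, it can help to check the two inputs against each other before running the simulation. The following is a minimal sketch in base R; the helper check_founders_anc() is hypothetical and not part of the package, and the toy founder names are made up.

```r
# hypothetical helper (not a package function): checks that every founder
# haplotype is named and that every ancestry code has a haplotype matrix
check_founders_anc <- function( anc_haps, founders_anc ) {
    hap_names <- names( founders_anc )
    if ( is.null( hap_names ) || any( is.na( hap_names ) ) || any( hap_names == '' ) )
        stop( '`founders_anc` must be fully named!' )
    # ancestry codes that lack a matrix in `anc_haps`
    missing_codes <- setdiff( unique( founders_anc ), names( anc_haps ) )
    if ( length( missing_codes ) > 0 )
        stop( 'Ancestry codes missing from `anc_haps`: ', toString( missing_codes ) )
    invisible( TRUE )
}

# toy inputs: two ancestries, four founder haplotypes (made-up names)
anc_haps <- list( AFR = matrix( 0L, 5, 2 ), EUR = matrix( 1L, 5, 2 ) )
founders_anc <- c( g1_pat = 'AFR', g1_mat = 'AFR', g2_pat = 'AFR', g2_mat = 'EUR' )
ok <- check_founders_anc( anc_haps, founders_anc )
```

The function itself may already perform similar or stricter validation; this check only makes a mismatch visible earlier.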
This function wraps around several exported package functions to achieve its objectives, which are roughly grouped into the following 4 phases.
Phase 1 simulates recombination in the family without explicit sequences.
In particular, it initializes the founder haplotype structure (without variants yet) using recomb_init_founders(), then simulates recombination breaks along the pedigree and identifies all the founder haplotype blocks in the focal individuals using recomb_last_gen(), and maps recombination breaks in cM to basepairs using recomb_map_inds().
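The last step of phase 1, converting break positions from cM to basepairs, can be pictured as interpolation along the genetic map. The sketch below uses base R only and is not the package's implementation; the column names pos (basepairs) and posg (cM) and all values are assumptions chosen for illustration.

```r
# toy genetic map for one chromosome
# (column names `pos` in bp and `posg` in cM are assumptions)
map_chr <- data.frame(
    pos  = c( 1e6, 2e6, 5e6, 10e6 ),
    posg = c( 0, 1.5, 4, 12 )
)
# recombination breaks in cM (made-up values)
breaks_cM <- c( 0.75, 4, 8 )
# linear interpolation from genetic to physical position
breaks_bp <- approx( x = map_chr$posg, y = map_chr$pos, xout = breaks_cM )$y
# basepair positions are integers
breaks_bp <- round( breaks_bp )
```

How recomb_map_inds() treats breaks beyond the ends of the map is not covered here.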
Phase 2 reorganizes this data to identify the unique founder blocks that were inherited, first by making the data tidy with tidy_recomb_map_inds(), then applying recomb_founder_blocks_inherited().
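Conceptually, phase 2 reduces all blocks seen in the focal individuals to the set of distinct founder segments that need to be simulated. A rough base R sketch of that idea merges overlapping intervals per founder haplotype; the table layout, column names, and the helper merge_intervals() are all hypothetical, and the package functions may organize this differently.

```r
# toy table of founder blocks found in focal individuals, for one chromosome
# (column names are hypothetical)
blocks <- data.frame(
    anc   = c( 'g1_pat', 'g1_pat', 'g1_pat', 'g2_mat' ),
    start = c( 1, 400, 2000, 1 ),
    end   = c( 500, 900, 2500, 300 )
)

# hypothetical helper: merge overlapping intervals of one founder haplotype
merge_intervals <- function( start, end ) {
    o <- order( start )
    start <- start[ o ]
    end <- end[ o ]
    # a new merged block begins where an interval starts past all previous ends
    prev_max_end <- c( -Inf, head( cummax( end ), -1 ) )
    group <- cumsum( start > prev_max_end )
    data.frame(
        start = as.numeric( tapply( start, group, min ) ),
        end   = as.numeric( tapply( end, group, max ) )
    )
}

# apply separately to each founder haplotype and stack the results
inherited <- do.call( rbind, lapply( split( blocks, blocks$anc ), function( b ) {
    data.frame( anc = b$anc[ 1 ], merge_intervals( b$start, b$end ) )
} ) )
rownames( inherited ) <- NULL
```

In this toy case the first two g1_pat blocks overlap and collapse into one, so only three segments would need founder sequence.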
Phase 3 initializes founder haplotypes using sparse matrices from the package Matrix, and draws inherited founder subsequences according to their known ancestries and using the Li-Stephens-like haplotype model of pop_recomb().
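The memory saving in phase 3 comes from leaving the uninherited parts of each founder haplotype as implicit zeros. The sketch below shows the storage idea with the Matrix package; the dimensions and the inherited block are made up, and the allele draw is a simple independent placeholder standing in for the LD-aware draw that the function delegates to pop_recomb().

```r
library(Matrix)

m_loci <- 1000   # loci on the chromosome (made-up size)
n_founders <- 4  # founder haplotypes
# all-zero sparse matrix, loci on rows and founder haplotypes on columns
H <- Matrix( 0, nrow = m_loci, ncol = n_founders, sparse = TRUE )

# suppose only loci 101-150 of founder haplotype 2 were inherited
idx <- 101 : 150
# placeholder draw: independent alleles with frequency 0.5 (no LD)
set.seed( 1 )
H[ idx, 2 ] <- rbinom( length( idx ), 1, 0.5 )

# number of nonzero entries, which bounds what a sparse format must store
n_nonzero <- nnzero( H )
```

A dense matrix of the same shape would hold all 4000 entries regardless of how little was inherited.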
Phase 4 constructs the genotype matrices of focal individuals using the haplotypes of the founders drawn in phase 3 and the known origin of focal blocks from founders from phase 1, first constructing this data at the phased haplotype level with recomb_haplo_inds(), reencoding as unphased genotypes using recomb_geno_inds(), and constructing the corresponding local ancestry dosages using recomb_admix_inds().
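The relationship between the three outputs of phase 4 can be illustrated for a single individual: the unphased genotype is the sum of the two phased haplotypes, and each local ancestry dosage counts how many of the two haplotypes carry that ancestry at each locus. This is a conceptual base R sketch with made-up values, not the data structures returned by the package functions.

```r
# toy phased data for one individual at 6 loci (made-up values)
# alleles on the paternal and maternal haplotypes
x_pat <- c( 0L, 1L, 1L, 0L, 1L, 0L )
x_mat <- c( 1L, 1L, 0L, 0L, 1L, 1L )
# local ancestry label of each haplotype at each locus
anc_pat <- c( 'AFR', 'AFR', 'AFR', 'EUR', 'EUR', 'EUR' )
anc_mat <- rep( 'AFR', 6 )

# unphased genotype: sum of alleles across the two haplotypes
x <- x_pat + x_mat
# local ancestry dosage: number of haplotypes of each ancestry per locus
Ls <- lapply(
    c( AFR = 'AFR', EUR = 'EUR' ),
    function( a ) ( anc_pat == a ) + ( anc_mat == a )
)
```

With two ancestries and two haplotypes, the dosages at every locus sum to 2 across ancestries.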
A named list with three elements:
X: the genotype matrix of the focal individuals, as returned by recomb_geno_inds().
Ls: a list, mapping each ancestry to its matrix of local ancestry dosages, as returned by recomb_admix_inds().
haplos: a phased version of the haplotypes and local ancestries of the focal individuals, structured as nested lists, as returned by recomb_haplo_inds().
recomb_init_founders(), recomb_last_gen(), recomb_map_inds(), tidy_recomb_map_inds(), recomb_founder_blocks_inherited(), pop_recomb(), recomb_haplo_inds(), recomb_geno_inds(), recomb_admix_inds().
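# the simulation functions and `recomb_map_hg38` used below come from the simfam package
library(simfam)
library(tibble)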
# simulate random haplotypes for example
# this toy data has 10 SNPs per chromosome, in fixed positions for simplicity
bim <- tibble( chr = rep( 1 : 22, each = 10 ), pos = rep( (1:10) * 1e6, 22 ) )
# and random haplotype data to go with this
n_ind_hap <- 10
m_loci <- nrow( bim )
# NOTE ancestry labels can be anything but must match `founders_anc` below
anc_haps <- list(
    'AFR' = matrix( rbinom( m_loci * n_ind_hap, 1L, 0.5 ), nrow = m_loci, ncol = n_ind_hap ),
    'EUR' = matrix( rbinom( m_loci * n_ind_hap, 1L, 0.2 ), nrow = m_loci, ncol = n_ind_hap )
)
# now simulate a very small family with one individual, 2 parents, 4 implicit grandparents
data <- fam_ancestors( 2 )
fam <- data$fam
ids <- data$ids
# select ancestries for each of the 4 grandparents / founder haplotypes (unadmixed)
founders_anc <- c('AFR', 'AFR', 'AFR', 'EUR')
# set names of founders with _pat/mat, needed to match recombination structure
# order is odd but choices were random so that doesn't matter
names( founders_anc ) <- c(
    paste0( ids[[1]], '_pat' ),
    paste0( ids[[1]], '_mat' )
)
# this performs the simulation!
data <- geno_last_gen_admix_recomb( anc_haps, bim, recomb_map_hg38, 10, fam, ids, founders_anc )
# this is the genotype matrix for the one admixed individual
data$X
# the corresponding local ancestry dosage matrices
# names match input labels
data$Ls$AFR
data$Ls$EUR
# if desired, a more complete but more complicated structure holding phased haplotypes
# and phased local ancestry information
data$haplos