recomb_admix_inds: Reduce haplotype ancestry data to population ancestry dosage...

View source: R/recomb_admix_inds.R

recomb_admix_indsR Documentation

Reduce haplotype ancestry data to population ancestry dosage matrices

Description

This function accepts haplotype data, such as the output from recomb_haplo_inds() with ret_anc = TRUE (required), and reduces it to a list of population ancestry dosage matrices. In this context, "ancestors/ancestry" refer to haplotype blocks from specific ancestor individuals, whereas "population ancestry" groups these ancestors into populations (such as African, European, etc.). Although the haplotype data separates individuals and chromosomes into lists (the way it is simulated), the output matrices concatenates data from all chromosomes into a single matrix, as it appears in simpler simulations and real data, and matching the format of recomb_geno_inds().

Usage

recomb_admix_inds(haplos, anc_map, pops = sort(unique(anc_map$pop)))

Arguments

haplos

A list of diploid individuals, each of which is a list with two haploid individuals named pat and mat, each of which is a list of chromosomes, each of which must be a list with a named element anc must give the vector of ancestor names per position (the output format from recomb_haplo_inds() with ret_anc = TRUE).

anc_map

A data.frame or tibble with two columns: anc lists every ancestor haplotype name present in haplos, and pop the population assignment of that haplotype.

pops

Optional order of populations in output, by default sorted alphabetically from anc_map$pop.

Value

A named list of population ancestry dosage matrices, ordered as in pops, each of which counts populations in both alleles (in 0, 1, 2), with individuals along columns in same order as haplos list, and loci along rows in order of appearance concatenating chromosomes in numerical order.

See Also

recomb_fam() for drawing recombination (ancestor) blocks, defined in terms of genetic distance.

recomb_map_inds() for transforming genetic to basepair coordinates given a genetic map.

recomb_haplo_inds() for determining haplotypes of descendants given ancestral haplotypes (creates input to this function).

Examples

# Lengthy code creates individuals with recombination data to map
# The smallest pedigree, two parents and a child (minimal fam table).
library(tibble)
fam <- tibble(
  id = c('father', 'mother', 'child'),
  pat = c(NA, NA, 'father'),
  mat = c(NA, NA, 'mother')
)
# use latest human recombination map, but just first two chrs to keep this example fast
map <- recomb_map_hg38[ 1L:2L ]
# initialize parents with this other function
founders <- recomb_init_founders( c('father', 'mother'), map )
# draw recombination breaks for child
inds <- recomb_fam( founders, fam )
# now add base pair coordinates to recombination breaks
inds <- recomb_map_inds( inds, map )

# also need ancestral haplotypes
# these should be simulated carefully as needed, but for this example we make random data
haplo <- vector( 'list', length( map ) )
# names of ancestor haplotypes for this scenario
# (founders of fam$id but each with "_pat" and "_mat" suffixes)
anc_names <- c( 'father_pat', 'father_mat', 'mother_pat', 'mother_mat' )
n_ind <- length( anc_names )
# number of loci per chr, for toy test
m_loci <- 10L
for ( chr in 1L : length( map ) ) {
    # draw random positions
    pos_chr <- sample.int( max( map[[ chr ]]$pos ), m_loci )
    # draw haplotypes
    X_chr <- matrix(
        rbinom( m_loci * n_ind, 1L, 0.5 ),
        nrow = m_loci,
        ncol = n_ind
    )
    # required column names!
    colnames( X_chr ) <- anc_names
    # add to structure, in a list
    haplo[[ chr ]] <- list( X = X_chr, pos = pos_chr )
}
# determine haplotypes and per-position ancestries of descendants given ancestral haplotypes
haplos <- recomb_haplo_inds( inds, haplo, ret_anc = TRUE )

# define individual to population ancestry map
# take four ancestral haplotypes from above, assign them population labels
anc_map <- tibble(
    anc = anc_names,
    pop = c('African', 'European', 'African', 'African')
)

# finally, run desired function!
# convert haplotypes structure to list of population ancestry dosage matrices
Xs <- recomb_admix_inds( haplos, anc_map )


simfam documentation built on Jan. 10, 2023, 1:06 a.m.