pop_recomb: Simulate genotypes with linkage disequilibrium (LD) given a...

View source: R/pop_recomb.R

pop_recombR Documentation

Simulate genotypes with linkage disequilibrium (LD) given a population of haplotypes, using a Li-Stephens-like model of haplotype copying

Description

Each new genome has breaks drawn from the recombination model accelerated for the desired number of generations G, and haplotypes are drawn and copied randomly from the population. This results in a population with individuals drawn independently and identically distributed but with LD within individuals, since the ancestral LD is preserved to some extent (attenuated by recombination after G generations). If there is no LD in haps, then the output will not have LD either except when the number of columns of haps is very small (which resembles a bottleneck). This model does not introduce mutations (unlike the original Li-Stephens). Genotypes, when requested, are simply sums of independently drawn haplotype values.

Usage

pop_recomb(
  haps,
  bim,
  map,
  G,
  n_ind,
  geno = TRUE,
  loci_on_cols = FALSE,
  indexes_loci = NULL,
  indexes_chr_ends = NULL
)

Arguments

haps

Regular matrix or BEDMatrix object of haplotype values, one row per locus, one column per haplotype (half individual), or transposed if loci_on_cols = TRUE and for BEDMatrix objects. If geno = TRUE (default), these values should be numeric (recommended are zeroes and ones, indicating absence or presence of reference allele), but if geno = FALSE code will work with any values, including strings, which are just copied to outputs in blocks.

bim

The table of variants of haps, which is a data.frame/tibble with at least two columns: chr (must be numeric between 1 and the maximum chromosome in map below for map to work) and pos (base pair position, usually an integer).

map

The genetic map, a list of chromosomes each of which is a data.frame/tibble with columns pos for base pair position and posg for genetic position.

G

Number of generations since most recent common ancestor of population (to multiply standard recombination rate)

n_ind

Number of individuals (if geno = TRUE) or haplotypes (half individuals, if geno = FALSE) desired in output

geno

If TRUE (default) returns matrix of genotypes (values in 0,1,2 if haps is binary, otherwise double whatever the range of values in haps is), otherwise returns matrix of haplotypes (half individuals, same values of input haps)

loci_on_cols

If TRUE, haps has loci on columns and individuals on rows; if FALSE (default), loci are on rows and individuals on columns. If haps is a BEDMatrix object, loci_on_cols is ignored (set automatically to TRUE internally).

indexes_loci

Vector of index ranges (start, end) of loci to simulate, to request to simulate only a subset of the loci in haps/bim. Default NULL simulates all loci in haps/bim.

indexes_chr_ends

Optional vector mapping each chromosome (index in vector) to the last index in the bim table; missing chromosomes must be mapped to NA values. If NULL (default), this vector is calculated from the provided bim table. However, if this function is called repeatedly on the same bim data, it can improve efficiency to calculate it outside just once (use indexes_chr() for this).

Value

A matrix with the same number of rows as haps and n_ind columns, with values copied from haps in (recombination) blocks if geno = FALSE, or sums of two such values drawn independently when geno = TRUE.

Examples

# simulate a tiny population with few SNPs for example
library(tibble)
bim <- tibble(
  chr = c(2, 2, 3, 3, 22),
  pos = c(100, 121, 53, 154, 66) * 1e6
)
m_loci <- nrow( bim )
# Most often, haplotypes are binary data as simulated here.
# Here haplotypes will be totally unstructured, but to have LD in the output use real human data
# or data simulated to have LD
n_ind_haps <- 5
haps <- matrix(
  rbinom( m_loci * n_ind_haps, 1, 0.5 ),
  nrow = m_loci,
  ncol = n_ind_haps
)
# makes sense to have a lot of recombination at the population level
G <- 500
# ask for small output for example
n_ind <- 7
# use the recombination map for the same genome build as your data!
map <- recomb_map_hg38

# simulate genotypes!  (Usually more convenient, but phase information is lost)
X <- pop_recomb( haps, bim, map, G, n_ind )

# simulate haplotypes instead (preserves true phase)
H <- pop_recomb( haps, bim, map, G, n_ind, geno = FALSE )


OchoaLab/simfam documentation built on Jan. 11, 2025, 12:23 a.m.