sim_pedigree: Construct a random pedigree


sim_pedigree    R Documentation

Construct a random pedigree

Description

Specify the number of individuals per generation, plus some other optional parameters, and a single pedigree with those properties will be simulated. Close relatives are never paired, sex is drawn randomly per individual, and pairings are strictly between opposite-sex individuals; otherwise, the closest individuals (on an underlying 1D geography given by their index) are paired in a random order. Pairs are then reordered by the average of their indexes, which is also where their children are placed (this determines the children's indexes in the 1D geography). The procedure may leave some individuals unpaired, so that they have no children in the next generation, and family sizes vary randomly (with a fixed minimum family size) to achieve the desired population size in each generation.
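
The pairing and ordering steps described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the helper pair_nearest() is hypothetical, close-relative avoidance is reduced to an optional allowed(i, j) callback (for example, a local kinship check against cutoff), and ties in distance go to the lower index.

```r
# Hypothetical helper (not part of simfam): greedily pair opposite-sex
# individuals on a 1D geography given by their index.
# `sex`: integer vector, 1L (male) or 2L (female).
# `allowed(i, j)`: returns TRUE if i and j may be paired.
pair_nearest <- function(sex, allowed = function(i, j) TRUE) {
  n <- length(sex)
  unpaired <- rep(TRUE, n)
  pairs <- matrix(integer(0), ncol = 2, dimnames = list(NULL, c("pat", "mat")))
  # visit individuals in a random order
  for (i in sample.int(n)) {
    if (!unpaired[i]) next
    # candidates: still unpaired, opposite sex, and allowed
    cand <- which(unpaired & sex != sex[i])
    cand <- cand[vapply(cand, function(j) isTRUE(allowed(i, j)), logical(1))]
    if (length(cand) == 0) next  # i stays unpaired
    # closest candidate by index distance (ties go to the lower index)
    j <- cand[which.min(abs(cand - i))]
    unpaired[c(i, j)] <- FALSE
    # store the father (sex 1L) first and the mother (sex 2L) second
    pairs <- rbind(pairs, if (sex[i] == 1L) c(i, j) else c(j, i))
  }
  # reorder pairs by the average of their indexes, where children are placed
  pairs[order(rowMeans(pairs)), , drop = FALSE]
}

set.seed(1)
sex <- c(1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L)
p <- pair_nearest(sex)
p
```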

Usage

sim_pedigree(
  n,
  G = length(n),
  sex = draw_sex(n[1]),
  kinship_local = diag(n[1])/2,
  cutoff = 1/4^3,
  children_min = 1L,
  full = FALSE
)

Arguments

n

The number of individuals per generation. If scalar, the number of generations G >= 2 must also be specified. Otherwise, the length of n is the number of generations.

G

The number of generations (optional). Note that G == 1 (founders only) is invalid because there is no pedigree. A G >= 2 must be specified if n is a scalar. If G is specified and length(n) > 1, the two values must agree.
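
The rules for n and G can be summarized with a small validation sketch. The helper resolve_n() is hypothetical, and it assumes, for illustration, that a scalar n is recycled to the same population size in every generation.

```r
# Hypothetical helper (not part of simfam): validate `n` and `G` per the rules above.
resolve_n <- function(n, G = length(n)) {
  force(G)  # evaluate the default length(n) before n is modified
  if (length(n) == 1L) {
    # scalar n: G >= 2 must be given; recycle n (assumption)
    if (G < 2) stop("G >= 2 must be specified when n is a scalar")
    n <- rep.int(n, G)
  } else if (G != length(n)) {
    stop("G and length(n) must agree")
  }
  if (G < 2) stop("G must be at least 2 (G == 1 is founders only)")
  n
}

resolve_n(c(15, 20, 25))  # three generations, sizes as given
resolve_n(10, G = 3)      # scalar n recycled over three generations
```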

sex

The numeric sex values for the founders (1L for male, 2L for female). By default they are drawn randomly using draw_sex().

kinship_local

The local kinship matrix of the founder population. The default value is half the identity matrix, which corresponds to locally unrelated and locally outbred founders. This "local" kinship is the basis for all kinship calculations used to avoid pairing close relatives. The goal is to decide which pairs are too closely related based on the pedigree only (and not on population structure, which would otherwise increase all kinship values), so the default value is appropriate.
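
How local kinship propagates down a pedigree can be sketched with the standard kinship recursion: the kinship between two children is the average of the four kinship values between their parents, and a child's self-kinship is (1 + kinship between its parents) / 2. The helper kinship_children() below is hypothetical and is not claimed to be simfam's internal code.

```r
# Hypothetical helper (not part of simfam): local kinship of a new generation.
# `kinship`: kinship matrix of the parents' generation.
# `pat`, `mat`: indexes (into `kinship`) of each child's father and mother,
# assumed distinct (pairs are always opposite-sex).
kinship_children <- function(kinship, pat, mat) {
  rows <- seq_along(pat)
  # weight matrix: each child's row puts 1/2 on its father and 1/2 on its mother
  P <- matrix(0, nrow = length(pat), ncol = nrow(kinship))
  P[cbind(rows, pat)] <- 1/2
  P[cbind(rows, mat)] <- 1/2
  # off-diagonal: average kinship between the two children's parents
  K <- P %*% kinship %*% t(P)
  # diagonal: self-kinship is (1 + kinship between the two parents) / 2
  diag(K) <- (1 + kinship[cbind(pat, mat)]) / 2
  K
}

# default founders: locally unrelated and outbred (half the identity matrix)
K0 <- diag(4) / 2
# children 1 and 2 are full siblings; child 3 is from an unrelated family
K1 <- kinship_children(K0, pat = c(1L, 1L, 3L), mat = c(2L, 2L, 4L))
K1
```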

cutoff

A pair is allowed only if its local kinship is strictly less than cutoff. The default value of 1/4^3 is the kinship of second cousins, so second cousins and closer relatives are forbidden pairs, while third cousins (kinship 1/4^4) are allowed.
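
As a quick check of the default cutoff, assuming full cousins with locally unrelated, outbred founders and no inbreeding, n-th cousins have kinship 1/4^(n+1). The helper kinship_cousins() is illustrative only.

```r
# Illustrative helper: kinship of full n-th cousins without inbreeding
kinship_cousins <- function(degree) 1 / 4^(degree + 1)

cutoff <- 1 / 4^3   # default value of `cutoff`
degree <- 1:4       # first to fourth cousins
allowed <- kinship_cousins(degree) < cutoff
allowed  # first and second cousins forbidden; third and fourth allowed
```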

children_min

The minimum number of children per family. Must be 0 or larger, and must not exceed the average number of children per family in each generation (this average depends on how many individuals were left unpaired, but is approximately 2 * n[i] / n[i-1] for generation i, since each family has two parents). The number of children in each family is first drawn as children_min plus a Poisson random variable whose mean is the average number of children per family needed to reach the target population size (n), minus children_min. As these totals may not exactly equal the target population size, randomly chosen families are incremented or decremented by one child at a time (respecting the minimum family size) until the target is met.
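
The family-size procedure described above can be sketched as follows. The helper draw_family_sizes() and its arguments n_target and n_fam are hypothetical; the number of families is taken as given here rather than derived from the pairing step.

```r
# Hypothetical helper (not part of simfam): family sizes that sum to n_target.
draw_family_sizes <- function(n_target, n_fam, children_min = 1L) {
  # Poisson mean: average family size needed, minus the minimum
  mean_extra <- n_target / n_fam - children_min
  if (mean_extra < 0) stop("children_min exceeds the average family size")
  sizes <- children_min + rpois(n_fam, mean_extra)
  # adjust random families by one child at a time until the total matches
  while ((delta <- n_target - sum(sizes)) != 0) {
    if (delta > 0) {
      i <- sample.int(n_fam, 1)
      sizes[i] <- sizes[i] + 1L
    } else {
      # only families above the minimum may lose a child
      shrinkable <- which(sizes > children_min)
      i <- shrinkable[sample.int(length(shrinkable), 1)]
      sizes[i] <- sizes[i] - 1L
    }
  }
  sizes
}

set.seed(42)
sizes <- draw_family_sizes(n_target = 25, n_fam = 10, children_min = 1L)
sum(sizes)  # always equals n_target
```

Indexing through sample.int() avoids the base R pitfall where sample(x, 1) draws from 1:x when x has length one.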

full

If TRUE, part of the return object is a list of local kinship matrices for every generation. If FALSE (default), only the local kinship matrix of the last generation is returned.

Value

A list with these named elements:

  • fam: the pedigree, a tibble in plink FAM format. Following the column naming convention of the related genio package, it contains columns:

    • fam: Family ID, trivial "fam1" for all individuals

    • id: Individual ID, in this case a string matching the regular expression "(\d+)-(\d+)", where the first integer is the generation number and the second is the index number (1 to n[g] for generation g).

    • pat: Paternal ID. Matches an id except for founders, which have fathers set to NA.

    • mat: Maternal ID. Matches an id except for founders, which have mothers set to NA.

    • sex: integers 1L (male) or 2L (female) which were drawn randomly; no other values occur in these outputs.

    • pheno: Phenotype, here all 0 (missing value).

  • ids: a list of IDs for each generation (indexed in the list by generation).

  • kinship_local: if full = FALSE, the local kinship matrix of the last generation, otherwise a list of local kinship matrices for every generation.
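
Because IDs follow the "(\d+)-(\d+)" pattern, the generation and index of each individual can be recovered with base R regular expressions. The helper parse_ids() and the example IDs below are illustrative only.

```r
# Illustrative helper: split IDs of the form "<generation>-<index>"
parse_ids <- function(ids) {
  m <- regmatches(ids, regexec("^(\\d+)-(\\d+)$", ids))
  data.frame(
    id  = ids,
    # element 2 is the first capture group, element 3 the second;
    # non-matching IDs yield NA
    gen = as.integer(vapply(m, `[`, character(1), 2)),
    idx = as.integer(vapply(m, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

ids <- c("1-1", "1-15", "2-3", "3-25")
parsed <- parse_ids(ids)
# group IDs by generation, similar in spirit to the `ids` element
by_gen <- split(parsed$id, parsed$gen)
by_gen
```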

See Also

Plink FAM format reference: https://www.cog-genomics.org/plink/1.9/formats#fam

Examples

# load the package
library(simfam)

# number of individuals for each generation
n <- c(15, 20, 25)

# create a random pedigree with 3 generations (G = length(n))
data <- sim_pedigree( n )

# this is the FAM table defining the entire pedigree,
# which is the most important piece of information desired!
data$fam

# the IDs separated by generation
data$ids

# bonus: the local kinship matrix of the final generation
data$kinship_local


simfam documentation built on Jan. 10, 2023, 1:06 a.m.