family_sim_genos: Simulate families of individuals from population allele...

View source: R/family_sim_genos.R

family_sim_genosR Documentation

Simulate families of individuals from population allele frequencies

Description

This function produces families of siblings, half-siblings, cousins and half-cousins from observed population allele frequencies. These simulations can then be used to test the power to discern between individuals with different levels of relatedness. Assumes diploid genotypes.

Usage

family_sim_genos(
  freqData,
  locusCol = "LOCUS",
  freqCol = "FREQ",
  numSims = 100L,
  returnParents = FALSE,
  returnPedigree = FALSE
)

Arguments

freqData

Data.table: Population allele frequencies.

locusCol

Character: Column name in freqData with the locus IDs. Default is 'LOCUS'.

freqCol

Character: Column name in freqData with the allele frequencies. Default is 'FREQ'.

numSims

Integer: The number of simulated individuals for each family relationship. Default is 100.

returnParents

Logical: Should the parental genotypes also be returned? Default is FALSE.

returnPedigree

Logical: Shoudl the pedgree also be returned? Default is FALSE.

Details

The output can be used to generate a simulated genetic relationship matrix (GRM) that can be compared against an observed GRM using the function, family_sim_compare. This can provide a graphical comparison between the observed and expected (simulated) distribution of relatedness values that you might expect for different familial relationships, given the number of loci and their allele frequencies.

In this function, a 'simulation' comprises a draw of 10 individuals: 2 are unrelated to each other and all other individuals in the simulation, 2 are a parent-offspring pair, 2 are siblings, 2 are half-siblings, 2 are cousins, and 2 are half-cousins.

The function always returns the genotypes for 'focal pairs' from a set of numSim simulations. Each simulation takes the naming convention 'S[sim number]', for example, S1, S2 ... S10 if 10 simulations are requested. The naming convention for each simulated individual is 'S[sim number]_[relationship]', with the 'relationship' tag being one of the following:

  • 'unrel_ind_1' and 'unrel_ind_2' are two completely unrelated individuals.

  • 'po_ind_1' and 'po_dam_1' are an offspring and its parent (mother).

  • 'sib_ind_1' and 'sib_ind_2' are full siblings.

  • 'halfsib_ind_1' are 'halfsib_ind_2' are half siblings.

  • 'cous_ind_1' and 'cous_ind_2' are cousins.

  • 'halfcous_ind_1' and 'halfcous_ind_2' are half cousins.

Note, that within the sth simulation, all pairs are unrelated to each other as well. For example, in simulation S1, sib_ind_1 and sib_ind_2 are completely unrelated to halfsib_ind_1 and halfsib_ind_2. Within an sth simulation there are 66 pairs, 5 of these are related (the focal parent-offspring, siblings, half-siblings, cousins, and half-cousins), and the rest are unrelated. Individuals from different simulation are also completely unrelated. For example, all individuals from simulation S1 are completely unrelated to all individuals from simulation S2.

If requested with returnParents==TRUE, then the genotypes of all parental indiviudals will also be returned for the parent-offspring, sibling, half-sibling, cousin, and half-cousin focal pairs. This includes grandparents to generate the cousin and half-cousin focal pairs.

If requested with returnPedigree==TRUE, the pedigree will be returned.

Value

Returns a list with indexes $focal.pairs, $parents, and $pedigree.

The $focal.pairs index will always contain a data.table with the following columns:

  1. $PLOIDY, the ploidy for the locus.

  2. $LOCUS, the locus ID.

  3. $FREQ, the individual-level allele frequency.

  4. $SIM, the simulation number.

  5. $SAMPLE, the sample ID.

  6. $FAMILY, the familial relationship this individual was used to simualte.

  7. $GT, the genotype as counts of the Alt alleles.

If parent genotypes are requested, the $parents index will contain a data.table with the same structure as that in $focal.pairs.

If the pedigree is requested, a data table is returned with structure:

  1. $SAMPLE, the sample ID.

  2. $DAM, the dam's ID.

  3. $SIRE, the sire's ID.

Examples

library(genomalicious)
data(data_Genos)

# Subset Pop1 genotypes
genosPop1 <- data_Genos[POP=='Pop1', c('SAMPLE', 'LOCUS', 'GT')]

# Get the allele frequencies for Pop1
freqsPop1 <- genosPop1[, .(FREQ=sum(GT)/(length(GT)*2)), by=LOCUS]

# Simulate 100 familial relationships of each class
simFamily <- family_sim_genos(
   freqData=freqsPop1,
   locusCol='LOCUS',
   freqCol='FREQ',
   numSims=100,
   returnParents=TRUE,
   returnPedigree=TRUE
)

# Take a look at the focal pairs
simFamily$focal.pairs

# Take a look at the parentals
simFamily$parents

# Take a look at the pedigree
simFamily$pedigree

### THE OBSERVED GENOMIC RELATIONSHIPS MATRIX
library(AGHmatrix)

# A genotype matrix for the focal pairs
obsGenosMat <- genosPop1 %>% DT2Mat_genos()

# Calculate the GRM
obsGRM <- Gmatrix(obsGenosMat, method='Yang', ploidy=2)

### THE SIMULATED GENOMIC RELATIONSHIPS MATRIX
# Convert simulated families into a genotype matrix
simGenosMat <- DT2Mat_genos(simFamily$focal.pairs)

# Calculate the GRM
simGRM <- Gmatrix(simGenosMat, method='Yang', ploidy=2)

### COMPARE THE OBSERVED AND SIMULATED
relComp <- family_sim_compare(
   simGRM=simGRM,
   obsGRM=obsGRM,
   look='classic'
)

# The data
relComp$data

# Simulated dataset
relComp$data[!is.na(SIM)]

# The observed dataset
relComp$data[is.na(SIM)]

# Plot of relatedness values. Dashed lines denote relatedness
# values of 0, 0.0625, 0.125, 0.25, and 0.5, which are the theoretical
# expectations for unrelated individuals, half-cousins, cousins, half-siblings,
# and siblings/parent-offspring, respectively.
# You will note a large variance in the expected values, which
# is not surprising for this very small SNP dataset (200 loci).
relComp$plot

### SIMULATE A QUANTITATIVE TRAIT

# Combine the focal pairs and parentals, and simualte a trait controlled by
# 100 loci with Va = 1, and Ve = 1.
simQTL <- family_sim_qtl(
  famGenos=rbind(simFamily$focal.pairs, simFamily$parents),
  numLoci=100, additiveVar=1, environVar=1
  )

# The trait values
simQTL$trait

# The locus values
simQTL$loci


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.