View source: R/family_sim_genos.R
family_sim_genos | R Documentation |
This function produces families of siblings, half-siblings, cousins and half-cousins from observed population allele frequencies. These simulations can then be used to test the power to discern between individuals with different levels of relatedness. Assumes diploid genotypes.
family_sim_genos(
freqData,
locusCol = "LOCUS",
freqCol = "FREQ",
numSims = 100L,
returnParents = FALSE,
returnPedigree = FALSE,
numCores = 1
)
freqData |
Data.table: Population allele frequencies. |
locusCol |
Character: Column name in |
freqCol |
Character: Column name in |
numSims |
Integer: The number of simulated individuals for each family relationship. Default is 100. |
returnParents |
Logical: Should the parental genotypes also be returned? Default is FALSE. |
returnPedigree |
Logical: Should the pedgree also be returned? Default is FALSE. |
numCores |
Integer: Number of cores to run. Default is 1, run on a single core. If >1, then will run as a parallel job. |
The output can be used to generate a simulated genetic relationship matrix (GRM)
that can be compared against an observed GRM using the function,
family_sim_compare
. This can provide a graphical comparison
between the observed and expected (simulated) distribution of relatedness
values that you might expect for different familial relationships, given the
number of loci and their allele frequencies.
In this function, a 'simulation' comprises a draw of 10 individuals: 2 are unrelated to each other and all other individuals in the simulation, 2 are a parent-offspring pair, 2 are siblings, 2 are half-siblings, 2 are cousins, and 2 are half-cousins.
The function always returns the genotypes for 'focal pairs' from a set of numSim
simulations. Each simulation takes the naming convention 'S[sim number]',
for example, S1, S2 ... S10 if 10 simulations are requested. The naming
convention for each simulated individual is 'S[sim number]_[relationship]',
with the 'relationship' tag being one of the following:
'unrel_ind_1' and 'unrel_ind_2' are two completely unrelated individuals.
'po_ind_1' and 'po_dam_1' are an offspring and its parent (mother).
'sib_ind_1' and 'sib_ind_2' are full siblings.
'halfsib_ind_1' are 'halfsib_ind_2' are half siblings.
'cous_ind_1' and 'cous_ind_2' are cousins.
'halfcous_ind_1' and 'halfcous_ind_2' are half cousins.
Note, that within the sth simulation, all pairs are unrelated to each other as well. For example, in simulation S1, sib_ind_1 and sib_ind_2 are completely unrelated to halfsib_ind_1 and halfsib_ind_2. Within an sth simulation there are 66 pairs, 5 of these are related (the focal parent-offspring, siblings, half-siblings, cousins, and half-cousins), and the rest are unrelated. Individuals from different simulation are also completely unrelated. For example, all individuals from simulation S1 are completely unrelated to all individuals from simulation S2.
If requested with returnParents==TRUE
, then the genotypes of all parental
indiviudals will also be returned for the parent-offspring, sibling, half-sibling,
cousin, and half-cousin focal pairs. This includes grandparents to generate the
cousin and half-cousin focal pairs.
If requested with returnPedigree==TRUE
, the pedigree will be returned.
Returns a list with indexes $focal.pairs
, $parents
, and
$pedigree
.
The $focal.pairs
index will always contain a data.table with the following columns:
$PLOIDY
, the ploidy for the locus.
$LOCUS
, the locus ID.
$FREQ
, the individual-level allele frequency.
$SIM
, the simulation number.
$SAMPLE
, the sample ID.
$FAMILY
, the familial relationship this individual was used to simualte.
$GT
, the genotype as counts of the Alt alleles.
If parent genotypes are requested, the $parents
index will contain
a data.table with the same structure as that in $focal.pairs
.
If the pedigree is requested, a data table is returned with structure:
$SAMPLE
, the sample ID.
$DAM
, the dam's ID.
$SIRE
, the sire's ID.
library(genomalicious)
data(data_Genos)
# Subset Pop1 genotypes
genosPop1 <- data_Genos[POP=='Pop1', c('SAMPLE', 'LOCUS', 'GT')]
# Get the allele frequencies for Pop1
freqsPop1 <- genosPop1[, .(FREQ=sum(GT)/(length(GT)*2)), by=LOCUS]
# Simulate 100 familial relationships of each class
simFamily <- family_sim_genos(
freqData=freqsPop1,
locusCol='LOCUS',
freqCol='FREQ',
numSims=100,
returnParents=TRUE,
returnPedigree=TRUE
)
# Take a look at the focal pairs
simFamily$focal.pairs
# Take a look at the parentals
simFamily$parents
# Take a look at the pedigree
simFamily$pedigree
### THE OBSERVED GENOMIC RELATIONSHIPS MATRIX
library(sommer)
# Note, for sommer, we have to adjust genotyeps to range from -1 to 1.
# A genotype matrix for the focal pairs
obsGenosMat <- genosPop1 %>%
copy %>%
.[, GT:=GT-1] %>%
DT2Mat_genos()
# Calculate the GRM
obsGRM <- sommer::A.mat(obsGenosMat)
### THE SIMULATED GENOMIC RELATIONSHIPS MATRIX
# Convert simulated families into a genotype matrix
simGenosMat <- simFamily$focal.pairs %>%
copy %>%
.[, GT:=GT-1] %>%
DT2Mat_genos()
# Calculate the GRM
simGRM <- sommer::A.mat(simGenosMat)
### COMPARE THE OBSERVED AND SIMULATED
relComp <- family_sim_compare(
simGRM=simGRM,
obsGRM=obsGRM,
look='classic'
)
# The data
relComp$data
# Simulated dataset
relComp$data[!is.na(SIM)]
# The observed dataset
relComp$data[is.na(SIM)]
# Plot of relatedness values. Dashed lines denote relatedness
# values of 0, 0.0625, 0.125, 0.25, and 0.5, which are the theoretical
# expectations for unrelated individuals, half-cousins, cousins, half-siblings,
# and siblings/parent-offspring, respectively.
# You will note a large variance in the expected values, which
# is not surprising for this very small SNP dataset (200 loci).
relComp$plot
### SIMULATE A QUANTITATIVE TRAIT
# Combine the focal pairs and parentals, and simualte a trait controlled by
# 100 loci with Va = 1, and Ve = 1, with equal effect sizes.
simQTL.equal <- family_sim_qtl(
famGenos=rbind(simFamily$focal.pairs, simFamily$parents),
numLoci=100, additiveVar=1, environVar=1, effectSizes=1
)
# The trait values for equal effect sizes.
simQTL.equal$trait
# The locus values to equal effect sizes.
simQTL.equal$loci
# This time, let's use vairable effect sizes that follow a gamma
# distribution of effects.
eff_sizes <- rgamma(5000, shape=0.5, scale=0.2)
# Fit with the same loci as used above.
simQTL.variable <- family_sim_qtl(
famGenos=rbind(simFamily$focal.pairs, simFamily$parents),
qtlNames=simQTL.equal$loci$LOCUS,
additiveVar=1, environVar=1, effectSizes=eff_sizes
)
# The trait values
simQTL.variable$trait
# The locus values
simQTL.variable$loci
# Compare the values
plot(simQTL.equal$trait$P, simQTL.variable$trait$P)
# But they both have the desired variances
var(simQTL.equal$trait$G)
var(simQTL.variable$trait$G)
var(simQTL.equal$trait$E)
var(simQTL.variable$trait$E)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.