View source: R/family_sim_compare.R
family_sim_compare | R Documentation |
This function can help compare observed estimates of relatedness to expected values of relatedness for different familial relationships given a set of loci and population alelle frequencies. It returns a data table combining the simulated and observed estiamtes and a plot of density curves overlaying the distribution of observed relatedness estimates on top of simulated values.
family_sim_compare(
simGRM,
obsGRM,
plotColous = NULL,
look = "ggplot",
legendPos = "right",
curveAlpha = 0.7,
curveFill = NULL,
curveOutline = NULL,
facetFamily = FALSE
)
simGRM |
Matrix: The simulated GRM using the individuals generated
from |
obsGRM |
Matrix: The observed GRM using the observed individuals. See Details. |
look |
Character: The look of the plot. Default = |
legendPos |
Character: Where should the legend be positioned? Default is
|
curveAlpha |
Numeric: A value between 0 and 1 to set the transparency of the density curves. Default = 0.7. |
curveFill |
Character: A vector of colours to fill density curves,
but is an optional argument. Default = |
curveOutline |
Character: A vector of colours to for density curve outlines,
but is an optional argument. Default = |
facetFamily |
Logical: Should the plot be faceted by family relationship? Default is FALSE. |
numSims |
Integer: The number of simulated individuals for each family relationship. Default is 100. |
The function takes the output of family_sim_genos
and additional calculations
of the genetic relationiship matrix (GRM) that are performed by the user on
the simulated and observed individuals.
The GRMs for arguments simGRM
and obsGRM
need to be
created by the user with whatever program they want to use to calculate
pairwise relatedness among individuals. The same function call should be used
ob both datasets. The GRM should be a square matrix with the relatedness of
individuals to themselves on the diagonal, and their relatedness to other
individuals on the off-diagonal.
Returns two objects. The first is data.table with the following columns:
$SIM
, the simulation number for pairs of simulated individuals,
or 'NA' for the pairs of observed individuals.
$SAMPLE1
, the sample ID for the first individual.
$SAMPLE2
, the sample ID for the second individual.
$FAMILY
, the familial relationship for simulated individuals.
$RELATE
, the estimated relatedness.
The second is a ggplot object which plots density curves for the estimated relatedness values calculated for simulated pairs of unrelated individuals, cousins, half-siblings, and siblings, with the observed relatedness values from the user's dataset overlayed. Dashed lines are used to demarcate the theoretical expected relatedness values for unrelated individuals (0), cousins (0.125), half-siblings (0.25), and siblings (0.5).
library(genomalicious)
data(data_Genos)
# Subset Pop1 genotypes
genosPop1 <- data_Genos[POP=='Pop1', c('SAMPLE', 'LOCUS', 'GT')]
# Get the allele frequencies for Pop1
freqsPop1 <- genosPop1[, .(FREQ=sum(GT)/(length(GT)*2)), by=LOCUS]
# Simulate 100 families
simFamily <- family_sim_genos(
freqData=freqsPop1,
locusCol='LOCUS',
freqCol='FREQ',
numSims=100
)
### THE OBSERVED GENOMIC RELATIONSHIPS MATRIX
library(sommer)
# Note, for sommer, we have to adjust genotyeps to range from -1 to 1.
# A genotype matrix for the focal pairs
obsGenosMat <- genosPop1 %>%
copy %>%
.[, GT:=GT-1] %>%
DT2Mat_genos()
# Calculate the GRM
obsGRM <- sommer::A.mat(obsGenosMat)
### THE SIMULATED GENOMIC RELATIONSHIPS MATRIX
# Convert simulated families into a genotype matrix
simGenosMat <- simFamily$focal.pairs %>%
copy %>%
.[, GT:=GT-1] %>%
DT2Mat_genos()
# Calculate the GRM
simGRM <- sommer::A.mat(simGenosMat)
### COMPARE THE OBSERVED AND SIMULATED
relComp <- family_sim_compare(
simGRM=simGRM,
obsGRM=obsGRM,
look='classic'
)
# The data
relComp$data
# Simulated dataset
relComp$data[!is.na(SIM)]
# The observed dataset
relComp$data[is.na(SIM)]
# Plot of relatedness values. Dashed lines denote relatedness
# values of 0, 0.0625, 0.125, 0.25, and 0.5, which are the theoretical
# expectations for unrelated individuals, half-cousins, cousins, half-siblings,
# and siblings, respectively.
# You will note a large variance are the expected values, which
# is not surprising for this very small SNP dataset (200 loci).
relComp$plot
# Take a look at the "known" relationships in the observed dataset using the
# offspring we created.
# Note, whilst values are centered on their theoretical expectations,
# you will probably find the "observed" relatedness might be lower or even higher.
relComp$data[FAMILY=='Half-siblings']$RELATE %>% summary()
relComp$data[FAMILY=='Siblings']$RELATE %>% summary()
relComp$data[FAMILY=='Parent-offspring']$RELATE %>% summary()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.