View source: R/family_sim_compare.R
family_sim_compare | R Documentation |
This function can help compare observed estimates of relatedness to expected values of relatedness for different familial relationships given a set of loci and population alelle frequencies. It returns a data table combining the simulated and observed estiamtes and a plot of density curves overlaying the distribution of observed relatedness estimates on top of simulated values.
family_sim_compare(
simGRM,
obsGRM,
plotColous = NULL,
look = "ggplot",
legendPos = "right",
curveAlpha = 0.7,
curveFill = NULL,
curveOutline = NULL,
facetFamily = FALSE
)
simGRM |
Matrix: The simulated GRM using the individuals generated
from |
obsGRM |
Matrix: The observed GRM using the observed individuals. See Details. |
look |
Character: The look of the plot. Default = |
legendPos |
Character: Where should the legend be positioned? Default is
|
curveAlpha |
Numeric: A value between 0 and 1 to set the transparency of the density curves. Default = 0.7. |
curveFill |
Character: A vector of colours to fill density curves,
but is an optional argument. Default = |
curveOutline |
Character: A vector of colours to for density curve outlines,
but is an optional argument. Default = |
facetFamily |
Logical: Should the plot be faceted by family relationship? Default is FALSE. |
numSims |
Integer: The number of simulated individuals for each family relationship. Default is 100. |
The function takes the output of family_sim_genos
and additional calculations
of the genetic relationiship matrix (GRM) that are performed by the user on
the simulated and observed individuals.
The GRMs for arguments simGRM
and obsGRM
need to be
created by the user with whatever program they want to use to calculate
pairwise relatedness among individuals. The same function call should be used
ob both datasets. The GRM should be a square matrix with the relatedness of
individuals to themselves on the diagonal, and their relatedness to other
individuals on the off-diagonal.
Returns two objects. The first is data.table with the following columns:
$SIM
, the simulation number for pairs of simulated individuals,
or 'NA' for the pairs of observed individuals.
$SAMPLE1
, the sample ID for the first individual.
$SAMPLE2
, the sample ID for the second individual.
$FAMILY
, the familial relationship for simulated individuals.
$RELATE
, the estimated relatedness.
The second is a ggplot object which plots density curves for the estimated relatedness values calculated for simulated pairs of unrelated individuals, cousins, half-siblings, and siblings, with the observed relatedness values from the user's dataset overlayed. Dashed lines are used to demarcate the theoretical expected relatedness values for unrelated individuals (0), cousins (0.125), half-siblings (0.25), and siblings (0.5).
library(genomalicious)
data(data_Genos)
# Subset Pop1 genotypes
genosPop1 <- data_Genos[POP=='Pop1', c('SAMPLE', 'LOCUS', 'GT')]
# Get the allele frequencies for Pop1
freqsPop1 <- genosPop1[, .(FREQ=sum(GT)/(length(GT)*2)), by=LOCUS]
# Simulate 100 families
simFamily <- family_sim_genos(
freqData=freqsPop1,
locusCol='LOCUS',
freqCol='FREQ',
numSims=100
)
# Create some siblings in Pop1 from two sets of parents
parentList <- list(c('Ind1_1','Ind1_2'), c('Ind1_3','Ind1_4'))
genosSibs <- lapply(1:2, function(i){
parents <- parentList[[i]]
child <- paste(sub('Ind1_', '', parents), collapse='.')
gamete1 <- genosPop1[SAMPLE == parents[1]] %>%
.[, .(GAMETE=rbinom(n=2,size=1,prob=GT/2)), by=c('SAMPLE','LOCUS')] %>%
.[, SIB:=1:2, by=LOCUS]
gamete2 <- genosPop1[SAMPLE == parents[2]] %>%
.[, .(GAMETE=rbinom(n=2,size=1,prob=GT/2)), by=c('SAMPLE','LOCUS')] %>%
.[, SIB:=1:2, by=LOCUS]
rbind(gamete1, gamete2) %>%
.[, .(GT=sum(GAMETE)), by=c('LOCUS','SIB')] %>%
.[, SAMPLE:=paste0('Child_',child,'_',SIB)]
}) %>%
do.call('rbind', .)
### THE OBSERVED GENOMIC RELATIONSHIPS MATRIX
library(AGHmatrix)
# Combine the population samples and the created siblings
# into a single genotype matrix
obsGenosMat <- rbind(genosPop1, genosSibs[, c('SAMPLE','LOCUS','GT')]) %>%
DT2Mat_genos()
# Calculate the GRM
obsGRM <- Gmatrix(obsGenosMat, method='Yang', ploidy=2)
### THE SIMULATED GENOMIC RELATIONSHIPS MATRIX
# Convert simulated families into a genotype matrix
simGenosMat <- DT2Mat_genos(simFamily$focal.pairs)
# Calculate the GRM
simGRM <- Gmatrix(simGenosMat, method='Yang', ploidy=2)
### COMPARE THE OBSERVED AND SIMULATED
relComp <- family_sim_compare(
simGRM=simGRM,
obsGRM=obsGRM,
look='classic'
)
# The data
relComp$data
# Simulated dataset
relComp$data[!is.na(SIM)]
# The observed dataset
relComp$data[is.na(SIM)]
# Plot of relatedness values. Dashed lines denote relatedness
# values of 0, 0.0625, 0.125, 0.25, and 0.5, which are the theoretical
# expectations for unrelated individuals, half-cousins, cousins, half-siblings,
# and siblings, respectively.
# You will note a large variance are the expected values, which
# is not surprising for this very small SNP dataset (200 loci).
relComp$plot
# Take a look at the "known" relationships in the observed dataset using the
# offspring we created.
# Note, siblings and parent-offspring pairs have a theoretical
# relatedness of 0.5. But you will probably find the "observed"
# relatedness might be lower.
relComp$data[SAMPLE1=='Child_1.2_1' & SAMPLE2%in%c('Child_1.2_2','Ind1_1','Ind1_2')]
relComp$data[SAMPLE1=='Child_3.4_1' & SAMPLE2%in%c('Child_3.4_2','Ind1_3','Ind1_4')]
# Now take a look at the simulated distribution: the possible values for
# siblings given this dataset are quite wide.
relComp$data[FAMILY=='Half-siblings']$RELATE %>% summary()
relComp$data[FAMILY=='Siblings']$RELATE %>% summary()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.