View source: R/family_sim_qtl.R
family_sim_qtl | R Documentation |
This function is designed to pair with the family_sim_genos
function
to simulate a quantitative trait from a set of simulated families.
However, it could also be used to simulate a quantitative trait from observed data.
The function is currently only designed to simulate a single trait controlled
by n loci with phenotypic variance, Vp, the sum of the
additive, Va, and environmental, Ve, variances.
family_sim_qtl(
famGenos,
numLoci = NULL,
qtlNames = NULL,
additiveVar,
environVar,
effectSizes = 1
)
famGenos |
Data.table: Long-format, must contain the columns
|
numLoci |
Integer: The number of loci contributing to the trait. These loci
are drawn from random in the dataset, |
qtlNames |
Character: A character vector of loci contributing to the trait.
These loci must be present in the dataset, |
additiveVar |
Numeric: The additive genetic variance. |
environVar |
Numeric: The environmental variance. |
effectSizes |
Numeric: A vector of effect sizes (positive values only) to assign to AA homozygous genotype. Default = 1, that is, all QTLs will have the same effect size. |
Note, the current implementation of this function generates a specific additive + environmental variance for a trait, given a set of allele frequencies and the number/identify of loci of interest. The phenotypes and genotypic values simulated are only relevant to the input sample only.
A total of n == numLoci
loci from famGenos
are drawn
at random to underpin the quantitative trait. It is assumed that all loci contribute
equally to the trait, i.e., Va/n. The simulation starts by first
fitting genotype AA with a genetic value of +1, Aa with a value of 0,
and aa with a value of -1. Because the additive genetic variance
is also dependent on alelle frequencies:
Va = 2pq[a + d(q - p)]^2
The genetic values are then modified relative to the allele frequencies to ensure
the total additivie genetic variance in the sample sum to the specified Va.
The phenotype per individual is the sum of their genotypic values plus randomly
drawn environmental deviation (rnorm(.., mean=0, sd=sqrt(Ve))
).
When assigning variable effect sizes to argument effectSizes
, the user
needs to generate a vector of values. For example, using the rgamma
or
the rexp
functions. Note, these represent the homozygous AA genotype
effect sizes, so the aa genotype effect sizes will be the negative of these values.
A list with the indexes $trait
and $loci
is returned, containing
information on the traits and the underpinning loci.
The index $trait
contains a data.table with the following columns:
$SAMPLE
, the sample IDs.
$G
, the additive genetic values, with variance Va.
$E
, the environmental values, with variance Ve.
$P
, the phenoypic values, G + E.
The index $loci
contains a data.table with the following columns:
$LOCUS
, the locus IDs.
$FREQ
, the frequency of the focal allele.
$A.VAL
, the additive genetic value for this locus.
Falconer and MacKay (1996) Introduction to Quantitative Genetics, 3rd Ed., chapter 8, page 129.
library(genomalicious)
data(data_Genos)
# Subset Pop1 genotypes
genosPop1 <- data_Genos[POP=='Pop1', c('SAMPLE', 'LOCUS', 'GT')]
# Get the allele frequencies for Pop1
freqsPop1 <- genosPop1[, .(FREQ=sum(GT)/(length(GT)*2)), by=LOCUS]
# Simulate 100 familial relationships of each class
simFamily <- family_sim_genos(
freqData=freqsPop1,
locusCol='LOCUS',
freqCol='FREQ',
numSims=100,
returnParents=TRUE,
returnPedigree=TRUE
)
# Take a look at the focal pairs
simFamily$focal.pairs
# Take a look at the parentals
simFamily$parents
# Take a look at the pedigree
simFamily$pedigree
### THE OBSERVED GENOMIC RELATIONSHIPS MATRIX
library(sommer)
# Note, for sommer, we have to adjust genotyeps to range from -1 to 1.
# A genotype matrix for the focal pairs
obsGenosMat <- genosPop1 %>%
copy %>%
.[, GT:=GT-1] %>%
DT2Mat_genos()
# Calculate the GRM
obsGRM <- sommer::A.mat(obsGenosMat)
### THE SIMULATED GENOMIC RELATIONSHIPS MATRIX
# Convert simulated families into a genotype matrix
simGenosMat <- simFamily$focal.pairs %>%
copy %>%
.[, GT:=GT-1] %>%
DT2Mat_genos()
# Calculate the GRM
simGRM <- sommer::A.mat(simGenosMat)
### COMPARE THE OBSERVED AND SIMULATED
relComp <- family_sim_compare(
simGRM=simGRM,
obsGRM=obsGRM,
look='classic'
)
# The data
relComp$data
# Simulated dataset
relComp$data[!is.na(SIM)]
# The observed dataset
relComp$data[is.na(SIM)]
# Plot of relatedness values. Dashed lines denote relatedness
# values of 0, 0.0625, 0.125, 0.25, and 0.5, which are the theoretical
# expectations for unrelated individuals, half-cousins, cousins, half-siblings,
# and siblings/parent-offspring, respectively.
# You will note a large variance in the expected values, which
# is not surprising for this very small SNP dataset (200 loci).
relComp$plot
### SIMULATE A QUANTITATIVE TRAIT
# Combine the focal pairs and parentals, and simualte a trait controlled by
# 100 loci with Va = 1, and Ve = 1, with equal effect sizes.
simQTL.equal <- family_sim_qtl(
famGenos=rbind(simFamily$focal.pairs, simFamily$parents),
numLoci=100, additiveVar=1, environVar=1, effectSizes=1
)
# The trait values for equal effect sizes.
simQTL.equal$trait
# The locus values to equal effect sizes.
simQTL.equal$loci
# This time, let's use vairable effect sizes that follow a gamma
# distribution of effects.
eff_sizes <- rgamma(5000, shape=0.5, scale=0.2)
# Fit with the same loci as used above.
simQTL.variable <- family_sim_qtl(
famGenos=rbind(simFamily$focal.pairs, simFamily$parents),
qtlNames=simQTL.equal$loci$LOCUS,
additiveVar=1, environVar=1, effectSizes=eff_sizes
)
# The trait values
simQTL.variable$trait
# The locus values
simQTL.variable$loci
# Compare the values
plot(simQTL.equal$trait$P, simQTL.variable$trait$P)
# But they both have the desired variances
var(simQTL.equal$trait$G)
var(simQTL.variable$trait$G)
var(simQTL.equal$trait$E)
var(simQTL.variable$trait$E)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.