family_sim_qtl: Simulate quantitative trait from simulated families

View source: R/family_sim_qtl.R

family_sim_qtl R Documentation

Simulate quantitative trait from simulated families

Description

This function is designed to pair with the family_sim_genos function to simulate a quantitative trait from a set of simulated families. However, it could also be used to simulate a quantitative trait from observed data. The function currently simulates only a single trait controlled by n loci, with phenotypic variance, Vp, equal to the sum of the additive genetic variance, Va, and the environmental variance, Ve (Vp = Va + Ve).

Usage

family_sim_qtl(
  famGenos,
  numLoci = NULL,
  qtlLoci = NULL,
  additiveVar,
  environVar
)

Arguments

famGenos

Data.table: Long-format, must contain the columns $SAMPLE of sample IDs, $LOCUS of locus IDs, and $GT of genotypes.

numLoci

Integer: The number of loci contributing to the trait. These loci are drawn at random from the dataset, famGenos, so numLoci must be <= the total number of unique loci in famGenos. The alternate parameterisation is to specify the names of the loci to use with qtlLoci.

qtlLoci

Character: A character vector of loci contributing to the trait. These loci must be present in the dataset, famGenos. The alternate parameterisation is to specify the number of loci to use with numLoci.

additiveVar

Numeric: The additive genetic variance.

environVar

Numeric: The environmental variance.

Details

Note that the current implementation of this function generates a specific additive and environmental variance for a trait, given a set of allele frequencies and the number or identity of the loci of interest. The simulated phenotypes and genotypic values are relevant only to the input sample.

A total of n = numLoci loci from famGenos are drawn at random (or the loci named in qtlLoci are used) to underpin the quantitative trait. It is assumed that all loci contribute equally to the trait, i.e., each locus contributes Va/n. The simulation starts by assigning genotype AA a genetic value of +1, Aa a value of 0, and aa a value of -1. However, the additive genetic variance also depends on allele frequencies:

Va = 2pq[a + d(q - p)]^2

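Here p and q are the frequencies of the two alleles, a is the genotypic value of the homozygote (measured from the midpoint of the two homozygotes), and d is that of the heterozygote. The genetic values are therefore modified relative to the allele frequencies to ensure that the total additive genetic variance in the sample sums to the specified Va. The phenotype of each individual is the sum of their genotypic values plus a randomly drawn environmental deviation (rnorm(.., mean=0, sd=sqrt(Ve))).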
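
The scaling idea can be illustrated with a minimal base R sketch. This is not the package's implementation: it assumes no dominance (d = 0, so the per-locus variance reduces to 2pqa^2), independent loci, and genotypes coded as counts of the focal allele. All object names (n_ind, n_loci, geno, a, and so on) are hypothetical and chosen for illustration only.

```r
set.seed(42)

n_ind  <- 500   # number of individuals (hypothetical)
n_loci <- 10    # number of loci underpinning the trait (hypothetical)
Va     <- 1     # target additive genetic variance
Ve     <- 1     # target environmental variance

# Simulate genotypes as counts of the focal allele (0, 1, 2),
# one column per locus, assuming Hardy-Weinberg proportions.
true_freq <- runif(n_loci, min = 0.1, max = 0.9)
geno <- sapply(true_freq, function(f) rbinom(n_ind, size = 2, prob = f))

# Allele frequencies in the sample; all loci must be polymorphic.
p <- colMeans(geno) / 2
q <- 1 - p
stopifnot(all(p > 0 & p < 1))

# Assumption: d = 0, so each locus contributes 2*p*q*a^2.
# Solve 2*p*q*a^2 = Va / n_loci for the per-locus value a.
a <- sqrt((Va / n_loci) / (2 * p * q))

# Recode genotypes to +1, 0, -1 and weight each locus by its a.
coded <- geno - 1
G <- as.vector(coded %*% a)

# Environmental deviations and phenotypes.
E <- rnorm(n_ind, mean = 0, sd = sqrt(Ve))
P <- G + E
```

In this sketch the per-locus expected variances sum to Va by construction, whereas the realised var(G) only approximates Va because of sampling and chance associations among loci. Rarer alleles receive larger values of a, which is how equal per-locus contributions are maintained across differing allele frequencies.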

Value

A list with the indexes $trait and $loci is returned, containing information on the traits and the underpinning loci.

The index $trait contains a data.table with the following columns:

  1. $SAMPLE, the sample IDs.

  2. $G, the additive genetic values, with variance Va.

  3. $E, the environmental values, with variance Ve.

  4. $P, the phenotypic values, G + E.

The index $loci contains a data.table with the following columns:

  1. $LOCUS, the locus IDs.

  2. $FREQ, the frequency of the focal allele.

  3. $A.VAL, the additive genetic value for this locus.

References

Falconer and Mackay (1996) Introduction to Quantitative Genetics, 4th Ed., chapter 8, page 129.

Examples

library(genomalicious)
data(data_Genos)

# Subset Pop1 genotypes
genosPop1 <- data_Genos[POP=='Pop1', c('SAMPLE', 'LOCUS', 'GT')]

# Get the allele frequencies for Pop1
freqsPop1 <- genosPop1[, .(FREQ=sum(GT)/(length(GT)*2)), by=LOCUS]

# Simulate 100 familial relationships of each class
simFamily <- family_sim_genos(
   freqData=freqsPop1,
   locusCol='LOCUS',
   freqCol='FREQ',
   numSims=100,
   returnParents=TRUE,
   returnPedigree=TRUE
)

# Take a look at the focal pairs
simFamily$focal.pairs

# Take a look at the parentals
simFamily$parents

# Take a look at the pedigree
simFamily$pedigree

### THE OBSERVED GENOMIC RELATIONSHIPS MATRIX
library(AGHmatrix)

# A genotype matrix for the observed Pop1 samples
obsGenosMat <- DT2Mat_genos(genosPop1)

# Calculate the GRM
obsGRM <- Gmatrix(obsGenosMat, method='Yang', ploidy=2)

### THE SIMULATED GENOMIC RELATIONSHIPS MATRIX
# Convert simulated families into a genotype matrix
simGenosMat <- DT2Mat_genos(simFamily$focal.pairs)

# Calculate the GRM
simGRM <- Gmatrix(simGenosMat, method='Yang', ploidy=2)

### COMPARE THE OBSERVED AND SIMULATED
relComp <- family_sim_compare(
   simGRM=simGRM,
   obsGRM=obsGRM,
   look='classic'
)

# The data
relComp$data

# Simulated dataset
relComp$data[!is.na(SIM)]

# The observed dataset
relComp$data[is.na(SIM)]

# Plot of relatedness values. Dashed lines denote relatedness
# values of 0, 0.0625, 0.125, 0.25, and 0.5, which are the theoretical
# expectations for unrelated individuals, half-cousins, cousins, half-siblings,
# and siblings/parent-offspring, respectively.
# You will note a large variance around the expected values, which
# is not surprising for this very small SNP dataset (200 loci).
relComp$plot

### SIMULATE A QUANTITATIVE TRAIT

# Combine the focal pairs and parentals, and simulate a trait controlled by
# 100 loci with Va = 1 and Ve = 1.
simQTL <- family_sim_qtl(
  famGenos=rbind(simFamily$focal.pairs, simFamily$parents),
  numLoci=100, additiveVar=1, environVar=1
)

# The trait values
simQTL$trait

# The locus values
simQTL$loci


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.