knitr::opts_chunk$set(echo = FALSE, results = 'asis')
nHomDiff <- 15
p_orig <- p <- 0.08
p_incr <- 1.2 * p
q <- 1-p
a <- nHomDiff/2
d <- -1.5
alpha <- a + (q-p)*d
alpha_orig <- alpha

Problem 1: Breeding Value

We are considering a quantitative trait that depends on a given bi-allelic locus $G$. The frequency of the favorable allele corresponds to $r p$. Suppose that genotype frequencies follow the Hardy-Weinberg equilibrium. The difference between the homozygous genotypes corresponds to $r nHomDiff$. The heterozygous genotype has a value of $r d$.

a) Compute the breeding values and the dominance deviations for the three genotypes.

b) Selection for the favorable allele has increased its frequency to r p_incr. How does this increased allele frequency change the breeding values?

Hint: Have a look at the summary table of all values in the course notes.

Solution

According to the summary table of all values in the course notes, the breeding values depend on the term $\alpha$. Therefore, we start by computing $\alpha$.

$$\alpha = a + (q-p)d$$

Based on the problem description, we know that $a = r a$, $p = r p$, $q = 1-p = r q$ and $d = r d$. Therefore

$$\alpha = r a + (r q - r p)*(r d) = r alpha$$

a) The summary table of all values then looks as follows.

tbl_genotypic_decomp <- tibble::tibble(Genotype             = c("$G_1G_1$", "$G_1G_2$", "$G_2G_2$"),
                                       `Genotypic Value`    = c(paste0("$a = ", a, "$"), paste0("$d = ", d, "$"), paste0("$-a = ", -a, "$")),
                                       `Breeding Value`     = c(paste0("$2q\\alpha = ", 2*q*alpha, "$"), 
                                                                paste0("$(q-p)\\alpha = ", (q-p) * alpha, "$"), 
                                                                paste0("$-2p\\alpha = ", -2*p*alpha, "$")),
                                       `Dominance Deviation`= c(paste0("$-2q^2d = ",  -2*q^2*d, "$"), 
                                                                paste0("$2pqd = ", 2*p*q*d, "$"), 
                                                                paste0("$-2p^2d = ",  -2*p^2*d, "$")))
knitr::kable(tbl_genotypic_decomp,
             booktabs = TRUE,
             longtable = TRUE)
p <- p_incr
q <- 1-p
alpha <- a + (q-p)*d
alpha_incr <- alpha

b) After the change in the allele frequency we have $p = r p$ and $q = r q$, and the value of $\alpha$ changes to $\alpha = r alpha$. This has consequences for the whole summary table.

tbl_genotypic_decomp <- tibble::tibble(Genotype             = c("$G_1G_1$", "$G_1G_2$", "$G_2G_2$"),
                                       `Genotypic Value`    = c(paste0("$a = ", a, "$"), paste0("$d = ", d, "$"), paste0("$-a = ", -a, "$")),
                                       `Breeding Value`     = c(paste0("$2q\\alpha = ", 2*q*alpha, "$"), 
                                                                paste0("$(q-p)\\alpha = ", (q-p) * alpha, "$"), 
                                                                paste0("$-2p\\alpha = ", -2*p*alpha, "$")),
                                       `Dominance Deviation`= c(paste0("$-2q^2d = ",  -2*q^2*d, "$"), 
                                                                paste0("$2pqd = ", 2*p*q*d, "$"), 
                                                                paste0("$-2p^2d = ",  -2*p^2*d, "$")))
knitr::kable(tbl_genotypic_decomp,
             booktabs = TRUE,
             longtable = TRUE)

Increasing the allele frequency $p$ from r p_orig to r p_incr made the value of $\alpha$ bigger. Nevertheless, all three breeding values decreased, because the factors $2q$, $(q-p)$ and $-2p$ that multiply $\alpha$ all decrease with increasing $p$, and this negative effect was bigger than the positive change of $\alpha$.
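The comparison between 1a) and 1b) can be reproduced with a small helper function. The sketch below is not part of the course notes: the function name decompose_genotypes() and its column names are made up for illustration, while the formulas are those of the summary table and the inputs are the values given in this problem.

```r
# Hypothetical helper (not from the course material): breeding values and
# dominance deviations for a bi-allelic locus under Hardy-Weinberg equilibrium.
decompose_genotypes <- function(p, a, d) {
  q <- 1 - p
  alpha <- a + (q - p) * d   # allele substitution effect
  data.frame(
    genotype            = c("G1G1", "G1G2", "G2G2"),
    frequency           = c(p^2, 2 * p * q, q^2),
    breeding_value      = c(2 * q * alpha, (q - p) * alpha, -2 * p * alpha),
    dominance_deviation = c(-2 * q^2 * d, 2 * p * q * d, -2 * p^2 * d)
  )
}

a <- 15 / 2   # half the difference between the homozygous genotypes
d <- -1.5     # value of the heterozygous genotype
tbl_orig <- decompose_genotypes(p = 0.08, a = a, d = d)
tbl_incr <- decompose_genotypes(p = 1.2 * 0.08, a = a, d = d)

# Change in breeding value per genotype; negative entries mean a decrease
bv_change <- tbl_incr$breeding_value - tbl_orig$breeding_value
print(bv_change)

# Sanity check: weighted by the genotype frequencies, the breeding values
# average to zero (up to floating-point error)
bv_mean <- sum(tbl_orig$frequency * tbl_orig$breeding_value)
print(bv_mean)
```

Returning a data.frame keeps the three genotypes aligned with their frequencies, so that population means of breeding values or dominance deviations can be checked in one line.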

Problem 2: Allele Substitution

What is the meaning of the term allele substitution and how big is it in 1a) and 1b)?

Solution

The effect of allele substitution appears as the difference of the breeding values between two genotypes where one of these genotypes has one favorable allele more than the other. For a single bi-allelic locus there are two possible differences that fulfill this requirement, namely $BV_{12} - BV_{22}$ and $BV_{11} - BV_{12}$. The result of both differences is the same and corresponds to $\alpha = a + (q-p)d$.
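That both differences give the same result can be verified by inserting the breeding values from the summary table and using $p + q = 1$:

$$BV_{11} - BV_{12} = 2q\alpha - (q-p)\alpha = (q+p)\alpha = \alpha$$

$$BV_{12} - BV_{22} = (q-p)\alpha - (-2p\alpha) = (q+p)\alpha = \alpha$$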

The allele substitution effect ($\alpha$) in 1a) corresponds to r alpha_orig; in 1b) the value is r alpha_incr.

Problem 3: Reading Data into R

siris_url <- "https://charlotte-ngs.github.io/lbgfs2020/misc/iris_ex03.csv"

You can download a file in csv-format from the course website. The URL is r siris_url. Read the data from that csv-file into R using the function read.csv2(). Test the consequences of specifying the option stringsAsFactors=TRUE. The function read_csv2() from the readr package is an alternative way to import data from a .csv-file. The result is a little different: while the function read.csv2() returns an ordinary 'data.frame', the function read_csv2() returns a 'tibble', which is a more modern version of a 'data.frame'.

Hints:

  1. You can first download the csv-file to your local computer and then read the data, or you can directly indicate the URL when reading the data. You get more information with the command ?read.csv2 at the R-console.
  2. Assign the result of read.csv2() to a variable
  3. Use the function str() on both results of read.csv2(), with and without stringsAsFactors=TRUE, to see the difference between the two results of reading the data.
  4. Use the description at https://bookdown.org/rdpeng/rprogdatascience/getting-data-in-and-out-of-r.html as a reference to read data into R. There is also a video on the same subject under https://youtu.be/Z_dc_FADyi4.
  5. Use the function read_csv2() to import the data and inspect the difference between a 'data.frame' and a 'tibble'.

Solution

# Set bOnline to TRUE to read directly from the course website
bOnline <- FALSE
sDataFn <- "iris_ex03.csv"
# Offline fallback: create a local copy from the built-in iris data
if (!bOnline && !file.exists(sDataFn))
  write.csv2(iris, file = sDataFn, row.names = FALSE)
s_iris_file <- sDataFn
if (bOnline) s_iris_file <- "https://charlotte-ngs.github.io/lbgfs2020/misc/iris_ex03.csv"
# Default import; in R >= 4.0.0 strings are kept as character vectors
dfIris1 <- read.csv2(file = s_iris_file)
str(dfIris1)
# With stringsAsFactors = TRUE, string columns are converted to factors
dfIris2 <- read.csv2(file = s_iris_file, stringsAsFactors = TRUE)
str(dfIris2)
# read_csv2() from readr returns a tibble instead of a data.frame
tblIris1 <- readr::read_csv2(file = s_iris_file)
str(tblIris1)
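The effect of stringsAsFactors can also be inspected column by column. The following sketch is an illustration under assumptions: it uses a temporary copy of the built-in iris data instead of the course file, the variable names tmp_file, df_chr and df_fct are made up, and the tibble part only runs if the readr package is installed.

```r
# Write the built-in iris data to a temporary file in csv2 format
# (semicolon as separator, comma as decimal mark), standing in for the course file
tmp_file <- tempfile(fileext = ".csv")
write.csv2(iris, file = tmp_file, row.names = FALSE)

df_chr <- read.csv2(file = tmp_file, stringsAsFactors = FALSE)
df_fct <- read.csv2(file = tmp_file, stringsAsFactors = TRUE)

# Column classes of both results: only the Species column differs
print(sapply(df_chr, class))
print(sapply(df_fct, class))

# A factor stores the distinct values as levels
print(levels(df_fct$Species))

# Comparison with a tibble; skipped when readr is not available
if (requireNamespace("readr", quietly = TRUE)) {
  tbl <- readr::read_csv2(file = tmp_file)
  print(class(tbl))
}
```

Setting stringsAsFactors explicitly in both calls makes the comparison independent of the R version, because the default of this option changed in R 4.0.0.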

Additional Problem: Create a plot in R

Plot the values in the columns Sepal.Length and Petal.Length of the Iris data set. The plot should look like the following figure.

bIsSolution <- TRUE
if(!bIsSolution)
  plot(dfIris2$Sepal.Length, dfIris2$Petal.Length)

The above plot was produced using the standard plotting function of the base-R system. The R-package ggplot2 provides an interesting alternative to the basic plotting function. A plot with ggplot2 looks as follows.

bIsSolution <- TRUE
if(!bIsSolution)
  ggplot2::qplot(`Sepal.Length`, `Petal.Length`, data = dfIris2)

Solution

The plot with the base-R plotting function is produced with the following command.

plot(dfIris2$Sepal.Length, dfIris2$Petal.Length)

\pagebreak

The plot using ggplot2 functionality is created with the following statement.

ggplot2::qplot(`Sepal.Length`, `Petal.Length`, data = tblIris1)
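In recent releases of ggplot2 the function qplot() is marked as deprecated. A roughly equivalent scatter plot can be built with ggplot() and geom_point(). The sketch below assumes that ggplot2 is installed and uses the built-in iris data, so that it runs without the csv-file; the variable name p_scatter is made up.

```r
# Assumes the ggplot2 package is installed; uses the built-in iris data
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Map the two columns to the axes and draw one point per observation
  p_scatter <- ggplot2::ggplot(iris, ggplot2::aes(x = Sepal.Length, y = Petal.Length)) +
    ggplot2::geom_point()
  print(p_scatter)
}
```

The same call works with dfIris2 or tblIris1 in place of iris, because all three objects contain the columns Sepal.Length and Petal.Length.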


charlotte-ngs/lbgfs2020 documentation built on Dec. 20, 2020, 5:39 p.m.