knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The 'bwardr' package includes two genetic map files for the International Wheat Genome Sequencing Consortium's (IWGSC) v1.0 reference wheat genome. These genetic maps were created by the labs of Jesse Poland and Gary Muehlbauer, by aligning SNPs generated from the Synthetic x Opata cross using a PstI-MspI genotyping-by-sequencing (GBS) protocol against the reference genome. Note that these maps include the raw physical and genetic positions. This may cause problems in some applications, because the maps are not monotonic (i.e. genetic positions do not always uniformly increase with physical position). One way to address this is to fit a monotonic polynomial regression to the data for each chromosome, and then replace the raw genetic positions with the fitted values.

Packages and Data

We will require the synop_dh92 data from package 'bwardr', as well as the package 'MonoPoly' to perform the monotonic regression. Note that the Synthetic x Opata population of 906 recombinant inbred lines (synop_ril906) map has fewer markers, but is generally "smoother" than the smaller population of DH lines that we use here.

library(bwardr)
library(MonoPoly)
data("synop_dh92")
str(synop_dh92)

Raw data plot

Below is a plot of the raw genetic map for chromosome 1B.

chr1b <- synop_dh92[synop_dh92$chr == "1B", ]
plot(chr1b$dist ~ chr1b$pos, 
     xlab = "Physical Position (bp)", 
     ylab = "Genetic Position (cM)")

Note that there are a few "knots" in the plot, where the genetic distance decreases from one marker to the next.

Monotonic regression

We can split the map by chromosome, and then run the monotonic regression for each chromosome. The degree argument of the monpol() function controls the degree of the fit polynomial. Here we use a degree of 9, which appears to suitably capture the curvature. The individual chromosomes are recombined, and the "dist" column (genetic distance) is replaced by the fitted values.

maplist <- split(synop_dh92, synop_dh92$chr)
for (i in names(maplist)) {
  monreg <- monpol(dist ~ pos, data = maplist[[i]], degree = 9)
  maplist[[i]]$fitted <- as.vector(predict(monreg, newdata = maplist[[i]], scale = "original"))
}

mono_map <- do.call("rbind", maplist)
mono_map$dist <- round(mono_map$fitted, 2)
mono_map$fitted <- NULL

Fitted values plot

Below is a plot of the physical position vs. the fitted genetic position values for chromosome 1B:

chr1b <- mono_map[mono_map$chr == "1B", ]
plot(chr1b$dist ~ chr1b$pos, 
     xlab = "Physical Position (bp)", 
     ylab = "Genetic Position (cM)")

Note that the genetic position of the fitted data now uniformly increases as the physical position increases.

Negative values

Finally, some genetic distances for markers near the beginning of a chromosome may be negative depending on the fit of the regression. Below we count the number of markers with negative genetic distances on each chromosome, and then set the distances to 0. Chromosome 7A has the greatest number of markers with negative distances, though they still are a relatively small overall proportion of the total.

neg_cm <- mono_map[mono_map$dist < 0, ]
table(neg_cm$chr)
mono_map$dist[mono_map$dist < 0] <- 0


etnite/bwardr documentation built on Jan. 6, 2023, 7:12 a.m.