This Vignette shows an example usecase for generating a linkage map for polyploid data in R.
First, we need to install the package
Second, we set a seed, to ensure, that this vignette is reproducable.
Next, we load the example dataset from the package.
simTetra is a simulated tetraploid backcross dataset, consisting of seven chromosomes.
A small subset reveals the structure of the dataset.
Four columns represent a sample (e.g. P1, P2, F1...) and each row represents a marker.
The values are 0 and 1, which stands for A and B allele in that case.
As the data is simulated, the markers are grouped according to their chromosomes and ordered according to linkage. Our goal is to predict these. Thus, we want to randomize the order of the markers. The dataset contains the order of the alleles within the samples (haplotypes). This should also be randomized, because it provides information about the exact number of recombinations, which is approximated later.
simTetraGen <- shuffleInput(simTetra, ploidy = 4) simTetraGen[1:5, 1:12]
Now that the data is randomized, we can treat it like real data. In the next step we collapse four columns into one. That way we have one column per sample, without loosing information.
simTetraGen <- bases2genotypes(simTetraGen, ploidy = 4) simTetraGen[1:5, 1:6]
Finally, the data is ready to be analyzed.
Now, we want to compare all markers with each other. This allows us to group similar markers together and define linkage groups. The first step is the calculation of pairwise recombination frequencies. Unlike other tools we do not use the expectation–maximization (EM) algorithm or maximum-likelihood (ML) methods. Instead, we approximate the recombination events and always expect the minimum number.
rf <- calcRec(simTetraGen, ploidy = 4)
rf is the matrix containing the recombination frequencies for all pairs of markers.
image(rf, xaxt = 'n', yaxt = 'n') axis(side = 1, at = seq(0, 1, length.out = nrow(rf)), labels = rownames(rf), las = 2, cex.axis = 0.8) axis(side = 2, at = seq(0, 1, length.out = nrow(rf)), labels = rownames(rf), las = 2, cex.axis = 0.8)
The colors represent the different frequencies (yellow = high, red = low). We do not see any structure in the data set, except the diagonal. It is red, because each there is no recombination between a marker and itself.
Next, we want to find out which marker belongs into which linkage group.
Usually the number of linkage groups is equal to the number of chromosomes.
In the beginning of this vignette, we mentioned that our dataset consist of seven chromosomes.
Now it is time to see, if that can be observed from the data.
First, we plot a dendogram, the default type of plot in the funtion
The plot shows seven easily distinguishable clusters, as expected. The other implemented visualization is an image of the recombination frequency values:
plotRf(rf, plottype = "image")
Here we see seven rectangles and assume seven linkage groups.
Hence, we split the data with
split <- splitChr(rf, nchr = 7) table(split$split) head(split)
table() shows us, how many markers are on each chromosome.
In case that single markers end up in own linkage groups, one might want to filter them out with the parameter
filter = TRUE.
If many markers are highly similar, we removing those duplicates is adviced.
They have no added value for the linkage map and end up at the same position.
Now, that we grouped our markers, we want to find the correct order of markers, within the linkage groups. Filtered markers (chromosome 0) would be ignored.
split <- sortLeafs(rf, split) head(split)
split$order is the global order of markers.
We can visualize it with
image(rf[split$order, split$order], xaxt = 'n', yaxt = 'n') axis(side = 1, at = seq(0, 1, length.out = nrow(rf)), labels = rownames(rf)[split$order], las = 2, cex.axis = 0.8) axis(side = 2, at = seq(0, 1, length.out = nrow(rf)), labels = rownames(rf)[split$order], las = 2, cex.axis = 0.8)
We see, that the order within the chromosomes has improved, because the red values moved closer to the diagonal.
Here we show an example, where two orders are equally good when only one neighbor is included.
set.seed(3) ambRF <- cbind(c(0, 2, 4, 6, 8, 12), c(2, 0, 4, 4, 7, 10), c(4, 4, 0, 2, 4, 7), c(6, 4, 2, 0, 4, 5), c(8, 7, 4, 4, 0, 3), c(12, 10, 7, 5, 3, 0)) / 100 ambsplit <- data.frame(names = LETTERS[1:6], split = rep(1, 6), order = 1:6) amb1 <- sortLeafs(ambRF, ambsplit) amb1$order amb2 <- c(1,2,4,3,5,6) amb3 <- c(2,1,3,4,5,6) amb4 <- 6:1 #reverse of amb1 calcSarf(ambRF, amb1$order, n = 1) calcSarf(ambRF, amb2, n = 1) calcSarf(ambRF, amb3, n = 1) calcSarf(ambRF, amb4, n = 1) calcSarf(ambRF, amb1$order, n = 2) calcSarf(ambRF, amb2, n = 2) calcSarf(ambRF, amb3, n = 2) calcSarf(ambRF, amb4, n = 2)
Finally, we create the map:
maps <- pullMap(rf, split)
We get a list, with one object per chromosome.
We visualize it with
plotChr(maps[], cex = 0.6)
We can also compare two chromosome maps:
maps2 <- pullMap(rf, split, fun = "kosambi") plotChr(maps[], maps2[], cex = 0.6)
The map, based on the Kosambi function is smaller, than the default Haldane function.
To compare two maps on global level we use the package
We create a new map from the existing map
maps2, because otherwise the two dendograms would be equal.
maps3 <- maps2 maps3 <- maps2 maps3 <- maps2 maps3 <- maps2 maps3[] <- rev(max(maps3[]) - maps3[]) maps3 <- maps2 maps3 <- maps2 maps3[] maps2[]
We create dendogram objects from our previously created maps.
dend1 <- map2dend(maps) plot(dend1, cex = 0.6)
One is plotted, to show what it looks like.
dend2 <- map2dend(maps3)
maketangle creates a tanglegram.
maketangle(dend1, dend2, cutheight = 500, k = 7)
We can easily observe the rearrangements that we did before.
In a real data example we would not want these.
swapChrs to reverse them.
One map serves as model while the other one is changed.
maps <- switchChrs(map = maps, comp = maps3) maps <- swapChrs(map = maps, comp = maps3) dend3 <- map2dend(maps) dend4 <- map2dend(maps3) maketangle(dend3, dend4, cutheight = 500, k = 7)
We see, that both maps are now perfectly aligned. However, the interchromosomal order is not changed.
I thank Jeroen Lodewijk for testing the package and providing valuable feedback about its usablity and this vignette.
This work was supported by the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement No. 289974. INTERCROSSING
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.