Introduction

This Vignette shows an example usecase for generating a linkage map for polyploid data in R.

Installation

First, we need to install the package pergola. Second, we set a seed to ensure that this vignette is reproducable.

library(pergola)
set.seed(31415)

Next, we load the example dataset from the package.

data("simTetra")

simTetra is a simulated tetraploid backcross dataset, consisting of seven chromosomes. A small subset reveals the structure of the dataset. Four columns represent a sample (e.g. P1, P2, F1...) and each row represents a marker. The values are 0 and 1, which stands for A and B allele in that case.

simTetra[1:5, 1:12]

Data manipulation

As the data is simulated, the markers are grouped according to their chromosomes and ordered according to linkage. Our goal is to predict these. Thus, we want to randomize the order of the markers. The dataset contains the order of the alleles within the samples (haplotypes). This should also be randomized, because it provides information about the exact number of recombinations, which is approximated later.

simTetraGen <- shuffleInput(simTetra, ploidy = 4)
simTetraGen[1:5, 1:12]

Now that the data is randomized, we can treat it like real data. In the next step we collapse four columns into one. That way we have one column per sample without loosing information.

simTetraGen <- basesToGenotypes(simTetraGen, ploidy = 4)
simTetraGen[1:5, 1:6]

Finally, the data is ready to be analyzed.

Analysis

Calculate pairwise recombination frequencies

Now, we want to compare all markers with each other. This allows us to group similar markers together and define linkage groups. The first step is the calculation of pairwise recombination frequencies. Unlike other tools we do not use the expectation–maximization (EM) algorithm or maximum-likelihood (ML) methods. Instead, we approximate the recombination events and always expect the minimum number.

rf <- calcRec(simTetraGen, ploidy = 4)

rf is the matrix containing the recombination frequencies for all pairs of markers.

image(rf, xaxt = 'n', yaxt = 'n')
axis(side = 1, at = seq(0, 1, length.out = nrow(rf)),
     labels = rownames(rf), las = 2, cex.axis = 0.8)
axis(side = 2, at = seq(0, 1, length.out = nrow(rf)),
     labels = rownames(rf), las = 2, cex.axis = 0.8) 

The colors represent the different frequencies (yellow = high, red = low). We do not see any structure in the data set, except the diagonal. It is red because each there is no recombination between a marker and itself.

Linkage grouping

Next, we want to find out which marker belongs into which linkage group. Usually the number of linkage groups is equal to the number of chromosomes. In the beginning of this vignette, we mentioned that our dataset consist of seven chromosomes. Now it is time to see if that can be observed from the data. We plot a dendogram, the default type of plot in the funtion splitChr.

plotRf(rf)

The plot shows seven easily distinguishable clusters, as expected. The other implemented visualization is an image of the recombination frequency values:

plotRf(rf, plottype = "image")

Here we see seven rectangles and assume seven linkage groups. Hence, we split the data with splitChr.

split <- splitChr(rf, nchr = 7)
table(split$split)
head(split)

table() shows us how many markers are on each chromosome.

Additional parameters

In case that single markers end up in own linkage groups, one might want to filter them out with the parameter filter = TRUE. If many markers are highly similar, removing those duplicates is adviced. They have no added value for the linkage map and end up at the same position.

Marker ordering

Now, that we grouped our markers, we want to find the correct order of markers within the linkage groups. Filtered markers (chromosome 0) would be ignored.

split <- sortLeafs(rf, split)
head(split)

split$order is the global order of markers. We can visualize it with image().

image(rf[split$order, split$order], xaxt = 'n', yaxt = 'n')
axis(side = 1, at = seq(0, 1, length.out = nrow(rf)),
     labels = rownames(rf)[split$order], las = 2, cex.axis = 0.8)
axis(side = 2, at = seq(0, 1, length.out = nrow(rf)),
     labels = rownames(rf)[split$order], las = 2, cex.axis = 0.8) 

We see that the order within the chromosomes has improved because the red values moved closer to the diagonal.

Example of ambiguous orders

Here we show an example where two orders are equally good when only one neighbor is included.

set.seed(3)
ambRF <- cbind(c(0, 2, 4, 6, 8, 12),
               c(2, 0, 4, 4, 7, 10),
               c(4, 4, 0, 2, 4, 7),
               c(6, 4, 2, 0, 4, 5),
               c(8, 7, 4, 4, 0, 3),
               c(12, 10, 7, 5, 3, 0)) / 100
ambsplit <- data.frame(names = LETTERS[1:6],
                    split = rep(1, 6),
                    order = 1:6)
amb1 <- sortLeafs(ambRF, ambsplit)
amb1$order
amb2 <- c(1,2,4,3,5,6)
amb3 <- c(2,1,3,4,5,6)
amb4 <- 6:1 #reverse of amb1

calcSarf(ambRF, amb1$order, n = 1)
calcSarf(ambRF, amb2, n = 1)
calcSarf(ambRF, amb3, n = 1)
calcSarf(ambRF, amb4, n = 1)

calcSarf(ambRF, amb1$order, n = 2)
calcSarf(ambRF, amb2, n = 2)
calcSarf(ambRF, amb3, n = 2)
calcSarf(ambRF, amb4, n = 2)

Marker spacing

Finally, we create the map:

maps <- pullMap(rf, split)

We get a list with one object per chromosome. We visualize it with plotChr().

plotChr(maps[[1]], cex = 0.6)

We can also compare two chromosome maps:

maps2 <- pullMap(rf, split, fun = "kosambi")
plotChr(maps[[1]], maps2[[1]], cex = 0.6)

The map based on the Kosambi function is smaller than the default Haldane function.

Map comparison

To compare two maps on global level we use the package dendextend.

library(dendextend)
library(gclus)

We create a new map from the existing map maps2 because otherwise the two dendograms would be equal.

maps3 <- maps2
maps3[1] <- maps2[2]
maps3[2] <- maps2[1]
maps3[3] <- maps2[7]
maps3[[4]] <- rev(max(maps3[[4]]) - maps3[[4]])
maps3[6] <- maps2[3]
maps3[7] <- maps2[6]
maps3[[4]]
maps2[[4]]

We create dendogram objects from our previously created maps.

dend1 <- mapToDend(maps)
plot(dend1, cex = 0.6)

One is plotted to show what it looks like.

dend2 <- mapToDend(maps3)

maketangle creates a tanglegram.

makeTangle(dend1, dend2, cutheight = 500, k = 7)

We can easily observe the rearrangements that we did before. In a real data example we would not want these. We use switchChrs and swapChrs to reverse them. One map serves as model while the other one is changed.

maps <- switchChrs(map = maps, comp = maps3)
maps <- swapChrs(map = maps, comp = maps3)
dend3 <- mapToDend(maps)
dend4 <- mapToDend(maps3)
makeTangle(dend3, dend4, cutheight = 500, k = 7)

We see that both maps are now perfectly aligned. However, the interchromosomal order is not changed.

Acknowledgement

I thank Jeroen Lodewijk for testing the package and providing valuable feedback about its usablity and this vignette.

Funding

This work was supported by the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement No. 289974. INTERCROSSING

Session Info

sessionInfo()


grafab/pergola documentation built on May 17, 2019, 8:18 a.m.