KLFDAPC for inference of population structue of Regmap data

In this vignette, we demonstrate KLFDAPC for visualizing populartion structure of Regmap dataset (Arabidopsis thaliana) (Horton,et al., 2012). The reduced features of KLFDAPC retrive the geographic origins of the Arabidopsis thaliana. This is also a tpyical example for recapitulating the geographic origin of individuals using KLFDAPC.

knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png"),collapse = TRUE,comment = "#",fig.width = 6,  fig.height = 5,fig.align = "center",
                      fig.cap = " ",results = "hold")

Loading packages into your R environment.

library(KLFDAPC)

library(PCAviz)

library(ggplot2)

library(cowplot)

Load the data

The dataset is already pre-processed data containing the computed PCs of RegMap data. We now load the data and compute the kernel local genetic features.

Load the data and print a summary of the PCs of RegMap data.

data(regmappcs)

### we have to remove the not well represented individuals, which one country only has one individuals.
table(regmappcs$country)

regmapviz=pcaviz(dat = regmappcs)

summary(regmapviz)

### remove unrepresented individuals
regmapviz <- subset(regmapviz,!(country == "AZE" | country == "CPV"| country == "DEN"| country == "GEO"| country == "IND"| country == "LIB"| country == "NOR"| country == "NZL")) 

### 25 countries
regmapviz$data$country=factor(regmapviz$data$country,levels = unique(regmapviz$data$country))

regmapviz1=pcaviz(dat = regmapviz$data)

Kernel Local Fisher Discriminant Analysis of Principal Components

### The first 10 PCs, we will produce 3 genetic features for visualization
normalize <- function(x) {
return ((x - min(x)) / (max(x) - min(x)))
}


pcanorm=apply(regmapviz1$data[,12:21], 2, normalize)

reg_kmat <- kmatrixGauss(pcanorm,sigma=10)

reg_klfdapc=KLFDA(reg_kmat, y=regmapviz1$data$country, r=3, knn = 1)

Visualization of the population structure

We can project the the first two genetic features onto the plane, with the samples labeled by the collecting country.

``````r

regmapviz1$data$PC1=reg_klfdapc$Z[,1]

regmapviz1$data$PC2=reg_klfdapc$Z[,2]

plot(regmapviz1,coord=c("PC1","PC2"),group =NULL,draw.points =FALSE,label = "country",plot.title=" KLFDAPC 1 vs. KLFDAPC 2")+xlab("KLFDAPC 1")+ylab("KLFDAPC 2")

### Genetic gradients along longitide and latitude


We would like to see how are the genetic gradients related to geography by looking at the relationships between geography (longitude, latitude) and the genetic gradients.

```r
### Plot the KLFDAPC 2 vs. long
plot1 <- plot(regmapviz1,coord=c("PC1","longitude"),draw.points = TRUE,color = "longitude",group = NULL,plot.title=" KLFDAPC 1 vs. long")+xlab("KLFDAPC 1")+ylab("long")
### Plot the KLFDAPC 2 vs. lat
plot2 <- plot(regmapviz1,coord=c("PC2","latitude"),draw.points = TRUE,color = "latitude",group = NULL,plot.title=" KLFDAPC 2 vs. lat")+xlab("KLFDAPC 1")+ylab("lat")

plot_grid(plot1,plot2,labels = c("A","B"))

Reference

Horton, M. W., Hancock, A. M., Huang, Y. S., Toomajian, C., Atwell, S., Auton, A., ... & Nordborg, M. (2012). Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nature genetics, 44(2), 212-216.

Qin, X. 2020. KLFDAPC: Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC) for large genomic data. R package version 0.2.0.

Qin, X., Chiang, C.W.K., and Gaggiotti, O.E. (2021). Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC) significantly improves the accuracy of predicting geographic origin of individuals. bioRxiv, 2021.2005.2015.444294.



xinghuq/KLFDAPC documentation built on June 12, 2022, 7:20 p.m.