SNPClust is demonstrated here on a small subset of the Human Genome Diversity Project (HGDP) dataset: 157 European samples out of 940, and 5,000 SNPs out of 660,918.
library(snpclust)
The files are first converted to the SNP GDS format (see the R package SNPRelate).
gds_path <- save_hgdp_as_gds()
SNPClust is then called on the GDS filepath.
snpclust_object <- snpclust(gds = gds_path, n_axes = 20)
file.remove(gds_path)
Details about the quality control of the dataset are stored in a data frame.
knitr::kable(snpclust_object$qc, 'markdown')
The results of principal component analysis (PCA) applied to the quality-controlled dataset are stored in a long data frame. Here we see that samples are grouped by country of origin.
ggplot_pca(snpclust_object$pca, group = 'population', ellipses = TRUE)
opticskxi::ggpairs(snpclust_object$pca, axes = 1:3, group = 'population') %>% plot
opticskxi::ggpairs(snpclust_object$pca, axes = 4:6, group = 'population') %>% plot
For each principal component, the absolute SNP contributions are displayed. SNPs are ordered by chromosome and position.
ggplot_manhat(pca = snpclust_object$pca, gdata = snpclust_object$gdata)
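To make the notion of absolute SNP contributions concrete, here is a minimal sketch in base R on simulated genotypes. It assumes that a SNP's contribution to a component is the absolute value of its PCA loading; this is an illustration only and may differ from how snpclust computes contributions internally. The matrix `geno` and all other names are hypothetical.

```r
# Simulated genotypes (0/1/2 minor-allele counts): 50 samples x 200 SNPs.
# These data are made up for illustration; they are not the HGDP subset.
set.seed(1)
n_samples <- 50
n_snps <- 200
geno <- matrix(
  rbinom(n_samples * n_snps, size = 2, prob = 0.3),
  nrow = n_samples,
  dimnames = list(NULL, paste0("snp", seq_len(n_snps)))
)

# Drop monomorphic SNPs so that scaling to unit variance is defined.
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]

# PCA on centered and scaled genotypes.
pca <- prcomp(geno, center = TRUE, scale. = TRUE)

# Assumed definition: absolute loading of every SNP on the first three PCs.
abs_contrib <- abs(pca$rotation[, 1:3])

# SNPs with the largest absolute loading on the first component.
head(sort(abs_contrib[, "PC1"], decreasing = TRUE))
```

Plotting each column of `abs_contrib` against SNP position would give a Manhattan-style display comparable in spirit to the one above.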
Gaussian mixture models select the SNPs whose contributions stand out above the background noise of the other SNPs' contributions. Here the selected SNPs are colored in red.
ggplot_selection(peaks = snpclust_object$peaks, pca = snpclust_object$pca)
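The idea of separating signal SNPs from background noise with a Gaussian mixture can be sketched as follows. This is a minimal two-component expectation-maximization fit written in base R on simulated contributions, not the selection procedure implemented in snpclust; the function `fit_two_gaussians`, the simulated values, and the 0.5 posterior threshold are all assumptions made for illustration.

```r
set.seed(42)
# Hypothetical absolute contributions: 950 background SNPs and 50 signal SNPs.
contrib <- c(rnorm(950, mean = 0.02, sd = 0.005),
             rnorm(50, mean = 0.08, sd = 0.01))

# Fit a two-component 1D Gaussian mixture by expectation-maximization.
fit_two_gaussians <- function(x, n_iter = 200, tol = 1e-8) {
  # Crude initialisation: split the data at the midpoint of its range.
  # This assumes both sides of the split hold at least two points.
  upper <- x > mean(range(x))
  mu <- c(mean(x[!upper]), mean(x[upper]))
  sigma <- c(sd(x[!upper]), sd(x[upper]))
  prop <- c(mean(!upper), mean(upper))
  loglik_old <- -Inf
  for (i in seq_len(n_iter)) {
    # E-step: posterior probability of each component for each value.
    dens <- cbind(prop[1] * dnorm(x, mu[1], sigma[1]),
                  prop[2] * dnorm(x, mu[2], sigma[2]))
    loglik <- sum(log(rowSums(dens)))
    resp <- dens / rowSums(dens)
    # M-step: update proportions, means and standard deviations.
    n_k <- colSums(resp)
    prop <- n_k / length(x)
    mu <- colSums(resp * x) / n_k
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / n_k)
    # Stop once the log-likelihood no longer improves.
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  # resp holds the responsibilities from the last E-step.
  list(mu = mu, sigma = sigma, prop = prop, resp = resp)
}

fit <- fit_two_gaussians(contrib)

# Select values more likely to belong to the higher-mean component.
signal <- which.max(fit$mu)
selected <- fit$resp[, signal] > 0.5
sum(selected)
```

The component with the lower mean plays the role of the background noise, and the values assigned to the other component correspond to the points highlighted in red in the plot above.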
When PCA is applied to the dataset restricted to the SNPs selected by SNPClust, samples are no longer grouped by geographic origin.
ggplot_pca(pca = snpclust_object$features_pca, group = 'population', ellipses = TRUE)
opticskxi::ggpairs(snpclust_object$features_pca, axes = 1:3, group = 'population') %>% plot
opticskxi::ggpairs(snpclust_object$features_pca, axes = 4:6, group = 'population') %>% plot