KLFDAPT: Genome scan for detecting natural selection based on KLFDAPC

View source: R/KLFDAPT.R

KLFDAPTR Documentation

Genome scan for detecting natural selection based on KLFDAPC

Description

KLFDAPT implements genome scan based on KLFDAPC. The implementation is similar with PCA-based genome scan that reported by Patterson, N.,et al. 2006 and Luu, K., et al. 2017. The loci are weighted by the correlations between genetic variation and the genetic structure (first K reduced features).

Usage

KLFDAPT(klfdapc, gdsfile, estimation="auto", snp.id = NULL, n.RD = NULL, num.thread = 1L, with.id = TRUE, outgds = NULL, verbose = TRUE)

Arguments

klfdapc

KLFDAPC matrix. The results returned from KLFDAPC analysis.

gdsfile

The genotype file, it is an object of class SNPGDSFileClass, a SNP GDS file

estimation

a character string specifying the robust estimator to be used. The choices are: "mcd" for the Fast MCD algorithm of Rousseeuw and Van Driessen, "weighted" for the Reweighted MCD, "donostah" for the Donoho-Stahel projection based estimator, "M" for the constrained M estimator provided by Rocke, "pairwiseQC" for the orthogonalized quadrant correlation pairwise estimator, and "pairwiseGK" for the Orthogonalized Gnanadesikan-Kettenring pairwise estimator. The default "auto" selects from "donostah", "mcd", and "pairwiseQC" with the goal of producing a good estimate in a reasonable amount of time.

snp.id

a vector of snp id indicating the selected SNP id, if NULL, all SNPs are used

n.RD

a vector of integers indicating which reduced features to be used in KLFDAPC

num.thread

the number of (CPU) cores used; if NA, detect the number of cores automatically

with.id

if TRUE, the returned value with sample.id and sample.id

outgds

NULL or a character of file name for exporting correlations to a GDS file, see details

verbose

if TRUE, show information

Details

For more details, please read our vignette (https://github.com/xinghuq/KLFDAPC).

Value

cor

The correlation coefficients between the genotypes and the population genetic structure

q.values

The p.values and q.values for the correlation coefficients

Author(s)

qinxinghu@gmail.com

References

Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and eigenanalysis. PLoS genet, 2(12), e190.

Luu, K., Bazin, E., & Blum, M. G. (2017). pcadapt: an R package to perform genome scans for selection based on principal component analysis. Molecular ecology resources, 17(1), 67-77.

A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Xiuwen Zheng; David Levine; Jess Shen; Stephanie M. Gogarten; Cathy Laurie; Bruce S. Weir. Bioinformatics 2012; doi: 10.1093/bioinformatics/bts606.

R. A. Maronna and R. H. Zamar (2002) Robust estimates of location and dispersion of high-dimensional datasets. Technometrics 44 (4), 307–317.

Wang, J., Zamar, R., Marazzi, A., Yohai, V., Salibian-Barrera, M., Maronna, R., ... & Maechler, M. (2017). robust: Port of the S+“Robust Library”. R package version 0.4-18. Available online at: https://CRAN. R-project. org/package= robust.

Examples

###Hap_map data
### Doing KLFDAPC, default kernel=kernlab::polydot(degree = 1, scale = 1, offset = 1)
# Open the GDS file
library(KLFDAPC)
library(SNPRelate)
genofile <- snpgdsOpen(snpgdsExampleFileName())
# Get sample id
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))
# Get population information

pop_code <- read.gdsn(index.gdsn(genofile, "sample.annot/pop.group"))

# assume the order of sample IDs is as the same as population codes

pop_code <- read.gdsn(index.gdsn(genofile, path="sample.annot/pop.group"))
pop_code=factor(pop_code,levels=unique(pop_code))
# fliter LD thresholds for sensitivity analysis
snpset <- snpgdsLDpruning(genofile, ld.threshold=0.2)

# Get all selected snp id
snpset.id <- unlist(unname(snpset))

snpgdsClose(genofile)

klfdapcfile <- KLFDAPC(snpgdsExampleFileName(), y=pop_code, snp.id=snpset.id,n.pc = 3, kernel=kernlab::rbfdot(sigma = 5), r=5,num.thread=2)

# Make a data.frame
pop_str <- data.frame(sample.id = klfdapcfile$PCA$sample.id,
    pop = factor(pop_code)[match(klfdapcfile$PCA$sample.id, sample.id)],
    RD1 = klfdapcfile$KLFDAPC$Z[,1],    # the first reduced feature
    RD2 =  klfdapcfile$KLFDAPC$Z[,2],    # the second reduced feature
    stringsAsFactors = FALSE)
head(pop_str)

rownames(pop_str)=pop_str$sample.id

genofile <- snpgdsOpen(snpgdsExampleFileName())
chr <- read.gdsn(index.gdsn(genofile, "snp.chromosome"))
klfdapcmat=as.matrix(pop_str[,-c(1:2)])

### using the correlation to estimate the contribution of SNPs to the pop str (which is the same as PCA does )

klfdapcscanqvalue <- KLFDAPT(klfdapcmat, genofile, n.RD=1:2)
snpgdsClose(genofile)
###Mmanhattan plot

 plot(-log10(klfdapcscanqvalue$q.values$q.values), ylim=c(0,4), xlab="", ylab="-log(q-value)",col=chr, pch="+")
   abline(h=2, col="blue")



xinghuq/KLFDAPC documentation built on June 12, 2022, 7:20 p.m.