KLFDAPT: Genome scan for detecting natural selection based on KLFDAPC
In xinghuq/KLFDAPC: KLFDAPC: Kernel Local Fisher Discriminant Analysis of Principal Components- A Non-linear Tool for Population Structure Inference.

View source: R/KLFDAPT.R

KLFDAPT

R Documentation

Genome scan for detecting natural selection based on KLFDAPC

Description

KLFDAPT implements genome scan based on KLFDAPC. The implementation is similar with PCA-based genome scan that reported by Patterson, N.,et al. 2006 and Luu, K., et al. 2017. The loci are weighted by the correlations between genetic variation and the genetic structure (first K reduced features).

Usage

KLFDAPT(klfdapc, gdsfile, estimation="auto", snp.id = NULL, n.RD = NULL, num.thread = 1L, with.id = TRUE, outgds = NULL, verbose = TRUE)

Arguments

`klfdapc`	KLFDAPC matrix. The results returned from KLFDAPC analysis.
`gdsfile`	The genotype file, it is an object of class SNPGDSFileClass, a SNP GDS file
`estimation`	a character string specifying the robust estimator to be used. The choices are: "mcd" for the Fast MCD algorithm of Rousseeuw and Van Driessen, "weighted" for the Reweighted MCD, "donostah" for the Donoho-Stahel projection based estimator, "M" for the constrained M estimator provided by Rocke, "pairwiseQC" for the orthogonalized quadrant correlation pairwise estimator, and "pairwiseGK" for the Orthogonalized Gnanadesikan-Kettenring pairwise estimator. The default "auto" selects from "donostah", "mcd", and "pairwiseQC" with the goal of producing a good estimate in a reasonable amount of time.
`snp.id`	a vector of snp id indicating the selected SNP id, if NULL, all SNPs are used
`n.RD`	a vector of integers indicating which reduced features to be used in KLFDAPC
`num.thread`	the number of (CPU) cores used; if NA, detect the number of cores automatically
`with.id`	if TRUE, the returned value with sample.id and sample.id
`outgds`	NULL or a character of file name for exporting correlations to a GDS file, see details
`verbose`	if TRUE, show information

Details

For more details, please read our vignette (https://github.com/xinghuq/KLFDAPC).

Value

`cor`	The correlation coefficients between the genotypes and the population genetic structure
`q.values`	The p.values and q.values for the correlation coefficients

Author(s)

qinxinghu@gmail.com

References

Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and eigenanalysis. PLoS genet, 2(12), e190.

Luu, K., Bazin, E., & Blum, M. G. (2017). pcadapt: an R package to perform genome scans for selection based on principal component analysis. Molecular ecology resources, 17(1), 67-77.

A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Xiuwen Zheng; David Levine; Jess Shen; Stephanie M. Gogarten; Cathy Laurie; Bruce S. Weir. Bioinformatics 2012; doi: 10.1093/bioinformatics/bts606.

R. A. Maronna and R. H. Zamar (2002) Robust estimates of location and dispersion of high-dimensional datasets. Technometrics 44 (4), 307–317.

Wang, J., Zamar, R., Marazzi, A., Yohai, V., Salibian-Barrera, M., Maronna, R., ... & Maechler, M. (2017). robust: Port of the S+“Robust Library”. R package version 0.4-18. Available online at: https://CRAN. R-project. org/package= robust.

Examples

###Hap_map data
### Doing KLFDAPC, default kernel=kernlab::polydot(degree = 1, scale = 1, offset = 1)
# Open the GDS file
library(KLFDAPC)
library(SNPRelate)
genofile <- snpgdsOpen(snpgdsExampleFileName())
# Get sample id
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))
# Get population information

pop_code <- read.gdsn(index.gdsn(genofile, "sample.annot/pop.group"))

# assume the order of sample IDs is as the same as population codes

pop_code <- read.gdsn(index.gdsn(genofile, path="sample.annot/pop.group"))
pop_code=factor(pop_code,levels=unique(pop_code))
# fliter LD thresholds for sensitivity analysis
snpset <- snpgdsLDpruning(genofile, ld.threshold=0.2)

# Get all selected snp id
snpset.id <- unlist(unname(snpset))

snpgdsClose(genofile)

klfdapcfile <- KLFDAPC(snpgdsExampleFileName(), y=pop_code, snp.id=snpset.id,n.pc = 3, kernel=kernlab::rbfdot(sigma = 5), r=5,num.thread=2)

# Make a data.frame
pop_str <- data.frame(sample.id = klfdapcfile$PCA$sample.id,
    pop = factor(pop_code)[match(klfdapcfile$PCA$sample.id, sample.id)],
    RD1 = klfdapcfile$KLFDAPC$Z[,1],    # the first reduced feature
    RD2 =  klfdapcfile$KLFDAPC$Z[,2],    # the second reduced feature
    stringsAsFactors = FALSE)
head(pop_str)

rownames(pop_str)=pop_str$sample.id

genofile <- snpgdsOpen(snpgdsExampleFileName())
chr <- read.gdsn(index.gdsn(genofile, "snp.chromosome"))
klfdapcmat=as.matrix(pop_str[,-c(1:2)])

### using the correlation to estimate the contribution of SNPs to the pop str (which is the same as PCA does )

klfdapcscanqvalue <- KLFDAPT(klfdapcmat, genofile, n.RD=1:2)
snpgdsClose(genofile)
###Mmanhattan plot

 plot(-log10(klfdapcscanqvalue$q.values$q.values), ylim=c(0,4), xlab="", ylab="-log(q-value)",col=chr, pch="+")
   abline(h=2, col="blue")

xinghuq/KLFDAPC documentation built on June 12, 2022, 7:20 p.m.