SNPRelate-package: Parallel Computing Toolset for Genome-Wide Association...

Description Details Author(s) References Examples

Description

Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed SNPRelate (R package for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent measures. The kernels of our algorithms are written in C/C++ and highly optimized.

Details

Package: SNPRelate
Type: Package
License: GPL version 3
Depends: gdsfmt (>= 1.0.4)

The genotypes stored in GDS format can be analyzed by the R functions in SNPRelate, which utilize the multi-core feature of machine for a single computer.

Webpage: http://github.com/zhengxwen/SNPRelate, http://corearray.sourceforge.net/

Tutorial: http://corearray.sourceforge.net/tutorials/SNPRelate/

Author(s)

Xiuwen Zheng zhengxwen@gmail.com

References

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics (2012); doi: 10.1093/bioinformatics/bts610

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
####################################################################
# Convert the PLINK BED file to the GDS file
#

# PLINK BED files
bed.fn <- system.file("extdata", "plinkhapmap.bed.gz", package="SNPRelate")
fam.fn <- system.file("extdata", "plinkhapmap.fam.gz", package="SNPRelate")
bim.fn <- system.file("extdata", "plinkhapmap.bim.gz", package="SNPRelate")

# convert
snpgdsBED2GDS(bed.fn, fam.fn, bim.fn, "HapMap.gds")


####################################################################
# Principal Component Analysis
#

# open
genofile <- snpgdsOpen("HapMap.gds")

RV <- snpgdsPCA(genofile)
plot(RV$eigenvect[,2], RV$eigenvect[,1], xlab="PC 2", ylab="PC 1",
    col=rgb(0,0,150, 50, maxColorValue=255), pch=19)

# close the file
snpgdsClose(genofile)


####################################################################
# Identity-By-Descent (IBD) Analysis
#

# open
genofile <- snpgdsOpen(snpgdsExampleFileName())

RV <- snpgdsIBDMoM(genofile)
flag <- lower.tri(RV$k0)
plot(RV$k0[flag], RV$k1[flag], xlab="k0", ylab="k1",
    col=rgb(0,0,150, 50, maxColorValue=255), pch=19)
abline(1, -1, col="red", lty=4)

# close the file
snpgdsClose(genofile)


####################################################################
# Identity-By-State (IBS) Analysis
#

# open
genofile <- snpgdsOpen(snpgdsExampleFileName())

RV <- snpgdsIBS(genofile)
m <- 1 - RV$ibs
colnames(m) <- rownames(m) <- RV$sample.id
GeneticDistance <- as.dist(m[1:45, 1:45])
HC <- hclust(GeneticDistance, "ave")
plot(HC)

# close the file
snpgdsClose(genofile)


####################################################################
# Linkage Disequilibrium (LD) Analysis
#

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

snpset <- read.gdsn(index.gdsn(genofile, "snp.id"))[1:200]
L1 <- snpgdsLDMat(genofile, snp.id=snpset, method="composite", slide=-1)

# plot
image(abs(L1$LD), col=terrain.colors(64))

# close the file
snpgdsClose(genofile)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Start snpgdsBED2GDS ...
	BED file: "/usr/local/lib/R/site-library/SNPRelate/extdata/plinkhapmap.bed.gz" in the SNP-major mode (Sample X SNP)
	FAM file: "/usr/local/lib/R/site-library/SNPRelate/extdata/plinkhapmap.fam.gz", DONE.
	BIM file: "/usr/local/lib/R/site-library/SNPRelate/extdata/plinkhapmap.bim.gz", DONE.
Sun Feb 10 13:31:39 2019 	store sample id, snp id, position, and chromosome.
	start writing: 60 samples, 5000 SNPs ...
 	Sun Feb 10 13:31:39 2019	0%
 	Sun Feb 10 13:31:39 2019	100%
Sun Feb 10 13:31:39 2019 	Done.
Optimize the access efficiency ...
Clean up the fragments of GDS file:
    open the file 'HapMap.gds' (104.6K)
    # of fragments: 38
    save to 'HapMap.gds.tmp'
    rename 'HapMap.gds.tmp' (104.4K, reduced: 240B)
    # of fragments: 18
Principal Component Analysis (PCA) on genotypes:
Excluding 203 SNPs on non-autosomes
Excluding 28 SNPs (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 60 samples, 4,769 SNPs
    using 1 (CPU) core
PCA:    the sum of all selected genotypes (0,1,2) = 124273
CPU capabilities: Double-Precision SSE2
Sun Feb 10 13:31:39 2019    (internal increment: 1908)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Sun Feb 10 13:31:39 2019    Begin (eigenvalues and eigenvectors)
Sun Feb 10 13:31:39 2019    Done.
IBD analysis (PLINK method of moment) on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 279 samples, 8,722 SNPs
    using 1 (CPU) core
PLINK IBD:    the sum of all selected genotypes (0,1,2) = 2446510
Sun Feb 10 13:31:39 2019    (internal increment: 13056)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Sun Feb 10 13:31:39 2019    Done.
Identity-By-State (IBS) analysis on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 279 samples, 8,722 SNPs
    using 1 (CPU) core
IBS:    the sum of all selected genotypes (0,1,2) = 2446510
Sun Feb 10 13:31:40 2019    (internal increment: 13056)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Sun Feb 10 13:31:40 2019    Done.
Linkage Disequilibrium (LD) estimation on genotypes:
Working space: 279 samples, 200 SNPs
    using 1 (CPU) core.
    method: composite
LD matrix:    the sum of all selected genotypes (0,1,2) = 55417

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.