SNPRelate-package: Parallel Computing Toolset for Genome-Wide Association...
In SNPRelate: Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Description Details Author(s) References Examples

Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed SNPRelate (R package for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent measures. The kernels of our algorithms are written in C/C++ and highly optimized.

Package:	SNPRelate
Type:	Package
License:	GPL version 3
Depends:	gdsfmt (>= 1.0.4)

The genotypes stored in GDS format can be analyzed by the R functions in SNPRelate, which utilize the multi-core feature of machine for a single computer.

Webpage: http://github.com/zhengxwen/SNPRelate, http://corearray.sourceforge.net/

Tutorial: http://corearray.sourceforge.net/tutorials/SNPRelate/

Xiuwen Zheng zhengxwen@gmail.com

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics (2012); doi: 10.1093/bioinformatics/bts610

####################################################################
# Convert the PLINK BED file to the GDS file
#

# PLINK BED files
bed.fn <- system.file("extdata", "plinkhapmap.bed.gz", package="SNPRelate")
fam.fn <- system.file("extdata", "plinkhapmap.fam.gz", package="SNPRelate")
bim.fn <- system.file("extdata", "plinkhapmap.bim.gz", package="SNPRelate")

# convert
snpgdsBED2GDS(bed.fn, fam.fn, bim.fn, "HapMap.gds")


####################################################################
# Principal Component Analysis
#

# open
genofile <- snpgdsOpen("HapMap.gds")

RV <- snpgdsPCA(genofile)
plot(RV$eigenvect[,2], RV$eigenvect[,1], xlab="PC 2", ylab="PC 1",
    col=rgb(0,0,150, 50, maxColorValue=255), pch=19)

# close the file
snpgdsClose(genofile)


####################################################################
# Identity-By-Descent (IBD) Analysis
#

# open
genofile <- snpgdsOpen(snpgdsExampleFileName())

RV <- snpgdsIBDMoM(genofile)
flag <- lower.tri(RV$k0)
plot(RV$k0[flag], RV$k1[flag], xlab="k0", ylab="k1",
    col=rgb(0,0,150, 50, maxColorValue=255), pch=19)
abline(1, -1, col="red", lty=4)

# close the file
snpgdsClose(genofile)


####################################################################
# Identity-By-State (IBS) Analysis
#

# open
genofile <- snpgdsOpen(snpgdsExampleFileName())

RV <- snpgdsIBS(genofile)
m <- 1 - RV$ibs
colnames(m) <- rownames(m) <- RV$sample.id
GeneticDistance <- as.dist(m[1:45, 1:45])
HC <- hclust(GeneticDistance, "ave")
plot(HC)

# close the file
snpgdsClose(genofile)


####################################################################
# Linkage Disequilibrium (LD) Analysis
#

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

snpset <- read.gdsn(index.gdsn(genofile, "snp.id"))[1:200]
L1 <- snpgdsLDMat(genofile, snp.id=snpset, method="composite", slide=-1)

# plot
image(abs(L1$LD), col=terrain.colors(64))

# close the file
snpgdsClose(genofile)

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Start snpgdsBED2GDS ...
	BED file: "/usr/local/lib/R/site-library/SNPRelate/extdata/plinkhapmap.bed.gz" in the SNP-major mode (Sample X SNP)
	FAM file: "/usr/local/lib/R/site-library/SNPRelate/extdata/plinkhapmap.fam.gz", DONE.
	BIM file: "/usr/local/lib/R/site-library/SNPRelate/extdata/plinkhapmap.bim.gz", DONE.
Sun Feb 10 13:31:39 2019 	store sample id, snp id, position, and chromosome.
	start writing: 60 samples, 5000 SNPs ...
 	Sun Feb 10 13:31:39 2019	0%
 	Sun Feb 10 13:31:39 2019	100%
Sun Feb 10 13:31:39 2019 	Done.
Optimize the access efficiency ...
Clean up the fragments of GDS file:
    open the file 'HapMap.gds' (104.6K)
    # of fragments: 38
    save to 'HapMap.gds.tmp'
    rename 'HapMap.gds.tmp' (104.4K, reduced: 240B)
    # of fragments: 18
Principal Component Analysis (PCA) on genotypes:
Excluding 203 SNPs on non-autosomes
Excluding 28 SNPs (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 60 samples, 4,769 SNPs
    using 1 (CPU) core
PCA:    the sum of all selected genotypes (0,1,2) = 124273
CPU capabilities: Double-Precision SSE2
Sun Feb 10 13:31:39 2019    (internal increment: 1908)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Sun Feb 10 13:31:39 2019    Begin (eigenvalues and eigenvectors)
Sun Feb 10 13:31:39 2019    Done.
IBD analysis (PLINK method of moment) on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 279 samples, 8,722 SNPs
    using 1 (CPU) core
PLINK IBD:    the sum of all selected genotypes (0,1,2) = 2446510
Sun Feb 10 13:31:39 2019    (internal increment: 13056)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Sun Feb 10 13:31:39 2019    Done.
Identity-By-State (IBS) analysis on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 279 samples, 8,722 SNPs
    using 1 (CPU) core
IBS:    the sum of all selected genotypes (0,1,2) = 2446510
Sun Feb 10 13:31:40 2019    (internal increment: 13056)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Sun Feb 10 13:31:40 2019    Done.
Linkage Disequilibrium (LD) estimation on genotypes:
Working space: 279 samples, 200 SNPs
    using 1 (CPU) core.
    method: composite
LD matrix:    the sum of all selected genotypes (0,1,2) = 55417

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.

SNPRelate index

Package overview README.md Tutorials for the R/Bioconductor Package SNPRelate

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

SNPRelate
Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

SNPRelate-package: Parallel Computing Toolset for Genome-Wide Association...
In SNPRelate: Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Description

Details

Author(s)

References

Examples

Example output

Related to SNPRelate-package in SNPRelate...

R Package Documentation

Browse R Packages

We want your feedback!

SNPRelate Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

SNPRelate-package: Parallel Computing Toolset for Genome-Wide Association... In SNPRelate: Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Description

Details

Author(s)

References

Examples

Example output

Related to SNPRelate-package in SNPRelate...

R Package Documentation

Browse R Packages

We want your feedback!

SNPRelate
Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

SNPRelate-package: Parallel Computing Toolset for Genome-Wide Association...
In SNPRelate: Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data