snpgdsLDMat: Linkage Disequilibrium (LD) analysis


View source: R/LD.R

Description

Returns an LD matrix for SNP pairs.

Usage

snpgdsLDMat(gdsobj, sample.id=NULL, snp.id=NULL, slide=250L,
    method=c("composite", "r", "dprime", "corr", "cov"), mat.trim=FALSE,
    num.thread=1L, with.id=TRUE, verbose=TRUE)

Arguments

gdsobj

an object of class SNPGDSFileClass, a SNP GDS file

sample.id

a vector of sample id specifying selected samples; if NULL, all samples are used

snp.id

a vector of snp id specifying selected SNPs; if NULL, all SNPs are used

slide

the number of SNPs in the sliding window; if slide < 0, return a full LD matrix; see Details

method

the LD measure to compute, one of "composite", "r", "dprime", "corr", "cov"; see Details

mat.trim

if TRUE, trim the matrix when slide > 0: the function returns a "num_slide x (n_snp - slide)" matrix

num.thread

the number of (CPU) cores used; if NA, detect the number of cores automatically

with.id

if TRUE, the returned value includes sample.id and snp.id

verbose

if TRUE, show information

Details

Four methods can be used to calculate linkage disequilibrium values: "composite" for the LD composite measure, "r" for the R coefficient (estimated by the EM algorithm assuming HWE; it can be negative), "dprime" for D', and "corr" for the correlation coefficient. The method "corr" is equivalent to "composite" when SNP genotypes are coded as 0 = BB, 1 = AB, 2 = AA.
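To make the genotype coding concrete, the following base-R sketch computes a Pearson correlation between two SNPs coded as 0 (BB), 1 (AB), 2 (AA). It does not call SNPRelate: the toy genotypes and the helper name dosage_corr are hypothetical, and whether the package computes "corr" in exactly this way internally is an assumption.

```r
# Toy genotypes for two SNPs across eight samples,
# coded as the number of A alleles: 0 = BB, 1 = AB, 2 = AA.
snp1 <- c(0, 1, 2, 2, 1, 0, 1, 2)
snp2 <- c(0, 1, 2, 1, 1, 0, 2, 2)

# Hypothetical helper: Pearson correlation of two dosage vectors,
# skipping samples where either genotype is missing (NA).
dosage_corr <- function(g1, g2) {
    ok <- !is.na(g1) & !is.na(g2)
    cor(g1[ok], g2[ok])
}

r <- dosage_corr(snp1, snp2)
r2 <- r^2   # squared value, comparable to the LD^2 plotted in the Examples
```

The sign of r depends on which allele is counted, which is why the Examples plot the squared value.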

If slide <= 0, the function returns an n-by-n LD matrix, where n is the number of SNPs and the value in row i and column j is the LD between SNPs i and j. If slide > 0, it returns an m-by-n LD matrix, where n is the number of SNPs, m is the size of the sliding window, and the value in row i and column j is the LD between SNPs j and j+i.
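The two layouts can be related with a small base-R sketch that rearranges a full n-by-n matrix into the m-by-n sliding-window form described above. The helper to_sliding is hypothetical and is not part of SNPRelate; filling pairs that run past the last SNP with NaN is an assumption made for illustration.

```r
# Hypothetical helper: rearrange a full n-by-n LD matrix into the
# m-by-n sliding-window layout, where entry [i, j] holds the LD
# between SNP j and SNP j+i.
# Assumption: pairs that run past the last SNP are filled with NaN.
to_sliding <- function(full, m) {
    n <- nrow(full)
    out <- matrix(NaN, nrow = m, ncol = n)
    for (j in seq_len(n)) {
        for (i in seq_len(m)) {
            if (j + i <= n) out[i, j] <- full[j, j + i]
        }
    }
    out
}

# Toy symmetric "LD" matrix for 5 SNPs, built from random dosages.
set.seed(1)
geno <- matrix(sample(0:2, 5 * 20, replace = TRUE), nrow = 20)
full <- cor(geno)

win <- to_sliding(full, m = 2)
dim(win)   # 2 rows (window offsets) by 5 columns (SNPs)
```

Here win[1, j] holds the value for adjacent SNPs j and j+1, and win[2, j] the value for SNPs j and j+2.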

Value

Return a list:

sample.id

the sample ids used in the analysis

snp.id

the SNP ids used in the analysis

LD

a matrix of LD values

slide

the size of sliding window
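A short sketch of inspecting the returned list, assuming the SNPRelate and gdsfmt packages are installed (the block does nothing otherwise). The choice of the first 50 SNP ids is arbitrary and only keeps the example small.

```r
res <- NULL
if (requireNamespace("SNPRelate", quietly = TRUE) &&
    requireNamespace("gdsfmt", quietly = TRUE)) {
    # open the example dataset shipped with SNPRelate
    genofile <- SNPRelate::snpgdsOpen(SNPRelate::snpgdsExampleFileName())
    ids <- gdsfmt::read.gdsn(gdsfmt::index.gdsn(genofile, "snp.id"))

    # full LD matrix (no sliding window) for an arbitrary subset of SNPs
    res <- SNPRelate::snpgdsLDMat(genofile, snp.id = ids[1:50], slide = -1,
        method = "composite", verbose = FALSE)
    SNPRelate::snpgdsClose(genofile)

    names(res)    # components of the returned list
    dim(res$LD)   # one row and one column per selected SNP
}
```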

Author(s)

Xiuwen Zheng

References

Weir B: Inferences about linkage disequilibrium. Biometrics 1979; 35: 235-254.

Weir B: Genetic Data Analysis II. Sunderland, MA: Sinauer Associates, 1996.

Weir BS, Cockerham CC: Complete characterization of disequilibrium at two loci; in Feldman MW (ed): Mathematical Evolutionary Theory. Princeton, NJ: Princeton University Press, 1989.

See Also

snpgdsLDpair, snpgdsLDpruning

Examples

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

# missing proportion and MAF
ff <- snpgdsSNPRateFreq(genofile)

# chromosome 15
snpset <- read.gdsn(index.gdsn(genofile, "snp.id"))[
    ff$MissingRate==0 & ff$MinorFreq>0 &
    read.gdsn(index.gdsn(genofile, "snp.chromosome"))==15]
length(snpset)


# LD matrix without sliding window
ld.noslide <- snpgdsLDMat(genofile, snp.id=snpset, slide=-1, method="composite")
# plot
image(t(ld.noslide$LD^2), col=terrain.colors(16))

# LD matrix with a sliding window
ld.slide <- snpgdsLDMat(genofile, snp.id=snpset, method="composite")
# plot
image(t(ld.slide$LD^2), col=terrain.colors(16))


# close the genotype file
snpgdsClose(genofile)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
[1] 203
Linkage Disequilibrium (LD) estimation on genotypes:
    # of samples: 279
    # of SNPs: 203
    using 1 thread
    method: composite
LD matrix:    the sum of all selected genotypes (0,1,2) = 56582
Linkage Disequilibrium (LD) estimation on genotypes:
    # of samples: 279
    # of SNPs: 203
    using 1 thread
    sliding window size: 203
    method: composite
LD matrix:    the sum of all selected genotypes (0,1,2) = 56582

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.