snpgdsGRM: Genetic Relationship Matrix (GRM) for SNP genotype data

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/IBD.R

Description

Calculate Genetic Relationship Matrix (GRM) using SNP genotype data.

Usage

1
2
3
4
5
snpgdsGRM(gdsobj, sample.id=NULL, snp.id=NULL,
    autosome.only=TRUE, remove.monosnp=TRUE, maf=NaN, missing.rate=NaN,
    method=c("GCTA", "Eigenstrat", "EIGMIX", "Weighted", "Corr", "IndivBeta"),
    num.thread=1L, useMatrix=FALSE, out.fn=NULL, out.prec=c("double", "single"),
    out.compress="LZMA_RA", with.id=TRUE, verbose=TRUE)

Arguments

gdsobj

an object of class SNPGDSFileClass, a SNP GDS file

sample.id

a vector of sample id specifying selected samples; if NULL, all samples are used

snp.id

a vector of snp id specifying selected SNPs; if NULL, all SNPs are used

autosome.only

if TRUE, use autosomal SNPs only; if it is a numeric or character value, keep SNPs according to the specified chromosome

remove.monosnp

if TRUE, remove monomorphic SNPs

maf

to use the SNPs with ">= maf" only; if NaN, no MAF threshold

missing.rate

to use the SNPs with "<= missing.rate" only; if NaN, no missing threshold

method

"GCTA" – genetic relationship matrix defined in CGTA; "Eigenstrat" – genetic covariance matrix in EIGENSTRAT; "EIGMIX" – two times coancestry matrix defined in Zheng&Weir (2016), "Weighted" – weighted GCTA, as the same as "EIGMIX", "Corr" – Scaled GCTA GRM (dividing each i,j element by the product of the square root of the i,i and j,j elements), "IndivBeta" – two times individual beta estimate relative to the minimum of beta; see details

num.thread

the number of (CPU) cores used; if NA, detect the number of cores automatically

useMatrix

if TRUE, use Matrix::dspMatrix to store the output square matrix to save memory

out.fn

NULL for no GDS output, or a file name

out.prec

double or single precision for storage

out.compress

the compression method for storing the GRM matrix in the GDS file

with.id

if TRUE, the returned value with sample.id and sample.id

verbose

if TRUE, show information

Details

"GCTA": the genetic relationship matrix in GCTA is defined as $G_ij = avg_l [(g_il - 2*p_l)*(g_jl - 2*p_l) / 2*p_l*(1 - p_l)]$ for individuals i,j and locus l;

"Eigenstrat": the genetic covariance matrix in EIGENSTRAT $G_ij = avg_l [(g_il - 2*p_l)*(g_jl - 2*p_l) / 2*p_l*(1 - p_l)]$ for individuals i,j and locus l; the missing genotype is imputed by the dosage mean of that locus.

"EIGMIX" / "Weighted": it is the same as '2 * snpgdsEIGMIX(, ibdmat=TRUE, diagadj=FALSE)$ibd': $G_ij = [sum_l (g_il - 2*p_l)*(g_jl - 2*p_l)] / [sum_l 2*p_l*(1 - p_l)]$ for individuals i,j and locus l;

"IndivBeta": 'beta = snpgdsIndivBeta(, inbreeding=TRUE)' (Weir&Goudet, 2017), and beta-based GRM is $grm_ij = 2 * (beta_ij - beta_min) / (1 - beta_min)$ for $i!=j$, $grm_ij = 1 + (beta_i - beta_min) / (1 - beta_min)$ for $i=j$. It is relative to the minimum value of beta estimates.

Value

Return a list if with.id = TRUE:

sample.id

the sample ids used in the analysis

snp.id

the SNP ids used in the analysis

method

characters, the method used

grm

the genetic relationship matrix; different methods might have different meanings and interpretation for estimates

If with.id = FALSE, this function returns the genetic relationship matrix (GRM) without sample and SNP IDs.

Author(s)

Xiuwen Zheng

References

Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).

Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. American journal of human genetics 88, 76-82 (2011).

Zheng X, Weir BS. Eigenanalysis on SNP Data with an Interpretation of Identity by Descent. Theoretical Population Biology. 2016 Feb;107:65-76. doi: 10.1016/j.tpb.2015.09.004

Weir BS, Zheng X. SNPs and SNVs in Forensic Science. Forensic Science International: Genetics Supplement Series. 2015. doi:10.1016/j.fsigss.2015.09.106

Weir BS, Goudet J. A Unified Characterization of Population Structure and Relatedness. Genetics. 2017 Aug;206(4):2085-2103. doi: 10.1534/genetics.116.198424.

See Also

snpgdsPCA, snpgdsEIGMIX, snpgdsIndivBeta, snpgdsIndInb, snpgdsFst, snpgdsMergeGRM

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

rv <- snpgdsGRM(genofile, method="GCTA")
eig <- eigen(rv$grm)  # Eigen-decomposition

# output to a GDS file
snpgdsGRM(genofile, method="GCTA", out.fn="test.gds")


pop <- factor(read.gdsn(index.gdsn(genofile, "sample.annot/pop.group")))
plot(eig$vectors[,1], eig$vectors[,2], col=pop)
legend("topleft", legend=levels(pop), pch=19, col=1:4)


# close the file
snpgdsClose(genofile)

# delete the temporary file
unlink("test.gds", force=TRUE)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Genetic Relationship Matrix (GRM, GCTA):
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 8,722
    using 1 thread
GRM Calculation:    the sum of all selected genotypes (0,1,2) = 2446510
CPU capabilities: Double-Precision SSE2
Tue Feb  2 13:11:30 2021    (internal increment: 408)

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Tue Feb  2 13:11:30 2021    Done.
Genetic Relationship Matrix (GRM, GCTA):
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 8,722
    using 1 thread
GRM Calculation:    the sum of all selected genotypes (0,1,2) = 2446510
CPU capabilities: Double-Precision SSE2
Tue Feb  2 13:11:30 2021    (internal increment: 408)

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Saving to the GDS file:

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 1s
Tue Feb  2 13:11:31 2021    Done.

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.