Description Usage Arguments Details Value Author(s) References See Also Examples
Calculate Genetic Relationship Matrix (GRM) using SNP genotype data.
1 2 3 4 5 | snpgdsGRM(gdsobj, sample.id=NULL, snp.id=NULL,
autosome.only=TRUE, remove.monosnp=TRUE, maf=NaN, missing.rate=NaN,
method=c("GCTA", "Eigenstrat", "EIGMIX", "Weighted", "Corr", "IndivBeta"),
num.thread=1L, useMatrix=FALSE, out.fn=NULL, out.prec=c("double", "single"),
out.compress="LZMA_RA", with.id=TRUE, verbose=TRUE)
|
gdsobj |
an object of class |
sample.id |
a vector of sample id specifying selected samples; if NULL, all samples are used |
snp.id |
a vector of snp id specifying selected SNPs; if NULL, all SNPs are used |
autosome.only |
if |
remove.monosnp |
if TRUE, remove monomorphic SNPs |
maf |
to use the SNPs with ">= maf" only; if NaN, no MAF threshold |
missing.rate |
to use the SNPs with "<= missing.rate" only; if NaN, no missing threshold |
method |
"GCTA" – genetic relationship matrix defined in CGTA; "Eigenstrat" – genetic covariance matrix in EIGENSTRAT; "EIGMIX" – two times coancestry matrix defined in Zheng&Weir (2016), "Weighted" – weighted GCTA, as the same as "EIGMIX", "Corr" – Scaled GCTA GRM (dividing each i,j element by the product of the square root of the i,i and j,j elements), "IndivBeta" – two times individual beta estimate relative to the minimum of beta; see details |
num.thread |
the number of (CPU) cores used; if |
useMatrix |
if |
out.fn |
NULL for no GDS output, or a file name |
out.prec |
double or single precision for storage |
out.compress |
the compression method for storing the GRM matrix in the GDS file |
with.id |
if |
verbose |
if |
"GCTA": the genetic relationship matrix in GCTA is defined as $G_ij = avg_l [(g_il - 2*p_l)*(g_jl - 2*p_l) / 2*p_l*(1 - p_l)]$ for individuals i,j and locus l;
"Eigenstrat": the genetic covariance matrix in EIGENSTRAT $G_ij = avg_l [(g_il - 2*p_l)*(g_jl - 2*p_l) / 2*p_l*(1 - p_l)]$ for individuals i,j and locus l; the missing genotype is imputed by the dosage mean of that locus.
"EIGMIX" / "Weighted": it is the same as '2 * snpgdsEIGMIX(, ibdmat=TRUE, diagadj=FALSE)$ibd': $G_ij = [sum_l (g_il - 2*p_l)*(g_jl - 2*p_l)] / [sum_l 2*p_l*(1 - p_l)]$ for individuals i,j and locus l;
"IndivBeta": 'beta = snpgdsIndivBeta(, inbreeding=TRUE)' (Weir&Goudet, 2017), and beta-based GRM is $grm_ij = 2 * (beta_ij - beta_min) / (1 - beta_min)$ for $i!=j$, $grm_ij = 1 + (beta_i - beta_min) / (1 - beta_min)$ for $i=j$. It is relative to the minimum value of beta estimates.
Return a list if with.id = TRUE
:
sample.id |
the sample ids used in the analysis |
snp.id |
the SNP ids used in the analysis |
method |
characters, the method used |
grm |
the genetic relationship matrix; different methods might have different meanings and interpretation for estimates |
If with.id = FALSE
, this function returns the genetic relationship
matrix (GRM) without sample and SNP IDs.
Xiuwen Zheng
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. American journal of human genetics 88, 76-82 (2011).
Zheng X, Weir BS. Eigenanalysis on SNP Data with an Interpretation of Identity by Descent. Theoretical Population Biology. 2016 Feb;107:65-76. doi: 10.1016/j.tpb.2015.09.004
Weir BS, Zheng X. SNPs and SNVs in Forensic Science. Forensic Science International: Genetics Supplement Series. 2015. doi:10.1016/j.fsigss.2015.09.106
Weir BS, Goudet J. A Unified Characterization of Population Structure and Relatedness. Genetics. 2017 Aug;206(4):2085-2103. doi: 10.1534/genetics.116.198424.
snpgdsPCA
, snpgdsEIGMIX
,
snpgdsIndivBeta
,
snpgdsIndInb
, snpgdsFst
,
snpgdsMergeGRM
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | # open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())
rv <- snpgdsGRM(genofile, method="GCTA")
eig <- eigen(rv$grm) # Eigen-decomposition
# output to a GDS file
snpgdsGRM(genofile, method="GCTA", out.fn="test.gds")
pop <- factor(read.gdsn(index.gdsn(genofile, "sample.annot/pop.group")))
plot(eig$vectors[,1], eig$vectors[,2], col=pop)
legend("topleft", legend=levels(pop), pch=19, col=1:4)
# close the file
snpgdsClose(genofile)
# delete the temporary file
unlink("test.gds", force=TRUE)
|
Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Genetic Relationship Matrix (GRM, GCTA):
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
# of samples: 279
# of SNPs: 8,722
using 1 thread
GRM Calculation: the sum of all selected genotypes (0,1,2) = 2446510
CPU capabilities: Double-Precision SSE2
Tue Feb 2 13:11:30 2021 (internal increment: 408)
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 0s
Tue Feb 2 13:11:30 2021 Done.
Genetic Relationship Matrix (GRM, GCTA):
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
# of samples: 279
# of SNPs: 8,722
using 1 thread
GRM Calculation: the sum of all selected genotypes (0,1,2) = 2446510
CPU capabilities: Double-Precision SSE2
Tue Feb 2 13:11:30 2021 (internal increment: 408)
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 0s
Saving to the GDS file:
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 1s
Tue Feb 2 13:11:31 2021 Done.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.