snpgdsMergeGRM: Merge Multiple Genetic Relationship Matrices (GRM)

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/IBD.R

Description

Combine multiple genetic relationship matrices with weighted averaging.

Usage

1
2
snpgdsMergeGRM(filelist, out.fn=NULL, out.prec=c("double", "single"),
    out.compress="LZMA_RA", weight=NULL, verbose=TRUE)

Arguments

filelist

a character vector, list of GDS file names

out.fn

NULL, return a GRM object; or characters, the output GDS file name

out.prec

double or single precision for storage

out.compress

the compression method for storing the GRM matrix in the GDS file

weight

NULL, weights proportional to the numbers of SNPs; a numeric vector, or a logical vector (FALSE for excluding some GRMs with a negative weight, weights proportional to the numbers of SNPs)

verbose

if TRUE, show information

Details

The final GRM is the weighted averaged matrix combining multiple GRMs. The merged GRM may not be identical to the GRM calculated using full SNPs, due to missing genotypes or the internal weighting strategy of the specified GRM calculation.

Value

None or a GRM object if out.fn=NULL.

Author(s)

Xiuwen Zheng

See Also

snpgdsGRM

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

snpid <- read.gdsn(index.gdsn(genofile, "snp.id"))
snpid <- snpid[snpgdsSNPRateFreq(genofile)$MissingRate == 0]

# there is no missing genotype
grm <- snpgdsGRM(genofile, snp.id=snpid, method="GCTA")


# save two GRMs
set1 <- grm$snp.id[1:(length(grm$snp.id)/2)]
set2 <- setdiff(grm$snp.id, set1)
snpgdsGRM(genofile, method="GCTA", snp.id=set1, out.fn="tmp1.gds")
snpgdsGRM(genofile, method="GCTA", snp.id=set2, out.fn="tmp2.gds")

# merge GRMs and export to a new GDS file
snpgdsMergeGRM(c("tmp1.gds", "tmp2.gds"), "tmp.gds")

# return the GRM
grm2 <- snpgdsMergeGRM(c("tmp1.gds", "tmp2.gds"))


# check
f <- openfn.gds("tmp.gds")
m <- read.gdsn(index.gdsn(f, "grm"))
closefn.gds(f)

summary(c(m - grm$grm))  # ~zero
summary(c(m - grm2$grm))  # zero


# close the file
snpgdsClose(genofile)

# delete the temporary file
unlink(c("tmp1.gds", "tmp2.gds", "tmp.gds"), force=TRUE)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Genetic Relationship Matrix (GRM, GCTA):
Excluding 2,288 SNPs (non-autosomes or non-selection)
Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 6,800
    using 1 thread
GRM Calculation:    the sum of all selected genotypes (0,1,2) = 1908966
CPU capabilities: Double-Precision SSE2
Mon Dec 28 03:11:05 2020    (internal increment: 408)

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Mon Dec 28 03:11:05 2020    Done.
Genetic Relationship Matrix (GRM, GCTA):
Excluding 5,688 SNPs (non-autosomes or non-selection)
Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 3,400
    using 1 thread
GRM Calculation:    the sum of all selected genotypes (0,1,2) = 951558
CPU capabilities: Double-Precision SSE2
Mon Dec 28 03:11:05 2020    (internal increment: 408)

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Saving to the GDS file:

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Mon Dec 28 03:11:05 2020    Done.
Genetic Relationship Matrix (GRM, GCTA):
Excluding 5,688 SNPs (non-autosomes or non-selection)
Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 3,400
    using 1 thread
GRM Calculation:    the sum of all selected genotypes (0,1,2) = 957408
CPU capabilities: Double-Precision SSE2
Mon Dec 28 03:11:05 2020    (internal increment: 408)

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Saving to the GDS file:

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 1s
Mon Dec 28 03:11:06 2020    Done.
GRM merging:
    open 'tmp1.gds' (3,400 variants)
    open 'tmp2.gds' (3,400 variants)
Weight: 0.5, 0.5
Output: tmp.gds

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
GRM merging:
    open 'tmp1.gds' (3,400 variants)
    open 'tmp2.gds' (3,400 variants)
Weight: 0.5, 0.5

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-6.661e-16 -3.469e-18  0.000e+00  1.675e-18  1.388e-17  6.661e-16 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.