snpgdsCreateGenoSet: Create a SNP genotype dataset from a GDS file


View source: R/AllUtilities.R

Description

Creates a new GDS genotype file containing a selected subset of samples and SNPs from an existing GDS file.

Usage

snpgdsCreateGenoSet(src.fn, dest.fn, sample.id=NULL, snp.id=NULL,
    snpfirstdim=NULL, compress.annotation="ZIP_RA.max", compress.geno="",
    verbose=TRUE)

Arguments

src.fn

the file name of the source (input) GDS file

dest.fn

the file name of the output GDS file

sample.id

a vector of sample IDs specifying the selected samples; if NULL, all samples are used

snp.id

a vector of SNP IDs specifying the selected SNPs; if NULL, all SNPs are used

snpfirstdim

if TRUE, genotypes are stored in individual-major mode (i.e., all SNPs are listed for the first individual, then all SNPs for the second individual, and so on)

compress.annotation

the compression method for all variables other than genotype

compress.geno

the compression method for the genotype variable

verbose

if TRUE, print progress information
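
The two storage orders behind snpfirstdim can be illustrated with a small base R sketch. It does not call SNPRelate; the toy matrix, its values, and the names geno, snp_major and ind_major are made up for illustration, and it assumes column-by-column storage as used for R matrices.

```r
# Toy genotype calls: 3 samples x 4 SNPs (values are made up for illustration).
geno <- matrix(c(0L, 1L, 2L,
                 2L, 2L, 1L,
                 0L, 0L, 1L,
                 1L, 2L, 0L),
               nrow = 3,
               dimnames = list(paste0("sample", 1:3), paste0("snp", 1:4)))

# SNP-major: a sample x SNP matrix. R stores matrices column by column,
# so all samples of snp1 come first, then all samples of snp2, and so on.
snp_major <- as.vector(geno)

# Individual-major: a SNP x sample matrix, so all SNPs of sample1 come first,
# then all SNPs of sample2, and so on.
ind_major <- as.vector(t(geno))

print(dim(geno))      # 3 4  (sample x SNP)
print(dim(t(geno)))   # 4 3  (SNP x sample)
print(snp_major)
print(ind_major)
```

Both vectors hold the same twelve calls; only the traversal order differs, which mainly matters for how quickly per-SNP or per-sample slices can be read.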

Value

None.
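
Because nothing is returned, the result is usually checked by reopening dest.fn. The following is a minimal sketch of that, assuming SNPRelate (and its dependency gdsfmt) is installed; it subsets both samples and SNPs. The output path, the choice of the first population label, and the cut-off of 1000 SNPs are arbitrary choices made for illustration.

```r
# Guarded so the sketch is skipped when SNPRelate is not installed.
if (requireNamespace("SNPRelate", quietly = TRUE)) {
  library(SNPRelate)   # also attaches gdsfmt (read.gdsn, index.gdsn)

  src <- snpgdsExampleFileName()
  dest <- tempfile(fileext = ".gds")   # hypothetical output path

  # Read the sample IDs, population labels and SNP IDs from the source file.
  genofile <- snpgdsOpen(src)
  samp <- read.gdsn(index.gdsn(genofile, "sample.id"))
  pop <- read.gdsn(index.gdsn(genofile, "sample.annot/pop.group"))
  snps <- read.gdsn(index.gdsn(genofile, "snp.id"))
  snpgdsClose(genofile)

  # Keep the samples of one population and the first 1000 SNPs.
  keep_samp <- samp[pop == pop[1]]
  keep_snp <- head(snps, 1000)

  snpgdsCreateGenoSet(src, dest, sample.id = keep_samp, snp.id = keep_snp,
                      verbose = FALSE)

  # The function returns nothing, so reopen the output file to check it.
  out <- snpgdsOpen(dest)
  out_samp <- read.gdsn(index.gdsn(out, "sample.id"))
  out_snp <- read.gdsn(index.gdsn(out, "snp.id"))
  snpgdsClose(out)

  stopifnot(setequal(out_samp, keep_samp), setequal(out_snp, keep_snp))

  # Remove the temporary output file.
  unlink(dest, force = TRUE)
}
```

The check uses setequal rather than identical, since this sketch makes no assumption about the order in which the selected IDs are written.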

Author(s)

Xiuwen Zheng

See Also

snpgdsCreateGeno, snpgdsCombineGeno

Examples

# open an example dataset (HapMap)
(genofile <- snpgdsOpen(snpgdsExampleFileName()))
# +    [  ] *
# |--+ sample.id   { VStr8 279 ZIP(29.9%), 679B }
# |--+ snp.id   { Int32 9088 ZIP(34.8%), 12.3K }
# |--+ snp.rs.id   { VStr8 9088 ZIP(40.1%), 36.2K }
# |--+ snp.position   { Int32 9088 ZIP(94.7%), 33.6K }
# |--+ snp.chromosome   { UInt8 9088 ZIP(0.94%), 85B } *
# |--+ snp.allele   { VStr8 9088 ZIP(11.3%), 4.0K }
# |--+ genotype   { Bit2 279x9088, 619.0K } *
# \--+ sample.annot   [ data.frame ] *
#    |--+ family.id   { VStr8 279 ZIP(34.4%), 514B }
#    |--+ father.id   { VStr8 279 ZIP(31.5%), 220B }
#    |--+ mother.id   { VStr8 279 ZIP(30.9%), 214B }
#    |--+ sex   { VStr8 279 ZIP(17.0%), 95B }
#    \--+ pop.group   { VStr8 279 ZIP(6.18%), 69B }

set.seed(1000)
snpset <- unlist(snpgdsLDpruning(genofile))
length(snpset)
# 6547 (the exact count can vary across SNPRelate versions)

# close the file
snpgdsClose(genofile)

snpgdsCreateGenoSet(snpgdsExampleFileName(), "test.gds", snp.id=snpset)

####################################################
# check

(gfile <- snpgdsOpen("test.gds"))
# +    [  ] *
# |--+ sample.id   { Str8 279 ZIP_ra(31.2%), 715B }
# |--+ snp.id   { Int32 6547 ZIP_ra(34.9%), 8.9K }
# |--+ snp.rs.id   { Str8 6547 ZIP_ra(41.5%), 27.1K }
# |--+ snp.position   { Int32 6547 ZIP_ra(94.9%), 24.3K }
# |--+ snp.chromosome   { Int32 6547 ZIP_ra(0.45%), 124B }
# |--+ snp.allele   { Str8 6547 ZIP_ra(11.5%), 3.0K }
# \--+ genotype   { Bit2 279x6547, 446.0K } *

# close the file
snpgdsClose(gfile)

# delete the temporary output file
unlink("test.gds", force=TRUE)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
File: /usr/lib/R/site-library/SNPRelate/extdata/hapmap_geno.gds (709.6K)
+    [  ] *
|--+ sample.id   { VStr8 279 ZIP(29.9%), 679B }
|--+ snp.id   { Int32 9088 ZIP(34.8%), 12.3K }
|--+ snp.rs.id   { VStr8 9088 ZIP(40.1%), 36.2K }
|--+ snp.position   { Int32 9088 ZIP(94.7%), 33.6K }
|--+ snp.chromosome   { UInt8 9088 ZIP(0.94%), 85B } *
|--+ snp.allele   { VStr8 9088 ZIP(11.3%), 4.0K }
|--+ genotype   { Bit2 279x9088, 619.0K } *
\--+ sample.annot   [ data.frame ] *
   |--+ family.id   { VStr8 279 ZIP(34.4%), 514B }
   |--+ father.id   { VStr8 279 ZIP(31.5%), 220B }
   |--+ mother.id   { VStr8 279 ZIP(30.9%), 214B }
   |--+ sex   { VStr8 279 ZIP(17.0%), 95B }
   \--+ pop.group   { VStr8 279 ZIP(6.18%), 69B }
SNP pruning based on LD:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 8,722
    using 1 thread
    sliding window: 500,000 basepairs, Inf SNPs
    |LD| threshold: 0.2
    method: composite
Chromosome 1: 76.12%, 545/716
Chromosome 2: 72.78%, 540/742
Chromosome 3: 74.71%, 455/609
Chromosome 4: 73.49%, 413/562
Chromosome 5: 76.86%, 435/566
Chromosome 6: 75.75%, 428/565
Chromosome 7: 75.42%, 356/472
Chromosome 8: 71.11%, 347/488
Chromosome 9: 77.88%, 324/416
Chromosome 10: 74.12%, 358/483
Chromosome 11: 77.85%, 348/447
Chromosome 12: 76.81%, 328/427
Chromosome 13: 76.16%, 262/344
Chromosome 14: 76.60%, 216/282
Chromosome 15: 76.34%, 200/262
Chromosome 16: 72.66%, 202/278
Chromosome 17: 73.91%, 153/207
Chromosome 18: 73.68%, 196/266
Chromosome 19: 85.00%, 102/120
Chromosome 20: 71.62%, 164/229
Chromosome 21: 76.98%, 97/126
Chromosome 22: 75.86%, 88/116
6,557 markers are selected in total.
[1] 6557
Create a GDS genotype file:
The new dataset consists of 279 samples and 6557 SNPs
    write sample.id
    write snp.id
    write snp.rs.id
    write snp.position
    write snp.chromosome
    write snp.allele
SNP genotypes are stored in SNP-major mode (Sample X SNP).
File: /work/tmp/test.gds (512.4K)
+    [  ] *
|--+ sample.id   { Str8 279 ZIP_ra(31.2%), 715B }
|--+ snp.id   { Int32 6557 ZIP_ra(34.9%), 9.0K }
|--+ snp.rs.id   { Str8 6557 ZIP_ra(41.5%), 27.1K }
|--+ snp.position   { Int32 6557 ZIP_ra(94.9%), 24.3K }
|--+ snp.chromosome   { Int32 6557 ZIP_ra(0.45%), 124B }
|--+ snp.allele   { Str8 6557 ZIP_ra(11.5%), 3.0K }
\--+ genotype   { Bit2 279x6557, 446.6K } *

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.