snpgdsCreateGeno: Create a SNP genotype dataset from a matrix

View source: R/AllUtilities.R

Description

Create a GDS file of genotypes from a matrix.

Usage

snpgdsCreateGeno(gds.fn, genmat, sample.id=NULL, snp.id=NULL, snp.rs.id=NULL,
    snp.chromosome=NULL, snp.position=NULL, snp.allele=NULL, snpfirstdim=TRUE,
    compress.annotation="ZIP_RA.max", compress.geno="", other.vars=NULL)

Arguments

gds.fn

the file name of the GDS file to create

genmat

a matrix of genotypes

sample.id

the sample ids, which should be unique

snp.id

the SNP ids, which should be unique

snp.rs.id

the rs ids for SNPs, which need not be unique

snp.chromosome

the chromosome indices

snp.position

the SNP positions in base pairs

snp.allele

the reference/non-reference alleles

snpfirstdim

if TRUE, genotypes are stored in individual-major mode (i.e., all SNPs are listed for the first individual, then all SNPs for the second individual, etc.)

compress.annotation

the compression method for the variables except genotype

compress.geno

the compression method for the variable genotype

other.vars

a list object storing other variables

Details

The values stored in genmat are interpreted as follows: “0” indicates two B alleles, “1” indicates one A allele and one B allele, “2” indicates two A alleles, and any other value indicates a missing genotype.
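
The coding above can be illustrated with a minimal base R sketch. The toy matrix and the variable names (is_valid, n_A) are hypothetical and only show how values other than 0, 1 and 2 would be treated as missing; this is not the package's internal implementation.

```r
# Hypothetical toy genotype matrix (4 SNPs x 3 samples).
# 0 = two B alleles, 1 = one A and one B allele, 2 = two A alleles;
# anything else (here 3, 9 and NA) is treated as a missing genotype.
genmat <- matrix(c(0, 1,  2,
                   3, 2,  1,
                   2, NA, 0,
                   1, 1,  9), nrow = 4, byrow = TRUE)

# TRUE where the entry is a valid genotype (NA %in% ... is FALSE)
is_valid <- array(genmat %in% c(0, 1, 2), dim = dim(genmat))

# Total number of A alleles among the valid genotypes
n_A <- sum(genmat[is_valid])

# Overall missing rate and number of missing genotypes per SNP (row)
missing_rate <- mean(!is_valid)
missing_per_snp <- rowSums(!is_valid)
```

Checking the missing rate this way before creating the file can help catch a matrix that uses an unexpected code for missing values.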

If snpfirstdim is TRUE, then genmat should be “# of SNPs X # of samples”; if snpfirstdim is FALSE, then genmat should be “# of samples X # of SNPs”.
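
The orientation rule can be checked before calling snpgdsCreateGeno. The sketch below uses only base R; dims_match is a hypothetical helper written for illustration and is not part of SNPRelate.

```r
# Hypothetical ids: 5 SNPs and 3 samples
snp.id    <- paste0("snp", 1:5)
sample.id <- paste0("s", 1:3)

# "# of SNPs X # of samples" layout, as expected when snpfirstdim = TRUE
geno_snp_by_sample <- matrix(rep(c(0, 1, 2), length.out = 15),
                             nrow = length(snp.id), ncol = length(sample.id))

# Hypothetical helper: does dim(genmat) agree with the ids and snpfirstdim?
dims_match <- function(genmat, sample.id, snp.id, snpfirstdim = TRUE) {
    expected <- if (snpfirstdim) {
        c(length(snp.id), length(sample.id))
    } else {
        c(length(sample.id), length(snp.id))
    }
    identical(dim(genmat), as.integer(expected))
}

# Transposing gives the "# of samples X # of SNPs" layout
# that goes with snpfirstdim = FALSE
geno_sample_by_snp <- t(geno_snp_by_sample)
```

With these objects, dims_match(geno_snp_by_sample, sample.id, snp.id, TRUE) and dims_match(geno_sample_by_snp, sample.id, snp.id, FALSE) both hold, while mixing a layout with the wrong snpfirstdim value does not.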

The typical variables specified in other.vars are “sample.annot” and “snp.annot”, which are data.frame objects.
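
As a rough sketch of what such a list might look like, the base R code below builds two data.frame objects and wraps them in a named list. The column names (sex, pop.group, gene) and the ids are hypothetical, and the assumption that each data.frame has one row per sample or per SNP, in the same order as sample.id and snp.id, is made for illustration rather than taken from the package documentation.

```r
# Hypothetical ids
sample.id <- c("s1", "s2", "s3")
snp.id    <- 1:4

# Per-sample annotation (assumed: one row per sample, same order as sample.id)
sample.annot <- data.frame(sex = c("F", "M", "F"),
                           pop.group = c("A", "A", "B"),
                           stringsAsFactors = FALSE)

# Per-SNP annotation (assumed: one row per SNP, same order as snp.id)
snp.annot <- data.frame(gene = c("g1", "g1", "g2", "g3"),
                        stringsAsFactors = FALSE)

# Named list in the form described for other.vars
other_vars <- list(sample.annot = sample.annot, snp.annot = snp.annot)

# Sanity checks under the row-order assumption above
stopifnot(nrow(other_vars$sample.annot) == length(sample.id),
          nrow(other_vars$snp.annot) == length(snp.id))

# The list would then be supplied as other.vars = other_vars
# in the call to snpgdsCreateGeno (not run here).
```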

Value

None.

Author(s)

Xiuwen Zheng

See Also

snpgdsCreateGenoSet, snpgdsCombineGeno

Examples

# load the package and the example data
library(SNPRelate)
data(hapmap_geno)

# create a gds file
with(hapmap_geno, snpgdsCreateGeno("test.gds", genmat=genotype,
    sample.id=sample.id, snp.id=snp.id, snp.chromosome=snp.chromosome,
    snp.position=snp.position, snp.allele=snp.allele, snpfirstdim=TRUE))

# open the gds file
genofile <- snpgdsOpen("test.gds")

RV <- snpgdsPCA(genofile)
plot(RV$eigenvect[,2], RV$eigenvect[,1], xlab="PC 2", ylab="PC 1")

# close the file
snpgdsClose(genofile)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Principal Component Analysis (PCA) on genotypes:
Excluding 42 SNPs on non-autosomes
Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 279 samples, 958 SNPs
    using 1 (CPU) core
PCA:    the sum of all selected genotypes (0,1,2) = 264760
CPU capabilities: Double-Precision SSE2
Fri Nov 30 13:33:14 2018    (internal increment: 408)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Fri Nov 30 13:33:14 2018    Begin (eigenvalues and eigenvectors)
Fri Nov 30 13:33:14 2018    Done.

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.