snpgdsCombineGeno: Merge SNP datasets
In zhengxwen/SNPRelate: Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Description

Merges several GDS files of SNP genotypes into a single GDS file.

Usage

snpgdsCombineGeno(gds.fn, out.fn, method=c("position", "exact"),
    compress.annotation="ZIP_RA.MAX", compress.geno="ZIP_RA",
    same.strand=FALSE, snpfirstdim=FALSE, verbose=TRUE)

Arguments

`gds.fn`	a character vector of GDS file names to be merged
`out.fn`	the name of output GDS file
`method`	`"position"` (listed first in the usage, so the default): match SNPs by chromosome and position; `"exact"`: match SNPs by snp.id, chromosome, position and alleles
`compress.annotation`	the compression method for the variables except `genotype`
`compress.geno`	the compression method for the variable `genotype`
`same.strand`	if TRUE, assume that the alleles are on the same strand
`snpfirstdim`	if TRUE, genotypes are stored in individual-major mode (i.e., all SNPs for the first individual are listed first, then all SNPs for the second individual, etc.)
`verbose`	if TRUE, show information

Details

This function calls snpgdsSNPListIntersect internally to determine the common SNPs. Allele definitions are taken from the first GDS file.
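
The intersection step mentioned above can be tried on its own. The following is a minimal sketch, assuming the SNPRelate package is installed; it intersects the example file with itself purely for illustration, and the components of the returned object may differ between SNPRelate versions, so the result is only inspected with `str()`.

```r
library(SNPRelate)

# Build two SNP list objects from the bundled example file.
# (Using the same file twice is an illustrative assumption.)
f <- snpgdsOpen(snpgdsExampleFileName())
snplist1 <- snpgdsSNPList(f)
snplist2 <- snpgdsSNPList(f)
snpgdsClose(f)

# Determine the SNPs shared by both lists, using the default arguments.
common <- snpgdsSNPListIntersect(snplist1, snplist2)

# Inspect the structure of the result rather than assuming its fields.
str(common, max.level=1)
```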

Value

None.
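
Because nothing is returned, the merged data can only be checked by reading the output file. A minimal sketch, assuming SNPRelate is installed; the temporary file names `p1`, `p2` and `out` are hypothetical and chosen here only for illustration.

```r
library(SNPRelate)

fn <- snpgdsExampleFileName()
f <- snpgdsOpen(fn)
samp_id <- read.gdsn(index.gdsn(f, "sample.id"))
snp_id <- read.gdsn(index.gdsn(f, "snp.id"))
snpgdsClose(f)

# Hypothetical temporary files for the two inputs and the merged output.
p1 <- tempfile(fileext=".gds")
p2 <- tempfile(fileext=".gds")
out <- tempfile(fileext=".gds")

# Two subsets with disjoint samples and the same SNPs.
snpgdsCreateGenoSet(fn, p1, sample.id=samp_id[1:10],
    snp.id=snp_id[1:500], verbose=FALSE)
snpgdsCreateGenoSet(fn, p2, sample.id=samp_id[11:30],
    snp.id=snp_id[1:500], verbose=FALSE)

# The call is made for its side effect: writing the merged file.
snpgdsCombineGeno(c(p1, p2), out, same.strand=TRUE, verbose=FALSE)

# Summarize the output file to see how many samples and SNPs it holds.
info <- snpgdsSummary(out, show=FALSE)
n_samp <- length(info$sample.id)
n_snp <- length(info$snp.id)

unlink(c(p1, p2, out), force=TRUE)
```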

Author(s)

Xiuwen Zheng

Examples

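# load the package (gdsfmt, which provides read.gdsn and index.gdsn, is attached with it)
library(SNPRelate)

# get the file name of the example GDS file
fn <- snpgdsExampleFileName()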

f <- snpgdsOpen(fn)
samp_id <- read.gdsn(index.gdsn(f, "sample.id"))
snp_id <- read.gdsn(index.gdsn(f, "snp.id"))
geno <- read.gdsn(index.gdsn(f, "genotype"), start=c(1,1), count=c(-1, 3000))
snpgdsClose(f)


# split the GDS file into two files with different samples and the same SNPs
snpgdsCreateGenoSet(fn, "t1.gds", sample.id=samp_id[1:10],
    snp.id=snp_id[1:3000])
snpgdsCreateGenoSet(fn, "t2.gds", sample.id=samp_id[11:30],
    snp.id=snp_id[1:3000])

# combine with different samples
snpgdsCombineGeno(c("t1.gds", "t2.gds"), "test.gds", same.strand=TRUE)
f <- snpgdsOpen("test.gds")
g <- read.gdsn(index.gdsn(f, "genotype"))
snpgdsClose(f)

identical(geno[1:30, ], g)  # TRUE


# split the GDS file into two files with different SNPs and the same samples
snpgdsCreateGenoSet(fn, "t1.gds", snp.id=snp_id[1:100])
snpgdsCreateGenoSet(fn, "t2.gds", snp.id=snp_id[101:300])

# combine with different SNPs
snpgdsCombineGeno(c("t1.gds", "t2.gds"), "test.gds")
f <- snpgdsOpen("test.gds")
g <- read.gdsn(index.gdsn(f, "genotype"))
snpgdsClose(f)

identical(geno[, 1:300], g)  # TRUE


# delete the temporary files created above
unlink(c("t1.gds", "t2.gds", "test.gds"), force=TRUE)

zhengxwen/SNPRelate documentation built on Nov. 19, 2024, 1:02 p.m.