seqMerge: Merge Multiple SeqArray GDS Files

View source: R/UtilsMerge.R

Description

Merges multiple SeqArray GDS files.

Usage

seqMerge(gds.fn, out.fn, storage.option="LZMA_RA", info.var=NULL, fmt.var=NULL,
    samp.var=NULL, optimize=TRUE, digest=TRUE, geno.pad=TRUE, verbose=TRUE)

Arguments

gds.fn

a character vector of input SeqArray GDS file names

out.fn

the output file name

storage.option

the storage and compression option: "ZIP_RA" (i.e., seqStorageOption("ZIP_RA")), or "LZMA_RA" (the default), which uses the LZMA compression algorithm for a higher compression ratio

info.var

a character vector of variable name(s) in the INFO field; NULL includes all variables, and character() excludes all INFO variables

fmt.var

a character vector of variable name(s) in the FORMAT field; NULL includes all variables, and character() excludes all FORMAT variables

samp.var

a character vector of variable name(s) in 'sample.annotation'; NULL includes all variables

optimize

if TRUE, optimize the access efficiency by calling cleanup.gds

digest

a logical value (TRUE/FALSE) or a character string ("md5", "sha1", "sha256", "sha384" or "sha512"); if TRUE or a digest algorithm is specified, hash codes are added to the GDS file (md5 when TRUE)

geno.pad

if TRUE, pad the 2-bit genotype array to whole bytes so that, where possible, genotypes do not have to be recompressed

verbose

if TRUE, show information
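
The annotation-selection arguments can be illustrated with a minimal sketch. It assumes that the example GDS file shipped with SeqArray contains INFO variables named "AA" and "DP", and that a single input file is accepted (as stated in Details); the output file name is hypothetical.

```r
library(SeqArray)

# example GDS file shipped with SeqArray
gds.fn <- seqExampleFileName("gds")

# hypothetical output location (not part of the package)
out.fn <- file.path(tempdir(), "subset_annot.gds")

# keep two INFO variables (assumed to exist in the example file),
# exclude all FORMAT variables, and keep all sample annotations
res <- seqMerge(gds.fn, out.fn, storage.option="ZIP_RA",
    info.var=c("AA", "DP"), fmt.var=character(), samp.var=NULL)

# inspect the contents of the new file
seqSummary(out.fn)

# delete the temporary file
unlink(out.fn, force=TRUE)
```

The value stored in res is the output file name with an absolute path, as described under Value.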

Details

The function merges multiple SeqArray GDS files into a new GDS file. Users can specify the compression method and level of the new file via storage.option. If gds.fn contains a single file, the function creates a new copy of that file, which allows the storage type to be changed.

WARNING: the functionality of seqMerge() is limited.
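
The single-file case mentioned above might look like the following sketch, which rewrites the package's example GDS file with LZMA compression. The output file name is hypothetical, and the size comparison is only a rough check; whether the result is actually smaller depends on the input file.

```r
library(SeqArray)

# example GDS file shipped with SeqArray
gds.fn <- seqExampleFileName("gds")

# hypothetical output location (not part of the package)
out.fn <- file.path(tempdir(), "recompressed.gds")

# one input file: create a new file with a different storage option
seqMerge(gds.fn, out.fn, storage.option="LZMA_RA")

# compare the file sizes in bytes (input, output)
sizes <- file.size(c(gds.fn, out.fn))
print(sizes)

# delete the temporary file
unlink(out.fn, force=TRUE)
```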

Value

Returns the file name of the output GDS file with an absolute path.

Author(s)

Xiuwen Zheng

See Also

seqVCF2GDS, seqExport

Examples

library(SeqArray)

# the VCF file
vcf.fn <- seqExampleFileName("vcf")

# the number of variants
total.count <- seqVCF_Header(vcf.fn, getnum=TRUE)$num.variant

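# split the variants into 5 contiguous chunks
split.cnt <- 5
start <- integer(split.cnt)  # index of the first variant in each chunk
count <- integer(split.cnt)  # the number of variants in each chunk

# average chunk width; the last boundary is total.count+1 (exclusive)
s <- (total.count+1) / split.cnt
st <- 1L
for (i in 1:split.cnt)
{
    z <- round(s * i)    # exclusive end of the i-th chunk
    start[i] <- st
    count[i] <- z - st
    st <- z
}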

fn <- paste0("tmp", 1:split.cnt, ".gds")

# convert to 5 gds files
for (i in 1:split.cnt)
{
    seqVCF2GDS(vcf.fn, fn[i], storage.option="ZIP_RA",
        start=start[i], count=count[i])
}

# merge different variants
seqMerge(fn, "tmp.gds", storage.option="ZIP_RA")
seqSummary("tmp.gds")


####  merging different samples  ####

# the example GDS file
gds.fn <- seqExampleFileName("gds")
file.copy(gds.fn, "test.gds", overwrite=TRUE)

# modify 'sample.id' so that the two files contain different samples
seqAddValue("test.gds", "sample.id", paste0("S", 1:90), replace=TRUE)

# merging
seqMerge(c(gds.fn, "test.gds"), "output.gds", storage.option="ZIP_RA")


# delete the temporary files
unlink(c("tmp.gds", "test.gds", "output.gds"), force=TRUE)
unlink(fn, force=TRUE)
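
The chunk boundaries used in the first example can also be checked in isolation, without SeqArray. The sketch below wraps the same arithmetic in a helper, split_ranges(), which is hypothetical and not part of the package; the total of 1348 variants is only an illustrative value.

```r
# hypothetical helper (not part of SeqArray): start index and count per chunk
split_ranges <- function(total.count, split.cnt)
{
    s <- (total.count + 1) / split.cnt
    # exclusive end of each chunk, same rounding as in the example loop
    ends <- round(s * seq_len(split.cnt))
    start <- c(1, head(ends, -1L))
    data.frame(start=as.integer(start), count=as.integer(ends - start))
}

# illustrative total number of variants
r <- split_ranges(1348L, 5L)
print(r)

# the chunks are contiguous and cover every variant exactly once
stopifnot(sum(r$count) == 1348L)
stopifnot(all(r$start[-1L] == head(r$start + r$count, -1L)))
```

A check of this kind is useful before merging, since a gap or overlap between chunks would lead to missing or duplicated variants in the merged file.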

zhengxwen/SeqArray documentation built on Jan. 10, 2025, 9:09 p.m.