seqStorageOption: Storage and Compression Options

View source: R/Utilities.R

seqStorageOptionR Documentation

Storage and Compression Options

Description

Storage and compression options for GDS import and merging.

Usage

seqStorageOption(compression=c("ZIP_RA", "ZIP_RA.fast", "ZIP_RA.max", "LZ4_RA",
    "LZ4_RA.fast", "LZ4_RA.max", "LZMA_RA", "LZMA_RA.fast", "LZMA_RA.max",
    "Ultra", "UltraMax", "none"), mode=NULL, float.mode="float32",
    geno.compress=NULL, info.compress=NULL, format.compress=NULL,
    index.compress=NULL, ...)

Arguments

compression

the default compression level ("ZIP_RA"), see add.gdsn for the description of compression methods

mode

a character vector, specifying storage type for corresponding variable, e.g., c('annotation/info/HM'="int16", 'annotation/format/PL'="int")

float.mode

specify the storage mode for read numbers, e.g., "float32", "float64", "packedreal16"; the additional parameters can follow by colon, like "packedreal16:scale=0.0001"

geno.compress

NULL for the default value, or the compression method for genotypic data

info.compress

NULL for the default value, or the compression method for data sets stored in the INFO field (i.e., "annotation/info")

format.compress

NULL for the default value, or the compression method for data sets stored in the FORMAT field (i.e., "annotation/format")

index.compress

NULL for the default value, or the compression method for data index variables (e.g., "annotation/info/@HM")

...

other specified storage compression for corresponding variable, e.g., 'annotation/info/HM'="ZIP_MAX"

Details

The compression modes "Ultra" and "UltraMax" attempt to maximize the compression ratio using gigabyte-sized or even terabyte-sized virtual memory, according to "LZMA_RA.ultra" and "LZMA_RA.ultra_max" in compression.gdsn. These features require gdsfmt (>=v1.16.0). "Ultra" and "UltraMax" may not increase the compression ratio much compared with "LZMA_RA.max", and these options are designed for the users who want to exhaust the computational resources.

Value

Return a list with a class name "SeqGDSStorageClass", contains the compression algorithm for each data type.

Author(s)

Xiuwen Zheng

See Also

seqVCF2GDS, seqRecompress, seqMerge

Examples

# the file of VCF
(vcf.fn <- seqExampleFileName("vcf"))

# convert
seqVCF2GDS(vcf.fn, "tmp1.gds", storage.option=seqStorageOption())
(f1 <- seqOpen("tmp1.gds"))

# convert (maximize the compression ratio)
seqVCF2GDS(vcf.fn, "tmp2.gds", storage.option=seqStorageOption("ZIP_RA.max"))
(f2 <- seqOpen("tmp2.gds"))

# does not compress the genotypic data
seqVCF2GDS(vcf.fn, "tmp3.gds", storage.option=
    seqStorageOption("ZIP_RA", geno.compress=""))
(f3 <- seqOpen("tmp3.gds"))

# compress with LZ4
seqVCF2GDS(vcf.fn, "tmp4.gds", storage.option=seqStorageOption("LZ4_RA"))
(f4 <- seqOpen("tmp4.gds"))


# close and remove the files
seqClose(f1)
seqClose(f2)
seqClose(f3)
seqClose(f4)

unlink(c("tmp1.gds", "tmp2.gds", "tmp3.gds", "tmp4.gds"))

zhengxwen/SeqArray documentation built on Jan. 10, 2025, 9:09 p.m.