seqOptimize: Optimize the Storage of Data Array

seqOptimize    R Documentation

Description

Transpose a data array or matrix in a GDS file for potentially faster access.
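
As a conceptual illustration of why a transposed copy can help, the base-R sketch below builds a small matrix and its transpose. The matrix `geno`, its dimension names, and its values are hypothetical, and the sketch makes no claim about how SeqArray lays out data inside a GDS file.

```r
# Hypothetical dosage matrix: rows are samples, columns are variants.
# matrix() fills column by column, so each group of three values is one variant.
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, 0,
                 0, 0, 1),
               nrow = 3,
               dimnames = list(paste0("s", 1:3), paste0("v", 1:4)))

# R stores matrices in column-major order, so one variant (a column of
# 'geno') is contiguous in memory, while one sample (a row) is strided.
# In the transposed copy, each column is a sample instead.
by_sample <- t(geno)

# Reading sample "s2" gives the same values from either orientation
s2_from_rows <- geno["s2", ]
s2_from_cols <- by_sample[, "s2"]

# Keeping both orientations doubles the number of stored values
stored_values <- length(geno) + length(by_sample)
```

Keeping both orientations is what doubles the stored values here, which is consistent with the file-size warning for target="by.sample" under Details.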

Usage

seqOptimize(gdsfn, target=c("chromosome", "by.sample"), format.var=TRUE,
    cleanup=TRUE, verbose=TRUE)

Arguments

gdsfn

the file name of the GDS file

target

either "chromosome" or "by.sample"; see Details

format.var

a character vector of selected variable names in "annotation/format", or TRUE for all variables there

cleanup

call cleanup.gds if TRUE

verbose

if TRUE, show information

Details

"chromosome": adds or updates two additional nodes, '@chrom_rle_val' and '@chrom_rle_len', for faster chromosome indexing; requires SeqArray >= v1.20.0.

"by.sample": optimizes the GDS file for seqApply(..., margin="by.sample"). Warning: optimizing a GDS file for reading data by sample may increase the file size by up to 2X, because genotype data and all format data are duplicated.

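The node names '@chrom_rle_val' and '@chrom_rle_len' suggest a run-length encoding of the chromosome codes. The base-R sketch below shows how such a value/length pair can speed up chromosome lookups in general. The vector `chrom` and all variable names are hypothetical, and this is an assumption about the general technique, not a description of SeqArray's internal code.

```r
# Hypothetical chromosome codes for eight variants, sorted by chromosome
chrom <- c("1", "1", "1", "2", "2", "X", "X", "X")

# Run-length encode: one value and one length per run of equal codes
r <- rle(chrom)
rle_val <- r$values
rle_len <- r$lengths

# Start and end positions of each run, derived from the lengths alone
ends   <- cumsum(rle_len)
starts <- ends - rle_len + 1L

# Locate every variant on chromosome "2" by scanning the short run table
# instead of comparing against all eight per-variant codes
k   <- which(rle_val == "2")
idx <- unlist(Map(seq, starts[k], ends[k]))
```

With sorted input, the run table has one entry per chromosome, so a lookup touches only a few entries regardless of how many variants the file contains.
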
Value

None.

Author(s)

Xiuwen Zheng

See Also

seqGetData, seqApply

Examples

# load the package that provides the functions used below
library(SeqArray)

# the file name of VCF
(vcf.fn <- seqExampleFileName("vcf"))
# or vcf.fn <- "C:/YourFolder/Your_VCF_File.vcf"

# convert
seqVCF2GDS(vcf.fn, "tmp.gds", storage.option="ZIP_RA")

# prepare data for the SeqVarTools package
seqOptimize("tmp.gds", target="by.sample")

# list the structure of GDS variables
(f <- seqOpen("tmp.gds"))
# close
seqClose(f)

# delete the temporary file
unlink("tmp.gds")

zhengxwen/SeqArray documentation built on Nov. 19, 2024, 1:04 p.m.