seqGDS2VCF: Convert to a VCF File


seqGDS2VCF    R Documentation

Convert to a VCF File

Description

Converts a SeqArray GDS file to a Variant Call Format (VCF) file.

Usage

seqGDS2VCF(gdsfile, vcf.fn, info.var=NULL, fmt.var=NULL, chr_prefix="",
    use_Rsamtools=TRUE, verbose=TRUE)

Arguments

gdsfile

a SeqVarGDSClass object

vcf.fn

the file name of the output VCF file, or a connection object

info.var

a character vector of variable names to export in the INFO field; NULL (the default) exports all variables, and character(0) exports no INFO variables

fmt.var

a character vector of variable names to export in the FORMAT field; NULL (the default) exports all variables, and character(0) exports no FORMAT variables

chr_prefix

a prefix added to each chromosome name, e.g., "chr"; no prefix by default

use_Rsamtools

if TRUE and the Rsamtools package is installed, compressed output is written in the bgzf format; see Details

verbose

if TRUE, show progress information
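
As an illustration of how info.var, fmt.var and chr_prefix interact, the following is a minimal sketch rather than part of the package examples. It assumes the example GDS file shipped with SeqArray; the output file name (out.fn) and the 10-variant filter are arbitrary choices made here for brevity.

```r
library(SeqArray)

# open the example GDS file shipped with SeqArray
f <- seqOpen(seqExampleFileName("gds"))

# restrict the export to the first 10 variants to keep the output small
seqSetFilter(f, variant.sel=1:10)

# hypothetical output file in the session's temporary directory
out.fn <- tempfile(fileext=".vcf")

# no INFO or FORMAT variables, and "chr" prepended to each chromosome name
seqGDS2VCF(f, out.fn, info.var=character(0), fmt.var=character(0),
    chr_prefix="chr", verbose=FALSE)

# data lines are the lines that do not start with "#"
txt <- readLines(out.fn)
body <- txt[!startsWith(txt, "#")]

# the first tab-separated column of each data line is CHROM
chrom <- sub("\t.*$", "", body)
print(chrom)

# clean up
unlink(out.fn)
seqClose(f)
```

Writing to an uncompressed ".vcf" file keeps the check simple, since the result can be inspected directly with readLines.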

Details

seqSetFilter can be used to define a subset of data for the export.

If the file name extension is "gz" or "bgz", the output is compressed with the gzip algorithm. When the Rsamtools package is installed and use_Rsamtools=TRUE, the exported file is written in the bgzf format (bgzip, a variant of the gzip format), which allows fast indexing. If the file name extension is "bz" or "xz", bzfile or xzfile is used instead.
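
The extension-driven behaviour described above could be tried out as follows. This is a sketch under assumptions, not a statement of what the package guarantees: the file names are hypothetical, the filter is arbitrary, and use_Rsamtools=FALSE is set so that the ".gz" output is plain gzip rather than bgzf.

```r
library(SeqArray)

# open the example GDS file and keep only the first 10 variants
f <- seqOpen(seqExampleFileName("gds"))
seqSetFilter(f, variant.sel=1:10)

# hypothetical output names; only the extension differs
base <- tempfile()
fn <- paste0(base, c(".vcf", ".vcf.gz", ".vcf.xz"))

# export the same filtered data once per file name
for (x in fn)
    seqGDS2VCF(f, x, use_Rsamtools=FALSE, verbose=FALSE)

# file sizes on disk, in the same order as fn
print(file.size(fn))

# readLines() opens gzip- and xz-compressed files transparently,
# so the number of lines can be compared across the three outputs
n.lines <- vapply(fn, function(x) length(readLines(x)), integer(1))
print(unname(n.lines))

# clean up
unlink(fn)
seqClose(f)
```

Comparing line counts rather than raw bytes sidesteps any differences in the compressed representation.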

Value

Returns the file name of the output VCF file, with an absolute path.

Author(s)

Xiuwen Zheng

References

Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., et al. (2011). The variant call format and VCFtools. Bioinformatics 27, 2156-2158.

See Also

seqVCF2GDS

Examples

# the GDS file
(gds.fn <- seqExampleFileName("gds"))

# display
(f <- seqOpen(gds.fn))

# output the first 5 samples
samp.id <- seqGetData(f, "sample.id")
seqSetFilter(f, sample.id=samp.id[1:5])


# convert
seqGDS2VCF(f, "tmp.vcf.gz")

# no INFO and FORMAT
seqGDS2VCF(f, "tmp1.vcf.gz", info.var=character(), fmt.var=character())

# output BN,GP,AA,DP,HM2 in INFO (the variables are in this order), no FORMAT
seqGDS2VCF(f, "tmp2.vcf.gz", info.var=c("BN","GP","AA","DP","HM2"),
    fmt.var=character())


# read
(txt <- readLines("tmp.vcf.gz", n=20))
(txt <- readLines("tmp1.vcf.gz", n=20))
(txt <- readLines("tmp2.vcf.gz", n=20))





#########################################################################
# Users can compare the new VCF file with the original VCF file
# by calling "diff" in Unix (a command-line tool that compares files line by line)

# using all samples and variants
seqResetFilter(f)

# convert
seqGDS2VCF(f, "tmp.vcf.gz")


# file.copy(seqExampleFileName("vcf"), "old.vcf.gz", overwrite=TRUE)
# process substitution "<( )" needs bash, whereas system() runs commands via sh
# system("bash -c 'diff <(gunzip -c old.vcf.gz) <(gunzip -c tmp.vcf.gz)'")

# 1a2,3
# > ##fileDate=20130309
# > ##source=SeqArray_RPackage_v1.0

# LOOKS GOOD!


# delete temporary files
unlink(c("tmp.vcf.gz", "tmp1.vcf.gz", "tmp2.vcf.gz"))

# close the GDS file
seqClose(f)

zhengxwen/SeqArray documentation built on Jan. 10, 2025, 9:09 p.m.