seqVCF_Header: Parse the Header of a VCF/BCF File


View source: R/ConvVCF2GDS.R

Description

Parses the meta-information lines of a VCF or BCF file.

Usage

seqVCF_Header(vcf.fn, getnum=FALSE)

Arguments

vcf.fn

the name of a VCF or BCF file, or a connection object for a VCF file

getnum

if TRUE, return the total number of variants

Details

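An ID description is a data frame with up to four main columns: ID – the variable name; Number – the number of elements (see the webpage of the 1000 Genomes Project); Type – the data type; Description – a description of the variable. In the example output below, the FILTER description has only the ID and Description columns, and the INFO description carries two further columns, Source and Version.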
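To make the column layout concrete, here is a minimal base-R sketch of how one structured meta-information line maps onto ID, Number, Type and Description. The helper `parse_meta_line` is hypothetical and is not part of SeqArray; it only illustrates the idea and does not reproduce how seqVCF_Header parses a file.

```r
# Hypothetical helper (not part of SeqArray): split one structured
# meta-information line into a one-row data frame of its key/value pairs.
parse_meta_line <- function(line) {
  # keep only the text between the angle brackets
  body <- sub("^##[A-Za-z]+=<(.*)>$", "\\1", line)
  # a value is either a double-quoted string or a run of non-comma characters
  m <- regmatches(body, gregexpr('[A-Za-z]+=("[^"]*"|[^,]*)', body))[[1]]
  keys <- sub("=.*$", "", m)      # text before the first "="
  vals <- sub("^[^=]*=", "", m)   # text after the first "="
  vals <- gsub('^"|"$', "", vals) # drop the surrounding quotes, if any
  as.data.frame(as.list(setNames(vals, keys)), stringsAsFactors = FALSE)
}

line <- '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">'
info <- parse_meta_line(line)
str(info)
```

Number is kept as a character string because the field may hold a non-numeric value such as ".".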

Value

Returns a list of class "SeqVCFHeaderClass" (an S3 object) with the following components:

fileformat

the file format

info

the ID description in the INFO field

filter

the ID description in the FILTER field

format

the ID description in the FORMAT field

alt

the ID description in the ALT field

contig

the description in the contig field

assembly

the link to the assembly

reference

genome reference, or NULL if unknown

header

the other header lines

ploidy

ploidy, two for humans

num.sample

the number of samples

num.variant

the number of variants, applicable only if getnum=TRUE

sample.id

a vector of sample IDs in the VCF/BCF file
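As a short sketch of how these components might be used, the returned list can be indexed with `$` like any other R list. The variable name `hdr` is an assumption chosen for illustration; the file is the example VCF shipped with SeqArray.

```r
library(SeqArray)

# parse the header of the example VCF file shipped with SeqArray
hdr <- seqVCF_Header(seqExampleFileName("vcf"), getnum=TRUE)

hdr$fileformat                  # the file format string
hdr$info$ID                     # IDs declared for the INFO field
hdr$format[, c("ID", "Type")]   # IDs and data types of the FORMAT field
hdr$num.sample                  # the number of samples
head(hdr$sample.id)             # the first few sample IDs
```

The info, filter and format components are data frames, so the usual data frame subsetting applies to them.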

Author(s)

Xiuwen Zheng

References

Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., et al. (2011). The variant call format and VCFtools. Bioinformatics 27, 2156-2158.

See Also

seqVCF_SampID, seqVCF2GDS

Examples

library(SeqArray)

# the VCF file
(vcf.fn <- seqExampleFileName("vcf"))
# or vcf.fn <- "C:/YourFolder/Your_VCF_File.vcf"

# parse the header and count the variants
seqVCF_Header(vcf.fn, getnum=TRUE)

# use a connection object
f <- file(vcf.fn, "r")
seqVCF_Header(f, getnum=TRUE)
close(f)

Example output

Loading required package: gdsfmt
[1] "/usr/lib/R/site-library/SeqArray/extdata/CEU_Exon.vcf.gz"
List of 13
 $ fileformat : chr "VCFv4.0"
 $ info       :'data.frame':	9 obs. of  6 variables:
  ..$ ID         : chr [1:9] "AA" "AC" "AN" "DP" ...
  ..$ Number     : chr [1:9] "." "1" "1" "1" ...
  ..$ Type       : chr [1:9] "String" "Integer" "Integer" "Integer" ...
  ..$ Description: chr [1:9] "Ancestral Allele" "Total number of alternate alleles in called genotypes" "Total number of alleles in called genotypes" "Total Depth" ...
  ..$ Source     : chr [1:9] NA NA NA NA ...
  ..$ Version    : chr [1:9] NA NA NA NA ...
 $ filter     :'data.frame':	2 obs. of  2 variables:
  ..$ ID         : chr [1:2] "PASS" "q10"
  ..$ Description: chr [1:2] "All filters passed" "Quality below 10"
 $ format     :'data.frame':	2 obs. of  4 variables:
  ..$ ID         : chr [1:2] "GT" "DP"
  ..$ Number     : chr [1:2] "1" "."
  ..$ Type       : chr [1:2] "String" "Integer"
  ..$ Description: chr [1:2] "Genotype" "Read Depth from MOSAIK BAM"
 $ alt        : NULL
 $ contig     : NULL
 $ assembly   : NULL
 $ reference  : chr "human_b36_both.fasta"
 $ header     :'data.frame':	0 obs. of  2 variables:
  ..$ id   : chr(0) 
  ..$ value: chr(0) 
 $ ploidy     : int 2
 $ num.sample : int 90
 $ num.variant: num 1348
 $ sample.id  : chr [1:90] "NA06984" "NA06985" "NA06986" "NA06989" ...
 - attr(*, "class")= chr "SeqVCFHeaderClass"
List of 13
 $ fileformat : chr "VCFv4.0"
 $ info       :'data.frame':	9 obs. of  6 variables:
  ..$ ID         : chr [1:9] "AA" "AC" "AN" "DP" ...
  ..$ Number     : chr [1:9] "." "1" "1" "1" ...
  ..$ Type       : chr [1:9] "String" "Integer" "Integer" "Integer" ...
  ..$ Description: chr [1:9] "Ancestral Allele" "Total number of alternate alleles in called genotypes" "Total number of alleles in called genotypes" "Total Depth" ...
  ..$ Source     : chr [1:9] NA NA NA NA ...
  ..$ Version    : chr [1:9] NA NA NA NA ...
 $ filter     :'data.frame':	2 obs. of  2 variables:
  ..$ ID         : chr [1:2] "PASS" "q10"
  ..$ Description: chr [1:2] "All filters passed" "Quality below 10"
 $ format     :'data.frame':	2 obs. of  4 variables:
  ..$ ID         : chr [1:2] "GT" "DP"
  ..$ Number     : chr [1:2] "1" "."
  ..$ Type       : chr [1:2] "String" "Integer"
  ..$ Description: chr [1:2] "Genotype" "Read Depth from MOSAIK BAM"
 $ alt        : NULL
 $ contig     : NULL
 $ assembly   : NULL
 $ reference  : chr "human_b36_both.fasta"
 $ header     :'data.frame':	0 obs. of  2 variables:
  ..$ id   : chr(0) 
  ..$ value: chr(0) 
 $ ploidy     : int NA
 $ num.sample : int 90
 $ num.variant: int 0
 $ sample.id  : chr [1:90] "NA06984" "NA06985" "NA06986" "NA06989" ...
 - attr(*, "class")= chr "SeqVCFHeaderClass"

SeqArray documentation built on Nov. 8, 2020, 5:08 p.m.