Import Variant Call Format (VCF) files in text or binary format
scanVcfHeader(file, ...)

## S4 method for signature 'character'
scanVcfHeader(file, ...)

scanVcf(file, ..., param)

## S4 method for signature 'character,ScanVcfParam'
scanVcf(file, ..., param)

## S4 method for signature 'character,missing'
scanVcf(file, ..., param)

## S4 method for signature 'connection,missing'
scanVcf(file, ..., param)

## S4 method for signature 'TabixFile'
scanVcfHeader(file, ...)

## S4 method for signature 'TabixFile,missing'
scanVcf(file, ..., param)

## S4 method for signature 'TabixFile,ScanVcfParam'
scanVcf(file, ..., param)

## S4 method for signature 'TabixFile,GRanges'
scanVcf(file, ..., param)

## S4 method for signature 'TabixFile,IntegerRangesList'
scanVcf(file, ..., param)
An instance of
Additional arguments for methods
The param argument allows portions of the file to be input, but
requires that the file be bgzip'd and indexed as a TabixFile.
Methods with param missing and file="character" or
file="connection" scan the entire file. With
file="connection", an argument
n indicates the number of
lines of the VCF file to input; a connection that is open at the beginning of
the call remains open and is advanced by
n lines at the end of the
call, providing a convenient way to stream through large VCF files.
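The range-restricted input described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes the ex2.vcf sample file shipped with VariantAnnotation and uses bgzip(), indexTabix() and TabixFile() from Rsamtools to prepare the file; the two ranges on sequence "20" are arbitrary choices for the example.

```r
library(VariantAnnotation)   # also attaches Rsamtools and GenomicRanges

## Sample file shipped with the package (assumed to be installed)
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

## param requires a bgzip'd, tabix-indexed file
zipped <- bgzip(fl, tempfile())
idx <- indexTabix(zipped, "vcf")
tab <- TabixFile(zipped, idx)

## Two illustrative ranges on sequence "20"
rng <- GRanges("20", IRanges(start = c(14370, 1110000),
                             end = c(17330, 1234600)))

## Only records overlapping the ranges are scanned
res <- scanVcf(tab, param = rng)
length(res)
```

Passing the same ranges wrapped in a ScanVcfParam (for example ScanVcfParam(which = rng)) should select the same records while also allowing the fields to be restricted.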
The INFO field of the scanned VCF file is returned as a single ‘packed’ vector, as in the VCF file. The GENO field is a list of matrices; each matrix corresponds to a field as defined in the FORMAT field of the VCF header. Each matrix has as many rows as records scanned in the VCF file, and as many columns as there are samples. As with the INFO field, the elements of the matrix are ‘packed’. INFO and GENO are returned packed to facilitate manipulation, e.g., selecting particular rows or samples in a consistent manner across elements.
scanVcfHeader returns a
VCFHeader object with
header information parsed into five categories,
each of which can be accessed with a ‘getter’ of the same name
(e.g., info(<VCFHeader>)). If the file header has multiple rows
with the same name (e.g., 'source'), the row names of the DataFrame
are made unique in the usual way: 'source', 'source.1', etc.
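As a brief illustration of the header getters, the sketch below reads the header of the ex2.vcf sample file shipped with VariantAnnotation (an assumption of this example) and pulls out three parts of it. info() is the getter named in the text; geno() and samples() are further getters provided by the package.

```r
library(VariantAnnotation)

## Sample file shipped with the package (assumed to be installed)
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

hdr <- scanVcfHeader(fl)

info(hdr)      # DataFrame describing the INFO fields declared in the header
geno(hdr)      # DataFrame describing the FORMAT (genotype) fields
samples(hdr)   # character vector of sample names
```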
scanVcf returns a list, with one element per range. Each element is
itself a list of 7 elements, obtained from the columns of the VCF specification:
GRanges instance derived from
ID, and the width of
QUAL: phred-scaled quality score for the assertion made in ALT
FILTER: indicator of whether or not the position passed all filters applied
GENO: genotype information immediately following the FORMAT field in the VCF
The GENO element is itself a list, with elements corresponding
to those defined in the VCF file header. Elements
of GENO are returned as a matrix of records x samples; if the
description of the element in the file header indicated multiplicity
other than 1 (e.g., variable number for “A”, “G”, or
“.”), then each entry in the matrix is a character string with
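The shape of the returned value can be inspected directly. The following sketch assumes the ex2.vcf sample file shipped with VariantAnnotation and that its header defines a GT genotype field; it scans the whole file (no param) and looks at the nested list.

```r
library(VariantAnnotation)

## Sample file shipped with the package (assumed to be installed)
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

## No param: the entire file is scanned and returned as a list of lists
vcf <- scanVcf(fl)

names(vcf[[1]])               # elements derived from the VCF columns
names(vcf[[1]][["GENO"]])     # genotype fields defined in the header

## One matrix per genotype field, records x samples
gt <- vcf[[1]][["GENO"]][["GT"]]
dim(gt)
```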
Martin Morgan and Valerie Obenchain
http://vcftools.sourceforge.net/specs.html outlines the VCF specification.
information on the portion of the specification implemented by
http://samtools.sourceforge.net/ provides information on