Information From VCF Files
These functions help to extract information from VCF files and to
select which loci to read with read.vcf.
file name of the VCF file.
a character specifying the information to be extracted (see details).
the size of data in bytes read at once.
a logical: should the progress of the operation be printed?
an object of class "VCFinfo".
integer values giving the range of position values.
a numerical value indicating the minimum value of quality for selecting loci.
a logical. By default,
further arguments passed to and from other methods.
The variant call format (VCF) is described in detail in the References. Roughly, a VCF file is made of two parts: the header and the genotypes. The last line of the header gives the labels of the genotypes: the first nine columns give information for each locus and are (always) "CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", and "FORMAT". The subsequent columns give the labels (identifiers) of the individuals; these may be missing if the file records only the variants. Note that the data are arranged as the transpose of the usual layout: the individuals are in columns and the loci are in rows.
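To make this layout concrete, the following base R sketch writes a tiny VCF file to a temporary location and splits its label line into the nine locus-level columns and the individual labels. The file contents and the sample names (ind1, ind2) are invented for illustration only.

```r
# A tiny, made-up VCF file (contents are illustrative only)
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "##INFO=<ID=DP,Number=1,Type=Integer,Description=\"Total Depth\">",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2",
  "1\t100\t.\tA\tG\t50\tPASS\tDP=14\tGT\t0|1\t1|1",
  "1\t250\t.\tAT\tA\t10\tPASS\tDP=8\tGT\t0|0\t0|1"
)
f <- tempfile(fileext = ".vcf")
writeLines(vcf_lines, f)

x <- readLines(f)

# Header lines start with "#"; the last of them holds the column labels
is_header <- startsWith(x, "#")
label_line <- x[max(which(is_header))]
labels <- strsplit(sub("^#", "", label_line), "\t", fixed = TRUE)[[1]]

fixed_cols <- labels[1:9]      # locus-level columns
individuals <- labels[-(1:9)]  # one column per individual
n_loci <- sum(!is_header)      # loci are stored as rows
```

Here fixed_cols holds the nine standard column names, individuals holds the sample labels, and n_loci counts the data rows.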
VCFloci is the main function documented here: it reads the
information relative to each locus. The option what specifies
which column(s) to read. By default, all of them are read. If the user
is interested in only the locus positions, the option
what = "POS" would be used.
Since VCF files can be very big, the data are read in portions of
chunk.size bytes. The default (1 GB) should be appropriate in
most situations. This value should not exceed 2e9.
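As a hedged usage sketch, the calls below read only the positions and then all locus columns with a smaller chunk size. It assumes that the pegas package is installed and that the arguments are named what, chunk.size, and quiet; the small VCF file is made up for the example.

```r
# A tiny, made-up VCF file written to a temporary location
f <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2",
  "1\t100\t.\tA\tG\t50\tPASS\tDP=14\tGT\t0|1\t1|1",
  "1\t250\t.\tAT\tA\t10\tPASS\tDP=8\tGT\t0|0\t0|1"
), f)

# Assumption: pegas is installed; the calls are skipped otherwise
if (requireNamespace("pegas", quietly = TRUE)) {
  # Read only the locus positions
  pos <- pegas::VCFloci(f, what = "POS", quiet = TRUE)

  # Read all locus columns, in portions of 1e6 bytes instead of the default
  info <- pegas::VCFloci(f, chunk.size = 1e6, quiet = TRUE)
}
```

Lowering chunk.size mainly matters when memory is limited; for a file this small the value makes no practical difference.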
VCFheader returns the header of the VCF file (excluding the
line of labels).
VCFlabels returns the individual labels.
The output of VCFloci is a data frame with as many rows as
there are loci in the VCF file and storing the requested
information. The other functions help to extract specific information
from this data frame: their outputs may then be used to select which
loci to read with read.vcf.
is.snp tests whether each locus is a SNP (i.e., the reference
allele, REF, is a single character, and so is the alternative allele,
ALT). It returns a logical vector with as many values as there are
loci. Note that some VCF files have the information VT (variant type)
in the INFO column.
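The definition of a SNP given above can be sketched in base R as follows. This is a simplified illustration of the stated criterion, not the package's actual implementation; the data frame loci and its contents are made up.

```r
# Made-up REF and ALT columns for four loci
loci <- data.frame(
  REF = c("A", "AT", "G", "C"),
  ALT = c("G", "A", "GTT", "T"),
  stringsAsFactors = FALSE
)

# A locus counts as a SNP when both alleles are a single character
snp <- nchar(loci$REF) == 1L & nchar(loci$ALT) == 1L
```

With these data, the first and last loci are flagged as SNPs, while the deletion (AT to A) and the insertion (G to GTT) are not.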
rangePOS and selectQUAL select some loci with respect to
values of position or quality. They return the indices (i.e., row
numbers) of the loci satisfying the conditions.
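The kind of selection described here can be sketched in base R with which(). The data frame loci, the position bounds, and the quality threshold are made up, and treating the bounds and the threshold as inclusive is an assumption of this sketch.

```r
# Made-up positions and qualities for four loci
loci <- data.frame(
  POS = c(100L, 250L, 900L, 1500L),
  QUAL = c(50, 10, 35, 20)
)

# Row numbers of the loci located between positions 200 and 1000
in_range <- which(loci$POS >= 200L & loci$POS <= 1000L)

# Row numbers of the loci with a quality of at least 20
good_qual <- which(loci$QUAL >= 20)

# Loci satisfying both conditions
keep <- intersect(in_range, good_qual)
```

Because both results are row numbers, they can be combined with set operations such as intersect() or union() before being used to select loci.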
getINFO extracts a specific piece of information from the INFO
column. By default, this is the total depth (DP), which can be
changed with the option what. The meaning of each INFO field
should be described in the header of the VCF file.
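To illustrate what extracting one field from the INFO column involves, here is a base R sketch. The helper extract_info is hypothetical and written for this example only (it is not part of the package), and the INFO strings are made up.

```r
# Made-up INFO strings: semicolon-separated KEY=value pairs
info <- c("DP=14;AF=0.5;VT=SNP", "AF=0.1;DP=8", "AF=0.2")

# Hypothetical helper: return the value of one key, or NA if it is absent
extract_info <- function(info, what = "DP") {
  fields <- strsplit(info, ";", fixed = TRUE)
  prefix <- paste0(what, "=")
  vapply(fields, function(f) {
    hit <- f[startsWith(f, prefix)]
    if (length(hit)) substring(hit[1], nchar(prefix) + 1L) else NA_character_
  }, character(1))
}

dp <- as.numeric(extract_info(info, "DP"))  # depths as numbers
vt <- extract_info(info, "VT")              # variant types as strings
```

Loci that lack the requested key yield NA, which is why the third element of dp and the last two elements of vt are missing.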
VCFloci returns an object of class
"VCFinfo" which is a
data frame with a specific print method.
VCFheader returns a single character string which can be
printed nicely with cat.
VCFlabels returns a vector of mode character.
is.snp returns a vector of mode logical.
rangePOS and selectQUAL return a vector of mode numeric.
getINFO returns a vector of mode character or numeric (see above).
VCFloci is able to read either compressed (*.gz) or
uncompressed files.
## see ?read.vcf