Description Details Constructors Accessors Subsetting and combining expand genotypeCodesToNucleotides(vcf, ...) SnpMatrixToVCF(from, seqSource) Variant Type Arguments Author(s) See Also Examples
The VCF class is a virtual class extended from
RangedSummarizedExperiment. The subclasses, CompressedVCF
and ExtendedVCF, are containers for holding data from
Variant Call Format files.
The VCF class is a virtual class with two concrete subclasses,
CollapsedVCF and ExtendedVCF. 
Slots unique to VCF and subclasses,
fixed: A DataFrame containing the REF, ALT, QUAL 
and FILTER fields from a VCF file.
info: A DataFrame containing the INFO fields from 
a VCF file.
Slots inherited from RangedSummarizedExperiment,
metadata: A list containing the 
file header or other information about the overall experiment.
rowRanges: A GRanges-class instance defining the
variant ranges and associated metadata columns of REF, ALT, QUAL 
and FILTER. While the REF, ALT, QUAL and FILTER fields can
be displayed as metadata columns they cannot be modified with
rowRanges<-. To modify these fields use fixed<-. 
colData: A DataFrame-class instance describing the 
samples and associated metadata.
geno: The assays slot from
RangedSummarizedExperiment has been renamed as geno
for the VCF class. This slot contains the genotype information
immediately following the FORMAT field in a VCF file. Each element
of the list or SimpleList is a matrix or array. 
It is expected that users will not create instances of the VCF class
but instead one of the concrete subclasses, CollapsedVCF or ExpandVCF.
CollapsedVCF contains the ALT data as a DNAStringSetList allowing 
for multiple alleles per variant. The ExpandedVCF ALT data is a 
DNAStringSet where the ALT column has been expanded to create a 
flat form of the data with one row per variant-allele combination. In 
the case of strucutral variants, ALT will be a CompressedCharacterList
or character in the collapsed or expanded forms.
readVcf(file, genome, param, ..., row.names=TRUE)
VCF(rowRanges = GRanges(), colData = DataFrame(), 
                exptData = list(header = VCFHeader()), 
                fixed = DataFrame(), info = DataFrame(), 
                geno = SimpleList(), ..., collapsed=TRUE, 
                verbose = FALSE)
Creates CollapsedVCF when collapsed = TRUE and an
ExpandedVCF when collapsed = FALSE.
This is a low-level constructor used internally. Most instances
of the VCF class are created with readVCF.
In the following code snippets x is a CollapsedVCF or ExpandedVCF 
object.
rowRanges(x, ..., fixed = TRUE), rowRanges(x) <- value:
Gets or sets the rowRanges. The CHROM, POS, ID, POS and REF fields are
used to create a GRanges object. The start of the ranges are
defined by POS and the width is equal to the width of the reference
allele REF.  The IDs become the rownames. If they are missing (i.e.,
‘.’) a string of CHROM:POS_REF/ALT is used instead.  The
genome argument is stored in the seqinfo of the GRanges and
can be accessed with genome(<VCF>).
When fixed = TRUE, REF, ALT, QUAL and FILTER metadata columns are 
displayed as metadata columns. To modify the fixed fields, use
the fixed<- setter. 
One metadata column, paramRangeID, is included with the
rowRanges. This ID is meaningful when multiple ranges are
specified in the ScanVcfParam and distinguishes which records
match each range.
The metadata columns of a VCF object are accessed with the following:
ref(x), ref(x) <- value:
Gets or sets the reference allele (REF). value must 
be a DNAStringSet. 
alt(x), alt(x) <- value:
Gets or sets the alternate allele data (ALT). When x is 
a CollapsedVCF, value must be a DNAStringSetList
or CompressedCharacterList. For ExpandedVCF, value
must be a DNAStringSet or character.
qual(x), qual(x) <- value:
Returns or sets the quality scores (QUAL). value must 
be an numeric(1L).
filt(x), filt(x) <- value:
Returns or sets the filter data. value must 
be a character(1L). Names must be one of 'REF', 'ALT',
'QUAL' or 'FILTER'.
mcols(x), mcols(x) <- value:
These methods behave the same as mcols(rowRanges(x)) and 
mcols(rowRanges(x)) <- value. This method does not manage the
fixed fields, 'REF', 'ALT', 'QUAL' or 'FILTER'. To modify those
columns use fixed<-.
fixed(x), fixed(x) <- value:
Gets or sets a DataFrame of REF, ALT, QUAL and FILTER only. 
Note these fields are displayed as metadata columns with
the rowRanges() data (set to fixed = FALSE to suppress).
info(x, ..., row.names = TRUE), info(x) <- value:
Gets or sets a DataFrame of INFO variables. Row names are added
if unique and row.names=TRUE.
geno(x, withDimnames=TRUE), geno(x) <- value:
oets a SimpleList of genotype data.
value is a SimpleList. To replace a single variable in
the SimpleList use geno(x)$variable <- value; in this 
case value must be a matrix or array. By default
row names are returned; to override specify
geno(vcf, withDimnames=FALSE).
metadata(x):
Gets a list of experiment-related data. By default this
list includes the ‘header’ information from the VCF file. 
See the use of header() for details in extracting
header information. 
colData(x), colData(x) <- value:
Gets or sets a DataFrame of sample-specific information. Each row 
represents a sample in the VCF file. value must be a 
DataFrame with rownames representing the samples in the VCF 
file.
genome(x):
Extract the genome information from the GRanges object
returned by the rowRanges accessor.
seqlevels(x):
Extract the seqlevels from the GRanges object
returned by the rowRanges accessor.
strand(x):
Extract the strand from the GRanges object
returned by the rowRanges accessor.
header(x), header(x)<- value:
Get or set the VCF header information. Replacement value
must be a VCFHeader object. To modify individual elements 
use info<-, geno<- or meta<- on a 
‘VCFHeader’ object. See ?VCFHeader man page for
details.
info(header(x))
geno(header(x))
meta(header(x))
samples(header(x))
vcfFields(x)
Returns a CharacterList of all available VCF
fields, with names of fixed, info, geno and
samples indicating the four categories. Each element is a
character() vector of available VCF field names within each category. 
In the following code x is a VCF object, and ... is a list
of VCF objects.
x[i, j], x[i, j] <- value: Gets or sets rows and columns.
i and j can be integer or logical vectors. value is a
replacement VCF object.
subset(x, subset, select, ...): Restricts x by
evaluating the subset argument in the scope of
rowData(x) and info(x), and select in the
context of colData(x). The subset argument restricts
by rows, while the select argument restricts by column. The
... are passed to the underlying subset() calls.
cbind(...), rbind(...): cbind combines objects with
identical ranges (rowRanges) but different samples (columns in
assays). The colnames in colData must match or an error is
thrown. Columns with duplicate names in fixed, info and
mcols(rowRanges(VCF)) must contain the same data.
rbind combines objects with different ranges (rowRanges) and
the same subjects (columns in assays). Columns with duplicate names
in colData must contain the same data.  The ‘Samples’
columns in colData (created by readVcf) are renamed with a
numeric extension ordered as they were input to rbind e.g.,
“Samples.1, Samples.2, ...” etc. 
metadata from all objects are combined into a
list with no name checking.
In the following code snippets x is a CollapsedVCF object.
expand(x, ..., row.names = FALSE):
Expand (unlist) the ALT column of a CollapsedVCF object to one row 
per ALT value. Variables with Number='A' have one value per alternate
allele and are expanded accordingly. The 'AD' genotype field 
(and any variables with 'Number' set to 'R')
is expanded into REF/ALT pairs. For all other fields, the rows
are replicated to match the elementNROWS of ALT.
The output is an ExpandedVCF with ALT as a DNAStringSet 
or character (structural variants). By default rownames
are NULL. When row.names=TRUE the expanded output has 
duplicated rownames corresponding to the original x.
This function converts the 'GT' genotype codes in a VCF object
to nucleotides. See also ?readGT to read in only 'GT' data as 
codes or nucleotides.
This function converts the output from the read.plink 
function to a VCF class. from must be a list of length 3
with named elements "map", "fam" and "genotypes". seqSource can
be a BSgenome or an FaFile
used for reference sequence extraction.
Functions to identify variant type include isSNV, 
isInsertion, isDeletion, isIndel, 
isSubstitution and isTransition. See the ?isSNV 
man page for details.
A list or SimpleList of matrix elements,
or a matrix containing the genotype information from a
VCF file. If present, these data immediately follow the FORMAT
field in the VCF. 
Each element of the list must have the same dimensions, and dimension 
names (if present) must be consistent across elements and with the row 
names of rowRanges, colData. 
A DataFrame of data from the INFO field of a VCF file. 
The number of rows must match that in the rowRanges object.
A DataFrame of REF, ALT, QUAL and FILTER 
fields from a VCF file. The number of rows must match that
of the rowRanges object.
A GRanges instance describing the ranges of
interest. 
Row names, if present, become the row names of the VCF. The length 
of the GRanges must equal the number of rows of the matrices in 
geno.
A DataFrame describing the samples. Row names, if 
present, become the column names of the VCF.
A list describing the header of the VCF file or 
additional information for the overall experiment. 
For cbind and rbind a list of VCF objects.
For all other methods ... are additional arguments passed to methods. 
A logical(1) indicating whether a 
CollapsedVCF or ExpandedVCF should be created. The ALT in a
CollapsedVCF is a DNAStringSetList while in a
ExpandedVCF it is a DNAStringSet. 
A logical(1) indicating whether messages
about data coercion during construction should be printed.
Valerie Obenchain
GRanges,
DataFrame,
SimpleList,
RangedSummarizedExperiment,
readVcf,
writeVcf
isSNV
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 | ## readVcf() parses data into a VCF object: 
fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
## ----------------------------------------------------------------
## Accessors 
## ----------------------------------------------------------------
## Variant locations are stored in the GRanges object returned by
## the rowRanges() accessor.
rowRanges(vcf)
## Suppress fixed fields:
rowRanges(vcf, fixed=FALSE)
## Individual fields can be extracted with ref(), alt(), qual(), filt() etc.
qual(vcf)
ref(vcf)
head(info(vcf))
## All available VCF field names can be contracted with vcfFields(). 
vcfFields(vcf)
## Extract genotype fields with geno(). Access specific fields with 
## '$' or '[['.
geno(vcf)
identical(geno(vcf)$GQ, geno(vcf)[[2]])
## ----------------------------------------------------------------
## Renaming seqlevels and subsetting 
## ----------------------------------------------------------------
## Overlap and matching operations require that the objects
## being compared have the same seqlevels (chromosome names).
## It is often the case that the seqlevesls in on of the objects
## needs to be modified to match the other. In this VCF, the 
## seqlevels are numbers instead of preceded by "chr" or "ch". 
seqlevels(vcf)
## Rename the seqlevels to start with 'chr'.
vcf2 <- renameSeqlevels(vcf, paste0("chr", seqlevels(vcf))) 
seqlevels(vcf2)
## The VCF can also be subset by seqlevel using 'keepSeqlevels'
## or 'dropSeqlevels'. See ?keepSeqlevels for details. 
vcf3 <- keepSeqlevels(vcf2, "chr2", pruning.mode="coarse")
seqlevels(vcf3)
## ----------------------------------------------------------------
## Header information 
## ----------------------------------------------------------------
## Header data can be modified in the 'meta', 'info' and 'geno'
## slots of the VCFHeader object. See ?VCFHeader for details.
## Current 'info' fields.
rownames(info(header(vcf)))
## Add a new field to the header.
newInfo <- DataFrame(Number=1, Type="Integer",
                     Description="Number of Samples With Data",
                     row.names="NS")
info(header(vcf)) <- rbind(info(header(vcf)), newInfo)
rownames(info(header(vcf)))
## ----------------------------------------------------------------
## Collapsed and Expanded VCF 
## ----------------------------------------------------------------
## readVCF() produces a CollapsedVCF object.
fl <- system.file("extdata", "ex2.vcf", 
                  package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
vcf
## The ALT column is a DNAStringSetList to allow for more
## than one alternate allele per variant.
alt(vcf)
## For structural variants ALT is a CharacterList.
fl <- system.file("extdata", "structural.vcf", 
                  package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
alt(vcf)
## ExpandedVCF is the 'flattened' counterpart of CollapsedVCF.
## The ALT and all variables with Number='A' in the header are
## expanded to one row per alternate allele.
vcfLong <- expand(vcf)
alt(vcfLong)
## Also see the ?VRanges class for an alternative form of
## 'flattened' VCF data.
## ----------------------------------------------------------------
## isSNV()
## ----------------------------------------------------------------
## isSNV() returns a subset VCF containing SNVs only.
vcf <- VCF(rowRanges = GRanges("chr1", IRanges(1:4*3, width=c(1, 2, 1, 1))))
alt(vcf) <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
ref(vcf) <- DNAStringSet(c("G", c("AA"), "T", "G"))
fixed(vcf)[c("REF", "ALT")]
## SNVs are present in rows 1 (single ALT value), 3 (both ALT values) 
## and 4 (1 of the 2 ALT values).
vcf[isSNV(vcf, singleAltOnly=TRUE)] 
vcf[isSNV(vcf, singleAltOnly=FALSE)] ## all 3 SNVs
 | 
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.