Description Details Constructors Accessors Subsetting and combining expand genotypeCodesToNucleotides(vcf, ...) SnpMatrixToVCF(from, seqSource) Variant Type Arguments Author(s) See Also Examples
The VCF class is a virtual class extended from
RangedSummarizedExperiment
. The subclasses, CompressedVCF
and ExtendedVCF
, are containers for holding data from
Variant Call Format files.
The VCF
class is a virtual class with two concrete subclasses,
CollapsedVCF
and ExtendedVCF
.
Slots unique to VCF
and subclasses,
fixed
: A DataFrame containing the REF, ALT, QUAL
and FILTER fields from a VCF file.
info
: A DataFrame containing the INFO fields from
a VCF file.
Slots inherited from RangedSummarizedExperiment
,
metadata
: A list
containing the
file header or other information about the overall experiment.
rowRanges
: A GRanges-class instance defining the
variant ranges and associated metadata columns of REF, ALT, QUAL
and FILTER. While the REF, ALT, QUAL and FILTER fields can
be displayed as metadata columns they cannot be modified with
rowRanges<-
. To modify these fields use fixed<-
.
colData
: A DataFrame-class instance describing the
samples and associated metadata.
geno
: The assays
slot from
RangedSummarizedExperiment
has been renamed as geno
for the VCF class. This slot contains the genotype information
immediately following the FORMAT field in a VCF file. Each element
of the list
or SimpleList
is a matrix or array.
It is expected that users will not create instances of the VCF class
but instead one of the concrete subclasses, CollapsedVCF or ExpandVCF.
CollapsedVCF contains the ALT data as a DNAStringSetList
allowing
for multiple alleles per variant. The ExpandedVCF ALT data is a
DNAStringSet
where the ALT column has been expanded to create a
flat form of the data with one row per variant-allele combination. In
the case of strucutral variants, ALT will be a CompressedCharacterList
or character
in the collapsed or expanded forms.
readVcf(file, genome, param, ..., row.names=TRUE)
VCF(rowRanges = GRanges(), colData = DataFrame(),
exptData = list(header = VCFHeader()),
fixed = DataFrame(), info = DataFrame(),
geno = SimpleList(), ..., collapsed=TRUE,
verbose = FALSE)
Creates CollapsedVCF when collapsed = TRUE
and an
ExpandedVCF when collapsed = FALSE
.
This is a low-level constructor used internally. Most instances
of the VCF
class are created with readVCF
.
In the following code snippets x
is a CollapsedVCF or ExpandedVCF
object.
rowRanges(x, ..., fixed = TRUE)
, rowRanges(x) <- value
:
Gets or sets the rowRanges. The CHROM, POS, ID, POS and REF fields are
used to create a GRanges
object. The start of the ranges are
defined by POS and the width is equal to the width of the reference
allele REF. The IDs become the rownames. If they are missing (i.e.,
‘.’) a string of CHROM:POS_REF/ALT is used instead. The
genome
argument is stored in the seqinfo of the GRanges
and
can be accessed with genome(<VCF>)
.
When fixed = TRUE
, REF, ALT, QUAL and FILTER metadata columns are
displayed as metadata columns. To modify the fixed
fields, use
the fixed<-
setter.
One metadata column, paramRangeID
, is included with the
rowRanges
. This ID is meaningful when multiple ranges are
specified in the ScanVcfParam
and distinguishes which records
match each range.
The metadata columns of a VCF object are accessed with the following:
ref(x)
, ref(x) <- value
:
Gets or sets the reference allele (REF). value
must
be a DNAStringSet
.
alt(x)
, alt(x) <- value
:
Gets or sets the alternate allele data (ALT). When x
is
a CollapsedVCF, value
must be a DNAStringSetList
or CompressedCharacterList
. For ExpandedVCF, value
must be a DNAStringSet
or character
.
qual(x)
, qual(x) <- value
:
Returns or sets the quality scores (QUAL). value
must
be an numeric(1L)
.
filt(x)
, filt(x) <- value
:
Returns or sets the filter data. value
must
be a character(1L)
. Names must be one of 'REF', 'ALT',
'QUAL' or 'FILTER'.
mcols(x)
, mcols(x) <- value
:
These methods behave the same as mcols(rowRanges(x))
and
mcols(rowRanges(x)) <- value
. This method does not manage the
fixed fields, 'REF', 'ALT', 'QUAL' or 'FILTER'. To modify those
columns use fixed<-
.
fixed(x)
, fixed(x) <- value
:
Gets or sets a DataFrame of REF, ALT, QUAL and FILTER only.
Note these fields are displayed as metadata columns with
the rowRanges() data (set to fixed = FALSE to suppress).
info(x, ..., row.names = TRUE)
, info(x) <- value
:
Gets or sets a DataFrame of INFO variables. Row names are added
if unique and row.names=TRUE
.
geno(x, withDimnames=TRUE)
, geno(x) <- value
:
oets a SimpleList of genotype data.
value
is a SimpleList. To replace a single variable in
the SimpleList use geno(x)$variable <- value
; in this
case value
must be a matrix or array. By default
row names are returned; to override specify
geno(vcf, withDimnames=FALSE)
.
metadata(x)
:
Gets a list
of experiment-related data. By default this
list includes the ‘header’ information from the VCF file.
See the use of header()
for details in extracting
header information.
colData(x)
, colData(x) <- value
:
Gets or sets a DataFrame
of sample-specific information. Each row
represents a sample in the VCF file. value
must be a
DataFrame
with rownames representing the samples in the VCF
file.
genome(x)
:
Extract the genome
information from the GRanges
object
returned by the rowRanges
accessor.
seqlevels(x)
:
Extract the seqlevels
from the GRanges
object
returned by the rowRanges
accessor.
strand(x)
:
Extract the strand
from the GRanges
object
returned by the rowRanges
accessor.
header(x)
, header(x)<- value
:
Get or set the VCF header information. Replacement value
must be a VCFHeader
object. To modify individual elements
use info<-
, geno<-
or meta<-
on a
‘VCFHeader’ object. See ?VCFHeader
man page for
details.
info(header(x))
geno(header(x))
meta(header(x))
samples(header(x))
vcfFields(x)
Returns a CharacterList
of all available VCF
fields, with names of fixed
, info
, geno
and
samples
indicating the four categories. Each element is a
character() vector of available VCF field names within each category.
In the following code x
is a VCF object, and ... is a list
of VCF objects.
x[i, j]
, x[i, j] <- value
: Gets or sets rows and columns.
i
and j
can be integer or logical vectors. value
is a
replacement VCF
object.
subset(x, subset, select, ...)
: Restricts x
by
evaluating the subset
argument in the scope of
rowData(x)
and info(x)
, and select
in the
context of colData(x)
. The subset
argument restricts
by rows, while the select
argument restricts by column. The
...
are passed to the underlying subset()
calls.
cbind(...)
, rbind(...)
: cbind
combines objects with
identical ranges (rowRanges
) but different samples (columns in
assays
). The colnames in colData
must match or an error is
thrown. Columns with duplicate names in fixed
, info
and
mcols(rowRanges(VCF))
must contain the same data.
rbind
combines objects with different ranges (rowRanges
) and
the same subjects (columns in assays
). Columns with duplicate names
in colData
must contain the same data. The ‘Samples’
columns in colData
(created by readVcf
) are renamed with a
numeric extension ordered as they were input to rbind
e.g.,
“Samples.1, Samples.2, ...” etc.
metadata
from all objects are combined into a
list
with no name checking.
In the following code snippets x
is a CollapsedVCF object.
expand(x, ..., row.names = FALSE)
:
Expand (unlist) the ALT column of a CollapsedVCF object to one row
per ALT value. Variables with Number='A' have one value per alternate
allele and are expanded accordingly. The 'AD' genotype field
(and any variables with 'Number' set to 'R')
is expanded into REF/ALT pairs. For all other fields, the rows
are replicated to match the elementNROWS of ALT.
The output is an ExpandedVCF with ALT as a DNAStringSet
or character
(structural variants). By default rownames
are NULL. When row.names=TRUE
the expanded output has
duplicated rownames corresponding to the original x
.
This function converts the 'GT' genotype codes in a VCF
object
to nucleotides. See also ?readGT
to read in only 'GT' data as
codes or nucleotides.
This function converts the output from the read.plink
function to a VCF
class. from
must be a list of length 3
with named elements "map", "fam" and "genotypes". seqSource
can
be a BSgenome
or an FaFile
used for reference sequence extraction.
Functions to identify variant type include isSNV,
isInsertion, isDeletion, isIndel,
isSubstitution and isTransition. See the ?isSNV
man page for details.
A list
or SimpleList
of matrix elements,
or a matrix
containing the genotype information from a
VCF file. If present, these data immediately follow the FORMAT
field in the VCF.
Each element of the list must have the same dimensions, and dimension
names (if present) must be consistent across elements and with the row
names of rowRanges
, colData
.
A DataFrame
of data from the INFO field of a VCF file.
The number of rows must match that in the rowRanges
object.
A DataFrame
of REF, ALT, QUAL and FILTER
fields from a VCF file. The number of rows must match that
of the rowRanges
object.
A GRanges
instance describing the ranges of
interest.
Row names, if present, become the row names of the VCF
. The length
of the GRanges
must equal the number of rows of the matrices in
geno
.
A DataFrame
describing the samples. Row names, if
present, become the column names of the VCF
.
A list
describing the header of the VCF file or
additional information for the overall experiment.
For cbind
and rbind
a list of VCF objects.
For all other methods ... are additional arguments passed to methods.
A logical(1)
indicating whether a
CollapsedVCF or ExpandedVCF should be created. The ALT in a
CollapsedVCF is a DNAStringSetList
while in a
ExpandedVCF it is a DNAStringSet
.
A logical(1)
indicating whether messages
about data coercion during construction should be printed.
Valerie Obenchain
GRanges,
DataFrame,
SimpleList,
RangedSummarizedExperiment,
readVcf
,
writeVcf
isSNV
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 | ## readVcf() parses data into a VCF object:
fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
## ----------------------------------------------------------------
## Accessors
## ----------------------------------------------------------------
## Variant locations are stored in the GRanges object returned by
## the rowRanges() accessor.
rowRanges(vcf)
## Suppress fixed fields:
rowRanges(vcf, fixed=FALSE)
## Individual fields can be extracted with ref(), alt(), qual(), filt() etc.
qual(vcf)
ref(vcf)
head(info(vcf))
## All available VCF field names can be contracted with vcfFields().
vcfFields(vcf)
## Extract genotype fields with geno(). Access specific fields with
## '$' or '[['.
geno(vcf)
identical(geno(vcf)$GQ, geno(vcf)[[2]])
## ----------------------------------------------------------------
## Renaming seqlevels and subsetting
## ----------------------------------------------------------------
## Overlap and matching operations require that the objects
## being compared have the same seqlevels (chromosome names).
## It is often the case that the seqlevesls in on of the objects
## needs to be modified to match the other. In this VCF, the
## seqlevels are numbers instead of preceded by "chr" or "ch".
seqlevels(vcf)
## Rename the seqlevels to start with 'chr'.
vcf2 <- renameSeqlevels(vcf, paste0("chr", seqlevels(vcf)))
seqlevels(vcf2)
## The VCF can also be subset by seqlevel using 'keepSeqlevels'
## or 'dropSeqlevels'. See ?keepSeqlevels for details.
vcf3 <- keepSeqlevels(vcf2, "chr2", pruning.mode="coarse")
seqlevels(vcf3)
## ----------------------------------------------------------------
## Header information
## ----------------------------------------------------------------
## Header data can be modified in the 'meta', 'info' and 'geno'
## slots of the VCFHeader object. See ?VCFHeader for details.
## Current 'info' fields.
rownames(info(header(vcf)))
## Add a new field to the header.
newInfo <- DataFrame(Number=1, Type="Integer",
Description="Number of Samples With Data",
row.names="NS")
info(header(vcf)) <- rbind(info(header(vcf)), newInfo)
rownames(info(header(vcf)))
## ----------------------------------------------------------------
## Collapsed and Expanded VCF
## ----------------------------------------------------------------
## readVCF() produces a CollapsedVCF object.
fl <- system.file("extdata", "ex2.vcf",
package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
vcf
## The ALT column is a DNAStringSetList to allow for more
## than one alternate allele per variant.
alt(vcf)
## For structural variants ALT is a CharacterList.
fl <- system.file("extdata", "structural.vcf",
package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
alt(vcf)
## ExpandedVCF is the 'flattened' counterpart of CollapsedVCF.
## The ALT and all variables with Number='A' in the header are
## expanded to one row per alternate allele.
vcfLong <- expand(vcf)
alt(vcfLong)
## Also see the ?VRanges class for an alternative form of
## 'flattened' VCF data.
## ----------------------------------------------------------------
## isSNV()
## ----------------------------------------------------------------
## isSNV() returns a subset VCF containing SNVs only.
vcf <- VCF(rowRanges = GRanges("chr1", IRanges(1:4*3, width=c(1, 2, 1, 1))))
alt(vcf) <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
ref(vcf) <- DNAStringSet(c("G", c("AA"), "T", "G"))
fixed(vcf)[c("REF", "ALT")]
## SNVs are present in rows 1 (single ALT value), 3 (both ALT values)
## and 4 (1 of the 2 ALT values).
vcf[isSNV(vcf, singleAltOnly=TRUE)]
vcf[isSNV(vcf, singleAltOnly=FALSE)] ## all 3 SNVs
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.