Lindsay V. Clark, University of Illinois, Urbana-Champaign 07 April 2019
This document is intended for R package developers who wish to utilize S4 classes from the ploidyverse in order to make their package interoperable with other ploidyverse packages. It assumes basic familiarity with the process of creating an R package.
ploidyverseVcfThe first question you should ask yourself is, will the ability to read and write VCFs be mandatory for using your package? Or, will users frequently want to import or export data in a different format, and not care about incorporating other ploidyverse software into their workflow?
ploidyverseVcf depends on the Bioconductor package
VariantAnnotation.
Like many Bioconductor packages, VariantAnnotation can be unavoidably
cumbersome to install. We on the ploidyverse core team believe that it
is worthwhile for its flexibility in importing, storing, and exporting
genomic variants and metadata, but we want to minimize how much we force
it onto people who don’t want it.
ploidyverseVcfIn this case, you will indicate dependency on ploidyverseVcf in the
standard way that you would for any R package. In your DESCRIPTION
file, list ploidyverseVcf on the Imports line.
Imports: ploidyverseVcf
In the NAMESPACE file, you should include the lines:
importFrom line for any functions you are using from
ploidyverseVcfimportFrom lines for any functions you are using from
VariantAnnotation or other packagesFor any ploidyverse or Bioconductor functions that you use, I recommend looking at the help page to determine exactly which package it originated from.
Remember that any package with an importFrom line will also need a
mention in the Imports line of your DESCRIPTION file.
ploidyverseVcfIf you don’t want to force someone to install ploidyverseVcf (and its
dependencies) in order to use your package, you should instead list
ploidyverseVcf and methods on the Suggests line of your
DESCRIPTION file. Also list any Bioconductor packages whose functions
you want to use in Suggests.
Suggests: ploidyverseVcf, methods
Then within your function definitions, use the :: operator to indicate
the package that a function comes from. For example, to use the
seqinfo function, type GenomeInfoDb::seqinfo.
# a trivial example
GetSeqInfo <- function(x){
GenomeInfoDb::seqinfo(x)
}
If you want your function to behave differently depending on whether or
not ploidyverseVcf is installed, you can use the function
requireNamespace.
if(requireNamespace("ploidyverseVcf", quietly = TRUE)){
cat("Let's make a VCF!")
} else {
cat("Let's make something else.")
}
ploidyverseVcfThe ploidyverseVcf package includes a set of utility functions
implemented in Rcpp to assist with calling, importing, and exporting
multiallelic polyploid genotypes. See ?nGen.
To use any of those functions from within R, you can follow option A or B above.
To use any of those functions from within C++:
LinkingTo: ploidyverseVcf, Rcpp to your DESCRIPTION
file.#include <ploidyverseVcf.h> and using namespace
ploidyverseVCF; to your C++ file(s) that use functions from
ploidyverseVcf.To be able to test your C++ files using sourceCpp, be sure to have
them in the src directory of a package with the DESCRIPTION file
modified as described above. If you put the C++ file in a different
directory, sourceCpp will not be able to find ploidyverseVcf.h.
VCF files can be imported using VariantAnnotation’s readVcf
function. The tutorial for your package might direct users to look at
the readVcf help page, or your package might include a custom import
function that uses readVcf internally. Note that the param argument
is very helpful for adjusting which samples, genomic regions, and fields
(GT, AD, etc.) get imported, saving valuable memory.
If you want to specify genomic regions to import, or if you want to loop
through a VCF in manageable chunks, you should compress the file with
bgzip, index it with indexTabix, and create a connection with
TabixFile.
Within R, the data from the VCF will be in an S4 object of class
collapsedVCF or expandedVCF. These two classes differ in terms of
how they handle sites with multiple alternative alleles. Both are
subclasses of the VCF virual class. An object of either of these
classes can be created with the VCF constructor function. If your
software will import data in a format other than VCF, then export to
VCF, you will need the constructor function.
VariantAnnotationGiven a VCF object called myvcf:
geno(myvcf)$AD returns a two dimensional list (marker x sample) of
integer vectors of allelic read depth. The first allele in each
vector is the reference allele.geno(myvcf)$GP returns genotype posterior probabilities in a
similar format, in numeric vectors.samples(header(myvcf)) returns a character vector of sample names,
taken from the column headers in the VCF file.rowRanges(myvcf) returns the location of each marker, and other
information such as reference and alternative alleles, in GRanges
format.VCF objectThe function array3D_to_matrixList can convert a 3D array, such as
those stored in the genotypeLikelihood and posteriorProb slots of a
RADdata object in polyRAD, into the matrix-list format required in
VariantAnnotation. So, if myvcf is a VCF object and myrad is a
RADdata object, one could do something like:
geno(myvcf)$GP <-
array3D_to_matrixList(myrad$posteriorProb[[1]][-OneAllelePerMarker(myrad)])
A matrix of posterior mean genotypes could to go into the GN slot.
Some feedback from other package developers might be helpful here to make it easy for them to get their genotype calls into the VCF.
Note: ploidyverseVcf will include custom accessor functions for all of
these.
sampleinfo is used for adding metadata about samples.software is used for adding metadata about the software used for
genotype calling.VCF object meets ploidyverse specificationsTo check whether a VCF object meets ploidyverse specifications, run
myvcf <- markValidity(myvcf)
There are three levels of validity that are checked:
If you want a TRUE or FALSE value to tell you whether the VCF
object meets these specifications (for example, for error checking
before running a function), after running markValidity you can run:
validPloidyverseVCF_Precall(myvcf)
validPloidyverseVCF_Postcall(myvcf)
validPloidyverseVCF_Archival(myvcf)
Before exporting a VCF, the markValidity function should be run on the
VCF object, as it will add header lines confirming whether or not the
file meets ploidyverse specifications. Any metadata that was added to
the object will also be exported to the file.
The object is then passed to the writeVcf function to be written to a
file.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.