knitr::opts_chunk$set(echo = TRUE)
This document is intended for R package developers who wish to utilize S4 classes from the ploidyverse in order to make their package interoperable with other ploidyverse packages. It assumes basic familiarity with the process of creating an R package.
ploidyverseVcf
The first question you should ask yourself is, will the ability to read and write VCFs be mandatory for using your package? Or, will users frequently want to import or export data in a different format, and not care about incorporating other ploidyverse software into their workflow?
ploidyverseVcf
depends on the Bioconductor package
VariantAnnotation
.
Like many Bioconductor packages, VariantAnnotation
can be unavoidably
cumbersome to install. We on the ploidyverse core team believe that it is
worthwhile for its flexibility in importing, storing, and exporting genomic
variants and metadata, but we want to minimize how much we force it onto people
who don't want it.
ploidyverseVcf
In this case, you will indicate dependency on ploidyverseVcf
in the
standard way that you would for any R package. In your DESCRIPTION
file,
list ploidyverseVcf
on the Imports
line.
Imports: ploidyverseVcf
In the NAMESPACE
file, you should include the lines:
importFrom
line for any functions you are using from ploidyverseVcf
importFrom
lines for any functions you are using from VariantAnnotation
or other packagesFor any ploidyverse or Bioconductor functions that you use, I recommend looking at the help page to determine exactly which package it originated from.
Remember that any package with an importFrom
line will also need a mention
in the Imports
line of your DESCRIPTION
file.
ploidyverseVcf
If you don't want to force someone to install ploidyverseVcf
(and its
dependencies) in order to use your package, you should instead list
ploidyverseVcf
and methods
on the Suggests
line of your DESCRIPTION
file. Also list any Bioconductor packages whose functions you want to use
in Suggests
.
Suggests: ploidyverseVcf, methods
Then within your function definitions, use the ::
operator to indicate
the package that a function comes from. For example, to use the seqinfo
function, type GenomeInfoDb::seqinfo
.
# a trivial example GetSeqInfo <- function(x){ GenomeInfoDb::seqinfo(x) }
If you want your function to behave differently depending on whether or
not ploidyverseVcf
is installed, you can use the function
requireNamespace
.
if(requireNamespace("ploidyverseVcf", quietly = TRUE)){ cat("Let's make a VCF!") } else { cat("Let's make something else.") }
ploidyverseVcf
The ploidyverseVcf
package includes a set of utility functions
implemented in Rcpp
to assist with calling, importing, and exporting
multiallelic polyploid genotypes. See ?nGen
.
To use any of those functions from within R, you can follow option A or B above.
To use any of those functions from within C++:
LinkingTo: ploidyverseVcf, Rcpp
to your DESCRIPTION
file.#include <ploidyverseVcf.h>
and
using namespace ploidyverseVCF;
to your C++ file(s) that use functions from
ploidyverseVcf
.To be able to test your C++ files using sourceCpp
, be sure to have them in
the src
directory of a package with the DESCRIPTION
file modified as
described above. If you put the C++ file in a different directory, sourceCpp
will not be able to find ploidyverseVcf.h
.
VCF files can be imported using VariantAnnotation
's readVcf
function.
The tutorial for your package might direct users to look at the readVcf
help
page, or your package might include a custom import function that uses
readVcf
internally. Note that the param
argument is very helpful for
adjusting which samples, genomic regions, and fields (GT, AD, etc.) get
imported, saving valuable memory.
If you want to specify genomic regions to import, or if you want to loop
through a VCF in manageable chunks, you should compress the file with bgzip
,
index it with indexTabix
, and create a connection with TabixFile
.
Within R, the data from the VCF will be in an S4 object of class
collapsedVCF
or expandedVCF
. These two classes differ in terms of how they
handle sites with multiple alternative alleles. Both are subclasses of the
VCF
virual class. An object of either of these classes can be created with
the VCF
constructor function. If your software will import data in a format
other than VCF, then export to VCF, you will need the constructor function.
VariantAnnotation
Given a VCF
object called myvcf
:
geno(myvcf)$AD
returns a two dimensional list (marker x sample) of integer
vectors of allelic read depth. The first allele in each vector is the
reference allele.geno(myvcf)$GP
returns genotype posterior probabilities in a similar
format, in numeric vectors.samples(header(myvcf))
returns a character vector of sample names, taken
from the column headers in the VCF file.rowRanges(myvcf)
returns the location of each marker, and other information
such as reference and alternative alleles, in GRanges
format.VCF
objectThe function array3D_to_matrixList
can convert a 3D array, such as those
stored in the genotypeLikelihood
and posteriorProb
slots of a RADdata
object in polyRAD
, into the matrix-list format required in
VariantAnnotation
. So, if myvcf
is a VCF
object and myrad
is a
RADdata
object, one could do something like:
geno(myvcf)$GP <- array3D_to_matrixList(myrad$posteriorProb[[1]][-OneAllelePerMarker(myrad)])
A matrix of posterior mean genotypes could to go into the GN
slot.
Some feedback from other package developers might be helpful here to make it easy for them to get their genotype calls into the VCF.
Note: ploidyverseVcf
will include custom accessor functions for all of these.
sampleinfo
is used for adding metadata about samples.software
is used for adding metadata about the software used for genotype
calling.VCF
object meets ploidyverse specificationsTo check whether a VCF
object meets ploidyverse specifications, run
myvcf <- markValidity(myvcf)
There are three levels of validity that are checked:
If you want a TRUE
or FALSE
value to tell you whether the VCF
object
meets these specifications (for example, for error checking before running a
function), after running markValidity
you can run:
validPloidyverseVCF_Precall(myvcf) validPloidyverseVCF_Postcall(myvcf) validPloidyverseVCF_Archival(myvcf)
Before exporting a VCF, the markValidity
function should be run on the VCF
object, as it will add header lines confirming whether or not the file meets
ploidyverse specifications. Any metadata that was added to the object will
also be exported to the file.
The object is then passed to the writeVcf
function to be written to a file.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.