RADdata2VCF: Export RADdata Genotypes to VCF


RADdata2VCF    R Documentation

Export RADdata Genotypes to VCF

Description

Converts genotype calls from polyRAD into VCF format. The results can be written directly to a file or returned as a CollapsedVCF object for further manipulation.

Usage

RADdata2VCF(object, file = NULL, asSNPs = TRUE, hindhe = TRUE,
            sampleinfo = data.frame(row.names = GetTaxa(object)),
            contigs = data.frame(row.names = unique(object$locTable$Chr)))

Arguments

object

A RADdata object in which genotype calling has been performed. The data must also have been imported in a way that preserves variant positions (i.e., with readProcessIsoloci, readTASSELGBSv2, or VCF2RADdata using the refgenome argument).

file

An optional character string or connection indicating where to write the file. Append mode may be used with connections if multiple RADdata objects need to be written to one VCF.

asSNPs

Boolean indicating whether to convert haplotypes to individual SNPs and indels.

hindhe

Boolean indicating whether to export a mean value of Hind/He (see HindHe) for every sample and locus.

sampleinfo

A data frame with optional columns indicating any sample metadata to export to "SAMPLE" header lines.

contigs

A data frame with optional columns providing information about contigs to export to "contig" header lines.
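
Because the default for contigs takes its row names from unique(object$locTable$Chr), it seems reasonable to assume that a user-supplied table should be keyed by the same chromosome names. The sketch below uses a mock locTable (a plain data frame, not a real RADdata object) and a hypothetical contig table to show a base R check for chromosomes that have no contig entry. Whether RADdata2VCF itself requires or performs such a check is not stated here.

```r
# Mock stand-in for object$locTable; a real one comes from a RADdata object.
locTable <- data.frame(Chr = c("1", "1", "4", "6", "9"),
                       Pos = c(1200L, 58000L, 330L, 9100L, 47000L),
                       stringsAsFactors = FALSE)

# Hypothetical contig metadata: one row per contig, row names are contig IDs.
mycontigs <- data.frame(row.names = c("1", "4", "6"),
                        length = c(50000000L, 42000000L, 38000000L))

# Chromosomes that carry markers but are absent from the contig table.
missing_contigs <- setdiff(unique(locTable$Chr), rownames(mycontigs))
print(missing_contigs)  # "9"
```

With real data, unique(object$locTable$Chr) would take the place of the mock Chr column.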

Details

Currently, the FORMAT fields exported are GT (genotype), AD (allelic read depth), and DP (read depth). Genotype posterior probabilities are not exported due to the mathematical intractability of converting pseudo-biallelic probabilities to multiallelic probabilities.

Genotypes exported to the GT field are obtained internally using GetProbableGenotypes.

INFO fields exported include the standard fields NS (number of samples with more than zero reads) and DP (total depth across samples) as well as the custom fields LU (index of the marker in the original RADdata object) and HH (Hind/He statistic for the marker).

This function requires the Bioconductor package VariantAnnotation. See https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html for installation instructions.
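
As a minimal sketch, availability of the dependency can be checked before calling RADdata2VCF. The variable name has_va is illustrative only. The installation commands in the message follow the usual BiocManager route and are printed rather than run.

```r
# TRUE if VariantAnnotation can be loaded, FALSE otherwise.
has_va <- requireNamespace("VariantAnnotation", quietly = TRUE)

if (!has_va) {
  # Usual Bioconductor installation route; run these two lines manually.
  message("VariantAnnotation is not installed. To install it, run:\n",
          "  install.packages(\"BiocManager\")\n",
          "  BiocManager::install(\"VariantAnnotation\")")
}
```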

Value

A CollapsedVCF object.

Author(s)

Lindsay V. Clark

References

https://samtools.github.io/hts-specs/VCFv4.3.pdf

See Also

VCF2RADdata, ExportGAPIT

Examples

# Set up example dataset for export.
# You DO NOT need to adjust attr or locTable in your own dataset.
data(exampleRAD)
attr(exampleRAD$alleleNucleotides, "Variable_sites_only") <- FALSE
exampleRAD$locTable$Ref <- 
  exampleRAD$alleleNucleotides[match(1:nLoci(exampleRAD), exampleRAD$alleles2loc)]
exampleRAD <- IterateHWE(exampleRAD)

# An optional table of sample data
sampleinfo <- data.frame(row.names = GetTaxa(exampleRAD),
                         Population = rep(c("North", "South"), each = 50))

# Add contig information (fill in with actual data rather than random)
mycontigs <- data.frame(row.names = c("1", "4", "6", "9"), length = sample(1e8, 4),
                        URL = rep("ftp://mygenome.com/mygenome.fa", 4))

# Set up a file destination for this example
# (It is not necessary to use tempfile with your own data)
outfile <- tempfile(fileext = ".vcf")


# Export VCF
testvcf <- RADdata2VCF(exampleRAD, file = outfile, sampleinfo = sampleinfo,
                       contigs = mycontigs)
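
As a sketch of what can be done with the returned object, the block below repeats the dataset preparation, exports with the default file = NULL, and looks at the fields named in Details. It assumes that the geno and info accessors from VariantAnnotation apply to the returned CollapsedVCF in the usual way, and that the FORMAT and INFO fields carry the names listed in Details. The block is skipped if either package is unavailable.

```r
if (requireNamespace("polyRAD", quietly = TRUE) &&
    requireNamespace("VariantAnnotation", quietly = TRUE)) {
  library(polyRAD)

  # Same preparation as above (only needed for the example dataset).
  data(exampleRAD)
  attr(exampleRAD$alleleNucleotides, "Variable_sites_only") <- FALSE
  exampleRAD$locTable$Ref <-
    exampleRAD$alleleNucleotides[match(1:nLoci(exampleRAD), exampleRAD$alleles2loc)]
  exampleRAD <- IterateHWE(exampleRAD)

  # Default file = NULL: no destination is given, so work with the return value.
  vcf <- RADdata2VCF(exampleRAD)

  # FORMAT fields: one matrix or array per field, variants by samples.
  print(names(VariantAnnotation::geno(vcf)))
  print(VariantAnnotation::geno(vcf)$GT[, 1:5])

  # INFO fields: one row per variant.
  print(VariantAnnotation::info(vcf))
}
```

Working from the returned object avoids re-reading the file when only a quick look at the genotypes or marker statistics is needed.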


lvclark/polyRAD documentation built on Jan. 15, 2024, 4:19 a.m.