readDArTag: Import Data from DArT Sequencing


readDArTag    R Documentation

Description

Diversity Array Technologies (DArT) provides a tag-based genotyping-by-sequencing service. A file format indicating haplotype sequence and read depth was developed together with Breeding Insight, and this function imports that format to make a RADdata object. The target SNP and all off-target SNPs within the amplicon are imported as haplotypes. Because the file format does not indicate the strandedness of the tag, BLAST results are used so that sequence and position are stored accurately in the RADdata object. See the “extdata” folder of the polyRAD installation for example files.

Usage

readDArTag(file, botloci = NULL, blastfile = NULL, excludeHaps = NULL,
           includeHaps = NULL, n.header.rows = 0, sample.name.row = 1,
           trim.sample.names = "_[^_]+_[ABCDEFGH][[:digit:]][012]?$",
           sep.counts = ",", sep.blast = "\t", possiblePloidies = list(2),
           taxaPloidy = 2L, contamRate = 0.001)

Arguments

file

The file name of a spreadsheet from DArT indicating haplotype sequence and read depth.

botloci

A character vector indicating the names of loci for which the sequence is on the bottom strand with respect to the reference genome. All other loci are assumed to be on the top strand. Only one of blastfile and botloci should be provided.

blastfile

File name for BLAST results for haplotypes. The file should be in tabular format with qseqid, sseqid, sstart, send, and pident columns, indicated with column headers. Only one of blastfile and botloci should be provided.

excludeHaps

Optional. Character vector with names of haplotypes (from the “AlleleID” column) that should not be imported. Should not be used if includeHaps is provided.

includeHaps

Optional. Character vector with names of haplotypes (from the “AlleleID” column) that should be imported. Should not be used if excludeHaps is provided.

n.header.rows

Integer. The number of header rows in file, not including the full row of column headers.

sample.name.row

Integer. The row within file from which sample names should be derived.

trim.sample.names

A regular expression indicating text to trim off of sample names. Use "" if no trimming should be performed.

sep.counts

The field separator character for file. The default assumes CSV.

sep.blast

The field separator character for the BLAST results. The default assumes tab-delimited.

possiblePloidies

A list indicating possible inheritance modes. See RADdata.

taxaPloidy

A single integer, or an integer vector with one value per taxon, indicating ploidy. See RADdata.

contamRate

Expected sample cross-contamination rate. See RADdata.
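
As a rough illustration of what the default trim.sample.names pattern removes, the sketch below applies it to made-up sample names with base R's sub(). The sample names are hypothetical, and the use of sub() is an assumption about how the trimming behaves, not a description of the function's internals.

```r
# Default pattern from the Usage section: an underscore-delimited field
# followed by a plate well (row A-H, column 1-12) at the end of the name.
trim_pattern <- "_[^_]+_[ABCDEFGH][[:digit:]][012]?$"

# Hypothetical sample names; "Control" has no suffix to trim
raw_names <- c("Sample1_1234567_A1", "Sample2_9876_H12",
               "my_sample_555_C10", "Control")

# Assumption: trimming amounts to replacing the match with an empty string
trimmed <- sub(trim_pattern, "", raw_names)
print(trimmed)
```

Names that do not match the pattern are left unchanged, which is also the effect of passing "" when no trimming is wanted.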

Details

The “CloneID” column is used for locus names, and is assumed to contain the chromosome (or scaffold) name and position, separated by an underscore. The position is assumed to refer to the target SNP, which is identified by comparing the “Ref_001” and “Alt_002” sequences. The position is then converted to refer to the beginning of the tag (which may have been reverse complemented depending on BLAST results), since additional SNPs may be present. This facilitates accurate export to VCF using RADdata2VCF.
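
The steps described above can be sketched in base R. This is only an illustration under stated assumptions: the CloneID values and tag sequences are made up, the variable names are hypothetical, and the arithmetic covers only a tag on the top strand. The function's own handling, particularly for reverse-complemented tags, may differ.

```r
# Hypothetical CloneID values; the second scaffold name contains an underscore
clone_ids <- c("Chr1_30668472", "scaffold_12_4501")

# Split at the last underscore into chromosome (or scaffold) and position
chrom <- sub("_[^_]+$", "", clone_ids)
pos <- as.integer(sub("^.*_", "", clone_ids))

# Hypothetical Ref and Alt tag sequences of equal length differing at one site
ref_seq <- "ACGTTGCAAGCT"
alt_seq <- "ACGTTGCATGCT"

# Compare the sequences base by base to locate the target SNP within the tag
ref_nt <- strsplit(ref_seq, "")[[1]]
alt_nt <- strsplit(alt_seq, "")[[1]]
snp_offset <- which(ref_nt != alt_nt)  # 1-based index within the tag

# Assumption: for a top-strand tag, shift the SNP position back to the tag start
tag_start <- pos[1] - snp_offset + 1L
print(data.frame(chrom = chrom[1], snp_pos = pos[1], tag_start = tag_start))
```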

Column names for the BLAST file can be “Query”, “Subject”, “S_start”, “S_end”, and “%Identity”, for compatibility with Breeding Insight formats.
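
To make the two header conventions concrete, the sketch below builds a small, made-up BLAST table with the Breeding Insight style headers, renames the columns to the standard BLAST names, and flags loci whose hits are on the minus strand. It relies on the general BLAST convention that a minus-strand nucleotide hit is reported with sstart greater than send. The data, the assumption that query names match the “AlleleID” values, and the variable names are all hypothetical, and this is not a description of how readDArTag processes the file internally.

```r
# Hypothetical BLAST results using the Breeding Insight style headers
blast <- data.frame(
  Query = c("Chr1_100|Ref_001", "Chr1_100|Alt_002",
            "Chr2_250|Ref_001", "Chr2_250|Alt_002"),
  Subject = c("Chr1", "Chr1", "Chr2", "Chr2"),
  S_start = c(92L, 92L, 310L, 310L),
  S_end = c(145L, 145L, 257L, 257L),
  `%Identity` = c(100, 98.1, 100, 98.1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Map the alternative headers onto the standard BLAST column names
name_map <- c(Query = "qseqid", Subject = "sseqid", S_start = "sstart",
              S_end = "send", `%Identity` = "pident")
names(blast) <- unname(name_map[names(blast)])

# Locus name is the part of the haplotype name before the "|"
locus <- sub("\\|.*$", "", blast$qseqid)

# Hits with sstart > send align to the minus (bottom) strand
bottom_loci <- unique(locus[blast$sstart > blast$send])
print(bottom_loci)
```

A character vector of locus names like bottom_loci has the same shape as the botloci argument, which may be convenient when strand information is already known and no BLAST file is supplied.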

Value

A RADdata object ready for QC and genotype calling. Assuming the “Ref_001” and “Alt_002” alleles were not excluded, the locTable slot will include columns for chromosome, position, strand, and reference sequence.

Author(s)

Lindsay V. Clark

References

https://www.diversityarrays.com/

https://breedinginsight.org/

See Also

reverseComplement

readTagDigger, VCF2RADdata, readStacks, readTASSELGBSv2, readHMC

RADdata2VCF

Examples

## Older Excellence in Breeding version
# Example files installed with polyRAD
dartfile <- system.file("extdata", "DArTag_example.csv", package = "polyRAD")
blastfile <- system.file("extdata", "DArTag_BLAST_example.txt",
                         package = "polyRAD")

# One haplotype doesn't seem to have correct alignment (see BLAST results)
exclude_hap <- c("Chr1_30668472|RefMatch_004")

# Import data
mydata <- readDArTag(dartfile, blastfile = blastfile,
                     excludeHaps = exclude_hap,
                     possiblePloidies = list(4),
                     n.header.rows = 7, sample.name.row = 7)

## Newer Excellence in Breeding version (2022)
# Example files installed with polyRAD
dartfile <- system.file("extdata", "DArTag_example2.csv", package = "polyRAD")
blastfile <- system.file("extdata", "DArTag_BLAST_example2.txt",
                         package = "polyRAD")

# One haplotype doesn't seem to have correct alignment (see BLAST results)
exclude_hap <- c("Chr1_30668472|RefMatch_0004")

# Import data
mydata <- readDArTag(dartfile, blastfile = blastfile,
                     excludeHaps = exclude_hap,
                     possiblePloidies = list(4),
                     n.header.rows = 0, sample.name.row = 1)

lvclark/polyRAD documentation built on Jan. 15, 2024, 4:19 a.m.