RADdata: RADdata object constructor

Description

polyRAD's data import functions call RADdata internally to generate objects of the S3 class “RADdata”. It is also available at the user level for importing read depth data that are not already in a format supported by polyRAD.

Usage

RADdata(alleleDepth, alleles2loc, locTable, possiblePloidies, contamRate,
        alleleNucleotides, taxaPloidy)
        
## S3 method for class 'RADdata'
plot(x, ...)

Arguments

alleleDepth

An integer matrix, with taxa in rows and alleles in columns. Taxa names should be included as row names. Each value indicates the number of reads for a given allele in a given taxon. There should be no NA values; use zero to indicate no reads.

alleles2loc

An integer vector with one value for each column of alleleDepth. The number indicates the identity of the locus to which the allele belongs. A locus can have any number of alleles assigned to it (including zero).

locTable

A data frame, where locus names are row names. There must be at least as many rows as the highest value of alleles2loc; each number in alleles2loc corresponds to a row index in locTable. No columns are required, although if provided a column named “Chr” will be used for indicating chromosome identities, a column named “Pos” will be used for indicating physical position, and a column named “Ref” will be used to indicate the reference sequence.

possiblePloidies

A list, where each item in the list is an integer vector (or a numeric vector that can be converted to integer). Each vector indicates an inheritance pattern that markers in the dataset might obey. 2 indicates diploid, 4 indicates autotetraploid, c(2, 2) indicates allotetraploid, etc.

contamRate

A number ranging from zero to one (although in practice probably less than 0.01) indicating the expected sample cross-contamination rate.

alleleNucleotides

A character vector with one value for each column of alleleDepth, indicating the DNA sequence for that allele. Typically only the sequence at variable sites is provided, although intervening non-variable sequence can also be provided.

taxaPloidy

An integer vector indicating ploidies of taxa. If a single value is provided, it will be assumed that all taxa are the same ploidy. Otherwise, one value must be provided for each taxon. If unnamed, it is assumed that taxa are in the same order as the rows of alleleDepth. If named, names must match the row names of alleleDepth but do not need to be in the same order. This value is used as a multiplier with possiblePloidies; see Details.

x

A “RADdata” object.

...

Additional arguments to pass to plot, for example col or pch.
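
The input requirements described above can be sketched in base R as follows. This is only an illustration, not polyRAD's internal code; the object names raw_counts, depth, loc_names, allele_locus, and ploidy_named are hypothetical, as is the assumption that allele names have the form locusname_suffix.

```r
# Hypothetical read counts: taxa in rows, alleles in columns, NA for no reads
raw_counts <- matrix(c(12, NA, 7, 30, 0, 5, NA, 18), nrow = 2, ncol = 4,
                     dimnames = list(c("taxon1", "taxon2"),
                                     c("loc1_0", "loc1_1", "loc2_0", "loc2_1")))

# alleleDepth: integer matrix with zero (not NA) indicating no reads
depth <- raw_counts
depth[is.na(depth)] <- 0
storage.mode(depth) <- "integer"   # coerces values while keeping dimnames

# locTable: locus names as row names; Chr and Pos columns are optional
loc_names <- c("loc1", "loc2")
locTable <- data.frame(row.names = loc_names, Chr = c(1, 1),
                       Pos = c(2000456, 5479880))

# alleles2loc: for each column of alleleDepth, the row index in locTable
# (assumes the locus name is everything before the last underscore)
allele_locus <- sub("_[^_]*$", "", colnames(depth))
alleles2loc <- match(allele_locus, rownames(locTable))

# taxaPloidy: a named vector can be in any order; indexing by row names
# shows which value corresponds to which taxon
ploidy_named <- c(taxon2 = 4L, taxon1 = 2L)
taxaPloidy <- ploidy_named[rownames(depth)]

# Checks mirroring the requirements listed for each argument
stopifnot(is.integer(depth), !anyNA(depth), !is.null(rownames(depth)),
          !anyNA(alleles2loc), max(alleles2loc) <= nrow(locTable),
          identical(names(taxaPloidy), rownames(depth)))
```

The objects depth, alleles2loc, and locTable built this way have the shapes described for the first three arguments. Reordering taxaPloidy by hand is not required, since named values only need to match the row names of alleleDepth.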

Details

For a single locus, ideally the string provided in locTable$Ref and all strings in alleleNucleotides are the same length, so that SNPs and indels may be matched by position. The character “-” indicates a deletion with respect to the reference, and can be used within alleleNucleotides. The character “.” is a placeholder where other alleles have an insertion with respect to the reference, and may be used in locTable$Ref and alleleNucleotides. Note that it is possible for the sequence in locTable$Ref to be absent from alleleNucleotides if the reference haplotype is absent from the dataset, as may occur if the reference genome is that of a related species and not the actual study species. For the alleleNucleotides vector, the attribute "Variable_sites_only" indicates whether non-variable sequence in between variants is included; this needs to be FALSE for other functions to determine the position of each variant within the set of tags.
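
As a rough illustration of why equal string lengths matter, the sketch below aligns a reference string and three allele strings by position and finds the positions at which any allele differs from the reference. The sequences and object names are hypothetical, and this is not how polyRAD itself processes the strings.

```r
# Hypothetical aligned sequences for one locus
ref <- "AC.GT"                            # "." : some allele has an insertion here
alleles <- c("AC.GT", "ACTGT", "A-.GA")   # "-" : deletion relative to the reference

# All strings must be the same length to be matched by position
stopifnot(all(nchar(alleles) == nchar(ref)))

# One row per haplotype, one column per aligned position
aligned <- do.call(rbind, strsplit(c(ref, alleles), split = ""))
rownames(aligned) <- c("Ref", paste0("allele", seq_along(alleles)))

# Aligned positions at which any allele differs from the reference
variable <- which(apply(aligned, 2, function(column) any(column != column[1])))

aligned
variable
```

Here the second position carries a deletion, the third an insertion, and the fifth a SNP, so each variant can be located by its column in the aligned matrix.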

Inheritance mode is determined by multiplying the values in possiblePloidies by the values in taxaPloidy and dividing by two. For example, if you wanted to assume autotetraploid inheritance across the entire dataset, you could set possiblePloidies = list(4) and taxaPloidy = 2, or alternatively possiblePloidies = list(2) and taxaPloidy = 4. To indicate a mix of diploid and allotetraploid inheritance across loci, set possiblePloidies = list(2, c(2, 2)) and taxaPloidy = 2. If taxa themselves vary in ploidy, provide one value of taxaPloidy for each taxon. All inheritance modes listed in possiblePloidies apply equally to all taxa, even when ploidy varies by taxon.
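
The arithmetic in the paragraph above can be sketched in a few lines of base R. The helper name inheritance_modes is hypothetical and is not part of polyRAD; it only reproduces the multiply-and-halve rule for a single taxaPloidy value.

```r
# Multiply each inheritance pattern by the taxon ploidy and divide by two.
# inheritance_modes is a hypothetical helper, not a polyRAD function.
inheritance_modes <- function(possiblePloidies, taxaPloidy) {
  lapply(possiblePloidies,
         function(pattern) as.integer(pattern * taxaPloidy / 2))
}

inheritance_modes(list(4), 2)            # autotetraploid
inheritance_modes(list(2), 4)            # same inheritance mode, specified differently
inheritance_modes(list(2, c(2, 2)), 2)   # diploid and allotetraploid
```

When taxa vary in ploidy, the same calculation would be repeated with each taxon's own taxaPloidy value.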

Value

An object of the S3 class “RADdata”. The following slots are available using the $ operator:

alleleDepth

Identical to the argument provided to the function.

alleles2loc

Identical to the argument provided to the function.

locTable

Identical to the argument provided to the function.

possiblePloidies

The possiblePloidies argument, converted to integer.

locDepth

A matrix with taxa in rows and loci in columns, with read depth summed across all alleles for each locus. Column names are locus numbers rather than locus names. See GetLocDepth for retrieving the same matrix but with locus names as column names.

depthSamplingPermutations

A numeric matrix with taxa in rows and alleles in columns. It is calculated as log(locDepth choose alleleDepth). This is used as a coefficient for likelihood estimations done by other polyRAD functions (i.e. AddGenotypeLikelihood).

depthRatio

A numeric matrix with taxa in rows and alleles in columns. Calculated as alleleDepth / locDepth. Used by other polyRAD functions for rough estimation of genotypes and allele frequency.

antiAlleleDepth

An integer matrix with taxa in rows and alleles in columns. For each allele, the number of reads from the locus that do NOT belong to that allele. Calculated as locDepth - alleleDepth. Used for likelihood estimations by other polyRAD functions.

alleleNucleotides

Identical to the argument provided to the function.

taxaPloidy

A named integer vector with one value per taxon, indicating the ploidy of taxa.
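
The derived matrices described above follow directly from alleleDepth and alleles2loc. The sketch below reproduces the stated formulas in base R with hypothetical data; it is not the code used inside the RADdata constructor, and expandedLocDepth is a name introduced here for illustration.

```r
# Hypothetical read depths: taxa in rows, alleles in columns
alleleDepth <- matrix(c(10L, 0L, 5L, 5L, 3L, 9L, 7L, 1L), nrow = 2,
                      dimnames = list(c("taxon1", "taxon2"),
                                      c("loc1_0", "loc1_1", "loc2_0", "loc2_1")))
alleles2loc <- c(1L, 1L, 2L, 2L)

# locDepth: depth summed across the alleles of each locus
# (column names are locus numbers)
locDepth <- t(rowsum(t(alleleDepth), alleles2loc))

# Repeat each locus column once per allele so it lines up with alleleDepth
expandedLocDepth <- locDepth[, as.character(alleles2loc), drop = FALSE]
colnames(expandedLocDepth) <- colnames(alleleDepth)

depthRatio <- alleleDepth / expandedLocDepth          # alleleDepth / locDepth
antiAlleleDepth <- expandedLocDepth - alleleDepth     # locDepth - alleleDepth
depthSamplingPermutations <- lchoose(expandedLocDepth, alleleDepth)  # log(n choose k)

locDepth
depthRatio
antiAlleleDepth
```

Note that a locus with zero total depth in a taxon would give 0 / 0 in this sketch, which R evaluates to NaN.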

The object additionally has several attributes (see attr):

taxa

A character vector listing all taxa names, in the same order as the rows of alleleDepth.

nTaxa

An integer indicating the number of taxa.

nLoc

An integer indicating the number of loci in locTable.

contamRate

Identical to the argument provided to the function.

The plot method performs a principal components analysis with AddPCA if not already done, then plots the first two axes. Points represent individuals (taxa). If mapping population parents have been noted in the object (see SetDonorParent), they are indicated in the plot.
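
To give a feel for this kind of plot without polyRAD installed, the sketch below runs an ordinary PCA with prcomp from base R on depth ratios and plots the first two axes. This is a stand-in technique chosen for illustration, not the method used by AddPCA, and the data are randomly generated.

```r
set.seed(1)  # reproducible hypothetical depths
alleleDepth <- matrix(sample(20:100, 24), nrow = 6,
                      dimnames = list(paste0("taxon", 1:6),
                                      paste0("loc", rep(1:2, each = 2), "_", 0:1)))
alleles2loc <- c(1L, 1L, 2L, 2L)

# Depth ratios: allele depth divided by total depth at the locus
locDepth <- t(rowsum(t(alleleDepth), alleles2loc))
depthRatio <- alleleDepth / locDepth[, as.character(alleles2loc)]

# Ordinary PCA on depth ratios (a stand-in, not what AddPCA does)
pca <- prcomp(depthRatio)
scores <- pca$x[, 1:2]

# Points represent taxa; col and pch are passed through as with plot(x, ...)
plot(scores, xlab = "PC1", ylab = "PC2", col = "blue", pch = 19)
```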

Author(s)

Lindsay V. Clark

See Also

Data import functions that internally call RADdata:

readHMC, readTagDigger, VCF2RADdata, readStacks, readTASSELGBSv2, readProcessSamMulti, readProcessIsoloci

Examples

# create the dataset
mydepth <- matrix(sample(100, 16), nrow = 4, ncol = 4,
                  dimnames = list(paste("taxon", 1:4, sep = ""),
                  paste("loc", c(1,1,2,2), "_", c(0,1,0,1), sep = "")))
mydata <- RADdata(mydepth, c(1L,1L,2L,2L), 
                  data.frame(row.names = c("loc1", "loc2"), Chr = c(1,1),
                             Pos = c(2000456, 5479880)),
                  list(2, c(2,2)), 0.001, c("A", "G", "G", "T"), 6)

# inspect the dataset
mydata
mydata$alleleDepth
mydata$locDepth
mydata$depthRatio
mydata$taxaPloidy

# the S3 class structure is flexible; other data can be added
mydata$GPS <- data.frame(row.names = attr(mydata, "taxa"),
                         Lat = c(43.12, 43.40, 43.05, 43.27),
                         Long = -c(70.85, 70.77, 70.91, 70.95))
mydata$GPS

# If you have NA in your alleleDepth matrix to indicate zero reads,
# perform the following before running the RADdata constructor:
mydepth[is.na(mydepth)] <- 0L

# plotting a RADdata object
plot(mydata)

lvclark/polyRAD documentation built on Jan. 15, 2024, 4:19 a.m.