meandistance.matrix: Mean Pairwise Distance Matrix
In polysat: Tools for Polyploid Microsatellite Analysis

Description

Given a genambig object, meandistance.matrix produces a symmetrical matrix of pairwise distances between samples, averaged across all loci. An array of all distances prior to averaging may also be produced.

Usage

meandistance.matrix(object, samples = Samples(object),
                    loci = Loci(object), all.distances=FALSE,
                    distmetric = Bruvo.distance, progress = TRUE,
                    ...)
meandistance.matrix2(object, samples = Samples(object),
                     loci = Loci(object),
                     freq = simpleFreq(object, samples, loci), self = 0,
                     all.distances = FALSE, distmetric = Bruvo.distance,
                     progress = TRUE, ...)

Arguments

`object`	A `genambig` object containing the genotypes to be analyzed. If `distmetric = Bruvo.distance`, the `Usatnts` slot should be filled in. For `meandistance.matrix2`, `Ploidies` and `PopInfo` are also required.
`samples`	A character vector of samples to be analyzed. These should be all or a subset of the sample names used in `object`.
`loci`	A character vector of loci to be analyzed. These should be all or a subset of the loci names used in `object`.
`freq`	A data frame of allele frequencies such as that produced by `simpleFreq` or `deSilvaFreq`.
`self`	A number ranging from 0 to 1, indicating the rate of selfing.
`all.distances`	If `FALSE`, only the mean distance matrix will be returned. If `TRUE`, a list will be returned containing an array of all distances by locus and sample as well as the mean distance matrix.
`distmetric`	The function to be used to calculate distances between genotypes. `Bruvo.distance`, `Lynch.distance`, or a distance function written by the user.
`progress`	If `TRUE`, loci and samples will be printed to the console as distances are calculated, so that the user can monitor the progress of the computation.
`...`	Additional arguments (such as `maxl`, `add`, and `loss`) to pass to `distmetric`.
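
As a rough illustration of the "distance function written by the user" option, the sketch below defines a hypothetical metric, `jaccard.distance` (not part of polysat). It assumes that `distmetric` is called with two genotype vectors plus `usatnt` and `missing` arguments, mirroring the signatures of the built-in metrics; check the package source before relying on that.

```r
# Hypothetical user-written metric (not a polysat function):
# Jaccard dissimilarity between the allele sets of two genotypes.
# Assumed calling convention: two genotype vectors, plus 'usatnt' and
# 'missing', which are accepted here so the call does not fail.
jaccard.distance <- function(genotype1, genotype2, usatnt = NA,
                             missing = -9, ...) {
  # return NA if either genotype is missing
  if (any(genotype1 %in% missing) || any(genotype2 %in% missing)) {
    return(NA_real_)
  }
  shared <- length(intersect(genotype1, genotype2))
  total <- length(union(genotype1, genotype2))
  1 - shared / total
}

# quick checks on plain vectors
d.diff <- jaccard.distance(c(124, 128, 138), c(122, 130, 140, 142))
d.same <- jaccard.distance(c(146, 150), c(146, 150))
d.miss <- jaccard.distance(c(-9), c(146, 150))
```

Under the assumption above, such a function could then be supplied as `distmetric = jaccard.distance`.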

Details

Each distance for the three-dimensional array is calculated only once, to save computation time. Since the array (and resulting mean matrix) is symmetrical, the distance is written to two positions in the array at once.
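
The idea of computing each distance once and writing it to both symmetric positions can be sketched for a single locus as follows. This is a simplified illustration, not the package's actual code: `toy.dist` is a hypothetical stand-in for `distmetric`, and whether polysat also computes the diagonal is not stated above.

```r
# Hypothetical stand-in for a distance metric (not a polysat function)
toy.dist <- function(g1, g2) {
  1 - length(intersect(g1, g2)) / length(union(g1, g2))
}

# made-up genotypes for three samples at one locus
genos <- list(ind1 = c(124, 128, 138),
              ind2 = c(122, 130, 140),
              ind3 = c(122, 128, 136))
n <- length(genos)
dmat <- matrix(0, n, n, dimnames = list(names(genos), names(genos)))
ncalls <- 0  # counts how many times the metric is evaluated

for (i in seq_len(n)) {
  for (j in i:n) {  # upper triangle, including the diagonal
    d <- toy.dist(genos[[i]], genos[[j]])
    ncalls <- ncalls + 1
    dmat[i, j] <- d  # write the value...
    dmat[j, i] <- d  # ...and mirror it to the symmetric position
  }
}
```

With n samples this evaluates the metric n(n + 1)/2 times rather than n^2 times.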

meandistance.matrix uses ambiguous genotypes exactly as they are, whereas meandistance.matrix2 uses genotypeProbs to calculate all possible unambiguous genotypes and their probabilities under random mating or partial selfing. The distance between each possible pair of unambiguous genotypes for the two samples is calculated with distmetric and weighted by the product of the probabilities of the two genotypes. As you might expect, meandistance.matrix2 takes longer to process a given "genambig" object than meandistance.matrix does. Additionally, the distance between two identical ambiguous genotypes will be zero when calculated with meandistance.matrix, and greater than zero when calculated with meandistance.matrix2, due to potential differences in copy number of the alleles.
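
The weighting described above can be sketched in plain R. Everything in this block is illustrative: `copy.dist` is a hypothetical copy-number-aware metric, and the genotype probabilities are made up rather than produced by genotypeProbs. It shows why a tetraploid scored as 146/150 can have a nonzero distance to itself once copy number is considered.

```r
# Hypothetical metric: one minus the proportion of allele copies shared
copy.dist <- function(g1, g2) {
  alleles <- union(g1, g2)
  n1 <- tabulate(match(g1, alleles), nbins = length(alleles))
  n2 <- tabulate(match(g2, alleles), nbins = length(alleles))
  1 - sum(pmin(n1, n2)) / max(length(g1), length(g2))
}

# possible unambiguous tetraploid genotypes for the ambiguous genotype 146/150
possible <- list(c(146, 146, 146, 150),
                 c(146, 146, 150, 150),
                 c(146, 150, 150, 150))
probs <- c(0.25, 0.5, 0.25)  # made-up probabilities, for illustration only

# sum of distances over all genotype pairs, each weighted by the
# product of the two genotype probabilities
weighted.dist <- function(genos1, probs1, genos2, probs2) {
  total <- 0
  for (a in seq_along(genos1)) {
    for (b in seq_along(genos2)) {
      total <- total +
        probs1[a] * probs2[b] * copy.dist(genos1[[a]], genos2[[b]])
    }
  }
  total
}

# distance between two samples that share the same ambiguous genotype
self.dist <- weighted.dist(possible, probs, possible, probs)
```

Here `self.dist` comes out greater than zero, because pairs of possible genotypes with different allele copy numbers contribute a positive distance.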

When Bruvo.distance is used, meandistance.matrix2 exaggerates distances between individuals of different ploidy as compared to meandistance.matrix. The use of Bruvo2.distance with meandistance.matrix2 allows individuals with different ploidies to have similar inter-individual distances to those between individuals of the same ploidy. In general, it will be desirable to use Bruvo.distance with meandistance.matrix for complex datasets with high ploidy levels, or Bruvo2.distance with meandistance.matrix2 for hexaploid or lower datasets where changes in ploidy are due to genome doubling or genome loss (the hexaploid cutoff reflects how long these calculations take on the author's personal computer). If all individuals have the same ploidy, Bruvo.distance and Bruvo2.distance will give identical results regardless of whether meandistance.matrix or meandistance.matrix2 is used.

meandistance.matrix2 does not allow a genotype to have more alleles than the ploidy of the individual (as listed in the Ploidies slot). Additionally, if self is greater than zero, each population may only have one ploidy at each locus.

Value

A symmetrical matrix containing pairwise distances between all samples, averaged across all loci. Row and column names of the matrix will be the sample names provided in the samples argument. If all.distances=TRUE, a list will be produced containing the above matrix as well as a three-dimensional array containing all distances by locus and sample. The array is the first item in the list, and the mean matrix is the second.
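
The relationship between the two list items can be illustrated with a hand-built array that has the same layout (locus, sample, sample). The values are made up, and dropping NA entries when averaging is an assumption of this sketch rather than a documented guarantee.

```r
samples <- c("ind1", "ind2")
loci <- c("locus1", "locus2", "locus3")

# distances by locus and sample; locus3 is left as NA to mimic missing data
distarray <- array(NA_real_, dim = c(3, 2, 2),
                   dimnames = list(loci, samples, samples))
distarray["locus1", , ] <- matrix(c(0, 0.4, 0.4, 0), nrow = 2)
distarray["locus2", , ] <- matrix(c(0, 0.6, 0.6, 0), nrow = 2)

# average over the locus dimension (assumption: NA values are ignored)
meanmat <- apply(distarray, c(2, 3), mean, na.rm = TRUE)

# same ordering as described above: array first, mean matrix second
result <- list(distarray, meanmat)
result[[2]]["ind1", "ind2"]
```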

Author(s)

Lindsay V. Clark

Examples

# create a genambig object containing example genotype data
mygendata <- new("genambig", samples = c("ind1","ind2","ind3","ind4"),
                 loci = c("locus1","locus2","locus3","locus4"))
Genotypes(mygendata) <-
  array(list(c(124,128,138),c(122,130,140,142),c(122,132,136),c(122,134,140),
             c(203,212,218),c(197,206,221),c(215),c(200,218),
             c(140,144,148,150),c(-9),c(146,150),c(152,154,158),
             c(233,236,280),c(-9),c(-9),c(-9)))
Usatnts(mygendata) <- c(2,3,2,1)

# make index vectors of data to use
myloci <- c("locus1","locus2","locus3")
mysamples <- c("ind1","ind2","ind4")

# calculate array and matrix
mymat <- meandistance.matrix(mygendata, mysamples, myloci,
                             all.distances=TRUE)
# view the results
mymat[[1]]["locus1",,]
mymat[[1]]["locus2",,]
mymat[[1]]["locus3",,]
mymat[[2]]

# add additional info needed for meandistance.matrix2
mygendata <- reformatPloidies(mygendata, output="one")
Ploidies(mygendata) <- 4
PopInfo(mygendata) <- c(1,1,1,1)

# calculate distances taking allele freqs into account
mymat2 <- meandistance.matrix2(mygendata, mysamples, myloci)
mymat2
# now do the same under selfing
mymat3 <- meandistance.matrix2(mygendata, mysamples, myloci, self=0.3)
mymat3

polysat documentation built on Aug. 23, 2022, 5:07 p.m.