maxDists: Select a maximally diverse set of items given a distance...

View source: R/taxTools.R

Description

Given a square matrix of pairwise distances, return indices of N objects with a maximal sum of pairwise distances.
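The quantity being maximized can be made concrete with a small base-R sketch. The helper name `sum_pairwise` and the toy data are hypothetical and are not part of clstutils; the sketch only illustrates what "sum of pairwise distances" means for a chosen subset of indices.

```r
# Hypothetical helper (not part of clstutils): sum of the pairwise
# distances among the objects selected by idx.
sum_pairwise <- function(mat, idx) {
  sub <- mat[idx, idx, drop = FALSE]
  # each pair is counted once by using only the upper triangle
  sum(sub[upper.tri(sub)])
}

# Toy data: five points on a line, converted to a square distance matrix
toy <- as.matrix(dist(c(0, 1, 2, 4, 10)))

sum_pairwise(toy, c(1, 5))     # the two extreme points: 10
sum_pairwise(toy, c(1, 2, 3))  # three neighboring points: 1 + 2 + 1 = 4
```

A subset that spreads across the data scores higher than one clustered at one end, which is the sense in which the selected set is "maximally diverse".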

Usage

maxDists(mat, idx = NA, N = 1,
         exclude = rep(FALSE, nrow(mat)),
         include.center = TRUE)

Arguments

mat

square distance matrix

idx

starting indices; if missing, the selection starts with the object that has the maximum median distance to all other objects.

N

total number of selections; the length of idx is subtracted, so any starting indices count toward N.

exclude

logical vector, with one element per row of mat, indicating elements to exclude from the calculation.

include.center

if TRUE, includes the "most central" element (i.e., the one with the smallest median of pairwise distances to all other elements).

Value

A vector of indices corresponding to the margin of mat.
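One way to picture how the arguments interact is a simple greedy selection, sketched below in base R. This is an illustration under stated assumptions, not the package's implementation: the function name `greedy_max_dists` is hypothetical, it ignores `idx` and `include.center`, and a greedy strategy is only a heuristic for a large sum of pairwise distances.

```r
# Hypothetical sketch (not the clstutils implementation): greedily pick
# N indices, honoring a logical exclude vector.
greedy_max_dists <- function(mat, N = 2, exclude = rep(FALSE, nrow(mat))) {
  stopifnot(nrow(mat) == ncol(mat), N >= 1, N <= sum(!exclude))
  candidates <- which(!exclude)
  # seed with the candidate whose median distance to the other
  # candidates is largest
  med <- apply(mat[candidates, candidates, drop = FALSE], 1, median)
  picked <- candidates[which.max(med)]
  while (length(picked) < N) {
    remaining <- setdiff(candidates, picked)
    # summed distance from each remaining candidate to the picked set
    gain <- rowSums(mat[remaining, picked, drop = FALSE])
    picked <- c(picked, remaining[which.max(gain)])
  }
  picked
}

# Toy data: five points on a line; the fifth lies far from the rest
toy <- as.matrix(dist(c(0, 1, 2, 4, 10)))

greedy_max_dists(toy, N = 3)
greedy_max_dists(toy, N = 2, exclude = c(FALSE, FALSE, FALSE, FALSE, TRUE))
```

In the first call the distant fifth point is selected first; in the second call it is excluded and the selection is drawn from the remaining four points only. The returned values are row (equivalently, column) indices of the input matrix.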

Note

It is important to check whether the candidate sequences contain outliers (for example, mislabeled sequences), because outliers will almost certainly be included in a maximally diverse set of elements.

Author(s)

Noah Hoffman

See Also

findOutliers

Examples

library(ape)
library(clstutils)
data(seqs)
data(seqdat)
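## restrict the example data to Enterococcus faecium sequences
efaecium <- seqdat$tax_name == 'Enterococcus faecium'
seqdat <- subset(seqdat, efaecium)
seqs <- seqs[efaecium,]
## square matrix of raw pairwise distances between the sequences
dmat <- ape::dist.dna(seqs, pairwise.deletion=TRUE, as.matrix=TRUE, model='raw')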

## find a maximally diverse set without first identifying outliers
picked <- maxDists(dmat, N=10)
picked
prettyTree(nj(dmat), groups=ifelse(1:nrow(dmat) %in% picked,'picked','not picked'))

## restrict selected elements to non-outliers
outliers <- findOutliers(dmat, cutoff=0.015)
picked <- maxDists(dmat, N=10, exclude=outliers)
picked
prettyTree(nj(dmat), groups=ifelse(1:nrow(dmat) %in% picked,'picked','not picked'),
           X = outliers)

clstutils documentation built on Nov. 8, 2020, 5:23 p.m.