KnnDistIDingBal: K-Nearest Neighbour multiple specimen identification with...
In ArdernHB/KnnDist: Kernal Nearest Neighbour classification with distance inputs

View source: R/SpecimenIDing_functions.R

This function applies a balanced KNN identification design to multiple unknown specimens. Groups are iteratively resampled to equal sample size by downsampling each group to a size set by the user or, if left at the default, to the size of the smallest group. Bootstrap resampling is not offered as an option because sampling with replacement duplicates specimens, and duplicated neighbours can be problematic for nearest neighbour approaches.
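The downsampling step described above can be sketched in base R. This is a minimal illustration, not the package's own code: `balanced_draw` is a hypothetical helper, and it assumes that specimens labelled with the unknown identifier are excluded from the draw and that sampling is without replacement.

```r
# Hypothetical helper (not part of KnnDist): one balanced draw of reference specimens.
balanced_draw <- function(GroupMembership, UnknownIdentifier = "Unknown", SampleSize = NA) {
  groups <- as.character(GroupMembership)
  # Indices of reference specimens only; unknowns are never resampled here
  ref_idx <- which(groups != UnknownIdentifier)
  by_group <- split(ref_idx, groups[ref_idx])
  # Default: downsample to the size of the smallest reference group
  if (is.na(SampleSize)) SampleSize <- min(lengths(by_group))
  stopifnot(SampleSize <= min(lengths(by_group)))
  # sample.int() draws without replacement, so no specimen is duplicated
  picked <- lapply(by_group, function(idx) idx[sample.int(length(idx), SampleSize)])
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
groups <- c(rep("A", 10), rep("B", 6), rep("C", 4), rep("Unknown", 3))
keep <- balanced_draw(groups)
table(groups[keep])  # every reference group now has 4 specimens
```

Repeating such a draw many times lets every reference specimen contribute to some iterations while keeping the groups balanced within each one.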

KnnDistIDingBal(
  DistMat,
  GroupMembership,
  UnknownIdentifier = "Unknown",
  SpecimenIDs,
  K,
  EqualIter = 100,
  TieBreaker,
  SampleSize = NA
)
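
A minimal sketch of how inputs for the call above might be assembled, using simulated data. The measurements, group names and argument values are illustrative assumptions; the sketch also assumes that the unknown specimens are included in the distance matrix alongside the reference specimens and that a character vector is acceptable for `SpecimenIDs`. The call itself is left commented out because it requires the KnnDist package.

```r
set.seed(42)
# Hypothetical data: 12 reference specimens in two groups plus 2 unknowns
n_ref <- 12
n_unk <- 2
measurements <- matrix(rnorm((n_ref + n_unk) * 3), ncol = 3)
SpecimenIDs <- paste0("Spec", seq_len(n_ref + n_unk))
rownames(measurements) <- SpecimenIDs
GroupMembership <- c(rep(c("A", "B"), each = 6), rep("Unknown", n_unk))

# Square matrix of pairwise Euclidean distances among all specimens
DistMat <- as.matrix(dist(measurements))

# The three inputs must describe the same specimens in the same order
stopifnot(
  nrow(DistMat) == ncol(DistMat),
  nrow(DistMat) == length(GroupMembership),
  identical(rownames(DistMat), SpecimenIDs)
)

# Illustrative call (requires the KnnDist package to be installed and loaded):
# Result <- KnnDistIDingBal(DistMat, GroupMembership,
#                           UnknownIdentifier = "Unknown",
#                           SpecimenIDs = SpecimenIDs,
#                           K = 3, EqualIter = 100,
#                           TieBreaker = "Report", SampleSize = NA)
```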

`DistMat`	is a square matrix of pairwise distances among all reference specimens.
`GroupMembership`	a character or factor vector denoting group membership, in the same order as the distance data.
`UnknownIdentifier`	the name used in the `GroupMembership` argument to denote specimens to be identified. Only one name can be supplied as Unknown; default is set to 'Unknown'.
`SpecimenIDs`	a list of the unique identifiers for all specimens in the `DistMat` object, in the same order as the `DistMat` object.
`K`	is the number of nearest neighbours that the method will use for assigning group classification.
`EqualIter`	sets the number of iterations of resampling to equal sample size that will be carried out.
`TieBreaker`	is the method used to break ties if there is no majority resulting from K. Three methods are available ('Random', 'Remove' and 'Report'): Random randomly returns one of the tied classifications; Remove returns 'UnIDed' for the classification; Report returns the multiple classifications as a single character string with tied classifications separated by '_'. NOTE: for correct cross-validation procedures the results of both Remove and Report will be considered an incorrect identification even if one of the multiple reported classifications is correct.
`SampleSize`	is the sample size that each group will be subsampled to. The default is NA, in which case the sample size of the smallest group provided is used.
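
The roles of `K` and `TieBreaker` can be illustrated with a short base R sketch. `knn_vote` is a hypothetical helper, not the package's implementation; it assumes the winning group is the one with the most votes among the K nearest reference specimens, and that specimens at equal distance are ranked by their position in the input.

```r
# Hypothetical helper (not part of KnnDist): classify one specimen from its
# distances to the reference specimens.
knn_vote <- function(dists, ref_groups, K,
                     TieBreaker = c("Random", "Remove", "Report")) {
  TieBreaker <- match.arg(TieBreaker)
  # Indices of the K smallest distances
  nearest <- order(dists)[seq_len(K)]
  votes <- table(ref_groups[nearest])
  # Groups sharing the highest vote count, in alphabetical order
  winners <- sort(names(votes)[votes == max(votes)])
  if (length(winners) == 1) return(winners)
  switch(TieBreaker,
    Random = winners[sample.int(length(winners), 1)],  # pick one tied group
    Remove = "UnIDed",                                 # give no identification
    Report = paste(winners, collapse = "_")            # report all tied groups
  )
}

dists <- c(0.2, 0.9, 0.3, 0.8, 0.25, 0.35)
ref_groups <- c("A", "A", "B", "B", "C", "C")

knn_vote(dists, ref_groups, K = 1)                         # "A"
knn_vote(dists, ref_groups, K = 2, TieBreaker = "Remove")  # "UnIDed"
knn_vote(dists, ref_groups, K = 2, TieBreaker = "Report")  # "A_C"
```

With `K = 2` the two nearest neighbours belong to groups A and C, so the vote is tied and the outcome depends entirely on the chosen tie-breaking method.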