KnnDistIDingBal: K-Nearest Neighbour multiple specimen identification with...


View source: R/SpecimenIDing_functions.R

Description

This function implements a balanced KNN identification design for multiple unknown specimens. Groups are iteratively resampled to equal sample size by downsampling each group to a sample size set by the user or, by default, to the sample size of the smallest group. Bootstrap resampling is not offered as an option because sampling with replacement can duplicate specimens, and duplicated neighbours are problematic for nearest neighbour approaches.
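The downsampling step described above can be sketched in a few lines of base R. This is only an illustration, not the package's own code: `balanced_indices` and its arguments are hypothetical names, and the sketch assumes the vector passed in holds reference groups only (unknowns already set aside).

```r
# Hypothetical helper (not part of KnnDist): draw an equal-sized subsample
# of indices from each group, without replacement.
balanced_indices <- function(groups, sample_size = NA) {
  groups <- as.character(groups)
  group_sizes <- table(groups)
  # Fall back to the smallest group size when no sample size is given
  if (is.na(sample_size)) {
    sample_size <- min(group_sizes)
  }
  stopifnot(sample_size <= min(group_sizes))
  picked <- lapply(names(group_sizes), function(g) {
    idx <- which(groups == g)
    # Index into idx so that a single-element group is handled safely
    idx[sample.int(length(idx), sample_size)]
  })
  sort(unlist(picked))
}

set.seed(1)
groups <- c(rep("A", 10), rep("B", 4), rep("C", 7))
idx <- balanced_indices(groups)
table(groups[idx])  # each group contributes 4 specimens
```

Because the draw is made without replacement, no specimen can appear twice in a resampled reference set, which is the property the description gives for avoiding the bootstrap.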

Usage

KnnDistIDingBal(
  DistMat,
  GroupMembership,
  UnknownIdentifier = "Unknown",
  SpecimenIDs,
  K,
  EqualIter = 100,
  TieBreaker,
  SampleSize = NA
)

Arguments

DistMat

is a square matrix of pairwise distances among all reference specimens.

GroupMembership

a character or factor vector in the same order as the distance data to denote group membership.

UnknownIdentifier

the name used in the GroupMembership argument to denote specimens to be identified. Only one name can be supplied as Unknown; default is set to 'Unknown'.

SpecimenIDs

a list of the unique identifiers for all specimens in the DistMat object, given in the same order as in the DistMat object.
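As a rough illustration of how the first four arguments line up, the sketch below builds toy inputs in base R. All data here are invented. It assumes the unknown specimens are rows of the distance matrix alongside the references (as the GroupMembership description implies) and that a character vector is acceptable for SpecimenIDs; neither is confirmed by this page.

```r
set.seed(42)
# Toy data: 8 reference specimens in two groups plus 2 unknowns (all invented)
coords <- matrix(rnorm(20), nrow = 10, ncol = 2)
SpecimenIDs <- paste0("Spec", seq_len(nrow(coords)))
rownames(coords) <- SpecimenIDs
GroupMembership <- c(rep("GroupA", 4), rep("GroupB", 4), rep("Unknown", 2))

# Square, symmetric matrix of pairwise Euclidean distances
DistMat <- as.matrix(dist(coords))

# Sanity checks: one row per specimen, everything in the same order
stopifnot(nrow(DistMat) == ncol(DistMat),
          nrow(DistMat) == length(GroupMembership),
          identical(rownames(DistMat), SpecimenIDs))

# With the package installed, a call following the Usage signature might be:
# KnnDistIDingBal(DistMat, GroupMembership, UnknownIdentifier = "Unknown",
#                 SpecimenIDs = SpecimenIDs, K = 3, EqualIter = 100,
#                 TieBreaker = "Report", SampleSize = NA)
```

Deriving the identifiers, the group labels and the distance matrix from the same ordered object is one simple way to keep the three inputs aligned.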

K

is the number of nearest neighbours that the method will use for assigning group classification.
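To make the role of K concrete, here is a minimal sketch of a nearest neighbour vote for one specimen, given its distances to the reference specimens. `knn_votes` is a hypothetical helper written for this example and is not taken from the package source.

```r
# Hypothetical helper: tally group votes among the k nearest references
knn_votes <- function(dist_to_refs, ref_groups, k) {
  stopifnot(length(dist_to_refs) == length(ref_groups),
            k <= length(ref_groups))
  # Positions of the k smallest distances
  nearest <- order(dist_to_refs)[seq_len(k)]
  table(as.character(ref_groups[nearest]))
}

dist_to_refs <- c(0.2, 0.9, 0.4, 1.5, 0.3)
ref_groups   <- c("A", "B", "A", "B", "B")
votes <- knn_votes(dist_to_refs, ref_groups, k = 3)
names(votes)[which.max(votes)]  # "A": two of the three nearest are in group A
```

An odd K reduces the chance of ties in a two-group problem, but ties remain possible with three or more groups.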

EqualIter

sets the number of iterations for which resampling to equal sample size is carried out.

TieBreaker

is the method used to break ties if there is no majority resulting from K. Three methods are available ('Random', 'Remove' and 'Report'): Random randomly returns one of the tied classifications; Remove returns 'UnIDed' for the classification; Report returns the multiple classifications as a single character string with the tied classifications separated by '_'. NOTE: for correct cross-validation procedures, the result of Report will be considered an incorrect identification even if one of the multiple reported classifications is correct.
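The three tie-breaking behaviours can be sketched as follows. `resolve_votes` is a hypothetical helper, and the sketch assumes a tie means two or more groups sharing the highest vote count; the package's own definition may differ.

```r
# Hypothetical helper: turn a named vote tally into one classification
resolve_votes <- function(votes,
                          tie_breaker = c("Random", "Remove", "Report")) {
  tie_breaker <- match.arg(tie_breaker)
  top <- names(votes)[votes == max(votes)]
  # No tie: return the single winning group
  if (length(top) == 1) return(top)
  switch(tie_breaker,
    Random = top[sample.int(length(top), 1)],  # pick one tied group at random
    Remove = "UnIDed",                         # refuse to classify
    Report = paste(top, collapse = "_")        # list all tied groups
  )
}

tied <- c(A = 2, B = 2, C = 1)
resolve_votes(tied, "Remove")             # "UnIDed"
resolve_votes(tied, "Report")             # "A_B"
resolve_votes(tied, "Random")             # either "A" or "B"
resolve_votes(c(A = 3, B = 1), "Report")  # "A"
```

A string such as "A_B" never equals a single known group label, which is consistent with the note that a Report result counts as incorrect under cross-validation.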

SampleSize

is the sample size to which each group will be subsampled. The default is NA, in which case the sample size of the smallest group provided is used.

Value

Returns a matrix of the leave-one-out classifications for all the specimens along with their known classification.

Author(s)

Ardern Hulme-Beaman


ArdernHB/KnnDist documentation built on Feb. 5, 2021, 5:09 a.m.