KnnDistCVStepwisePar: Stepwise K-Nearest Neighbour correct cross-validation with...
In ArdernHB/KnnDist: Kernal Nearest Neighbour classification with distance inputs

View source: R/ParallelKNNCrossValidation_functions.R

This function takes a square matrix of distances among specimens of known group membership and returns the results of a leave-one-out correct cross-validation identification exercise for each incremental increase in k. The results can be plotted to visualise how the rate of correct identification changes with k.
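The procedure described above can be sketched in base R. This is only an illustration of leave-one-out k-NN on a distance matrix, not the package's implementation: `loo_knn_stepwise` is a hypothetical helper, it uses an unweighted majority vote, and it simply counts ties as incorrect.

```r
# Hypothetical helper, not part of KnnDist: leave-one-out k-NN on a distance
# matrix, repeated for k = 1..Kmax, using an unweighted majority vote.
loo_knn_stepwise <- function(DistMat, GroupMembership, Kmax) {
  n <- nrow(DistMat)
  groups <- as.character(GroupMembership)
  correct <- matrix(FALSE, nrow = n, ncol = Kmax)
  for (i in seq_len(n)) {
    # Leave specimen i out, then rank the remaining specimens by distance to it.
    others <- setdiff(seq_len(n), i)
    ranked <- others[order(DistMat[i, others])]
    for (k in seq_len(Kmax)) {
      votes <- table(groups[ranked[seq_len(k)]])
      winners <- names(votes)[votes == max(votes)]
      # ASSUMPTION: a tie is counted as incorrect in this sketch.
      correct[i, k] <- length(winners) == 1 && winners == groups[i]
    }
  }
  # Proportion of correctly identified specimens for each k.
  colMeans(correct)
}

# Toy data: two well-separated groups on a line.
coords <- c(0, 0.1, 0.2, 0.3, 5, 5.1, 5.2, 5.3)
grp <- rep(c("A", "B"), each = 4)
D <- as.matrix(dist(coords))
cc <- loo_knn_stepwise(D, grp, Kmax = 3)
```

Here `cc` holds one correct-identification rate per value of k, which is the kind of series that can be plotted against k.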

KnnDistCVStepwisePar(
  DistMat,
  GroupMembership,
  Kmax,
  EqualIter = 100,
  SampleSize = NA,
  TieBreaker = c("Random", "Remove", "Report"),
  PlotResults = TRUE
)

`DistMat`	is a square matrix of pairwise distances among all reference specimens.
`GroupMembership`	is a character or factor vector, in the same order as the distance data, that denotes group membership.
`Kmax`	is the maximum value that K is increased to stepwise.
`EqualIter`	is the number of iterations of resampling to equal sample size that will be carried out.
`SampleSize`	is the sample size that each group will be subsampled to if `Equal` is set to TRUE. The default is NA, in which case the smallest sample size among the groups provided is used.
`TieBreaker`	is the method used to break ties if there is no majority resulting from K. Three methods are available ('Random', 'Remove' and 'Report'): Random randomly returns one of the tied classifications; Remove returns 'UnIDed' for the classification; Report returns the multiple classifications as a single character string with the tied classifications separated by '_'. NOTE: for correct cross-validation procedures the results of Report will be considered an incorrect identification even if one of the multiple reported classifications is correct.
`PlotResults`	is a logical; when set to TRUE the results are plotted. When `Equal = TRUE` a polygon is plotted marking the 5th and 95th percentiles.
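The three `TieBreaker` options can be illustrated with a short base R sketch. `break_tie` is a hypothetical helper written for this example and is not taken from the package source; it only mirrors the behaviour described in the argument list.

```r
# Hypothetical helper, not the package's code: resolve a vote among the k
# nearest neighbours with one of the three documented tie-breaking methods.
break_tie <- function(neighbour_groups,
                      TieBreaker = c("Random", "Remove", "Report")) {
  TieBreaker <- match.arg(TieBreaker)
  votes <- table(neighbour_groups)
  winners <- names(votes)[votes == max(votes)]
  # No tie: return the single majority group.
  if (length(winners) == 1) return(winners)
  switch(TieBreaker,
    Random = sample(winners, 1),              # pick one tied group at random
    Remove = "UnIDed",                        # give up on the identification
    Report = paste(winners, collapse = "_")   # report all tied groups
  )
}

tied <- c("A", "B", "A", "B")
removed <- break_tie(tied, "Remove")
reported <- break_tie(tied, "Report")
random_pick <- break_tie(tied, "Random")
clear <- break_tie(c("A", "A", "B"), "Report")
```

With the tied vote above, `removed` is 'UnIDed' and `reported` is the string 'A_B'; neither matches a true group label of 'A' or 'B', which is why such results count as incorrect in cross-validation.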

The function is primarily intended for resampling unequal groups to equal sample size a set number of times. This process is carried out with parallel processing.
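A minimal sketch of the resampling step, assuming that each iteration draws the same number of specimens from every group without replacement. `equal_subsample` is a hypothetical helper, and plain `lapply` stands in for whatever parallel backend the package actually uses.

```r
# Hypothetical helper: draw an equal-sized subsample of indices from each group.
equal_subsample <- function(GroupMembership, SampleSize = NA) {
  groups <- as.character(GroupMembership)
  # With SampleSize = NA, fall back to the smallest group size.
  if (is.na(SampleSize)) SampleSize <- min(table(groups))
  idx_by_group <- split(seq_along(groups), groups)
  # Sample positions within each group, then map back to specimen indices.
  unlist(
    lapply(idx_by_group, function(idx) idx[sample.int(length(idx), SampleSize)]),
    use.names = FALSE
  )
}

set.seed(1)
grp <- rep(c("A", "B", "C"), times = c(10, 6, 4))
EqualIter <- 5
# Each iteration is independent of the others, which is what makes this loop
# straightforward to distribute across workers.
resamples <- lapply(seq_len(EqualIter), function(i) equal_subsample(grp))
sizes <- table(grp[resamples[[1]]])
```

Each element of `resamples` is a vector of indices `idx` that could be used to subset both the distance matrix, as `DistMat[idx, idx]`, and the group vector before running the cross-validation.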

This function applies both a weighted approach and an unweighted approach and returns both results.
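The difference between the two approaches can be shown with a small sketch. The weighting scheme is not stated here, so the example assumes inverse-distance weights purely for illustration; `vote_both` is a hypothetical helper and the package may weight neighbours differently.

```r
# Hypothetical helper: classify one specimen from its k nearest neighbours,
# returning both an unweighted and a weighted vote.
# ASSUMPTION: weights are 1 / distance; the package may weight differently.
# Ties are resolved by taking the first maximum, and zero distances are not
# handled in this sketch.
vote_both <- function(neighbour_groups, neighbour_dists) {
  unweighted <- table(neighbour_groups)
  weighted <- tapply(1 / neighbour_dists, neighbour_groups, sum)
  c(Unweighted = names(unweighted)[which.max(unweighted)],
    Weighted = names(weighted)[which.max(weighted)])
}

# Two distant "A" neighbours against one close "B" neighbour.
res <- vote_both(c("A", "A", "B"), c(4, 5, 0.5))
```

In this toy case the unweighted vote goes to 'A' (two neighbours against one), while the weighted vote goes to 'B' because the single 'B' neighbour is much closer.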

Note that the speed gain from parallel processing is greatest when datasets are large and/or when many resampling iterations are used. For small samples and few resampling iterations the function is unlikely to be much faster, because in addition to the time taken by the calculations themselves, the parallel processing must compile the results at the end, which adds overhead.

When `PrintProg` is set to TRUE, progress is reported using the progress function of the svMisc package.

Returns a matrix of the leave-one-out classifications for all the specimens along with their known classification.

Ardern Hulme-Beaman

ArdernHB/KnnDist documentation built on Feb. 5, 2021, 5:09 a.m.