kNN.confusionmatrix: Confusion Matrices through k Nearest Neighbours...
In hkauhanen/pbcm: Parametric Bootstrap Cross-Fitting Method


View source: R/kNN.confusionmatrix.R

Computes confusion matrices (one for each value of k) by k-NN classification from the results of two parametric bootstraps. One bootstrap is treated as a holdout set and tested against the other.

kNN.confusionmatrix(
  df,
  df.holdout,
  k,
  ties = "model2",
  print_genargs = TRUE,
  verbose = TRUE
)

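`df`	Data frame output by `pbcm.di` or `pbcm.du`; the reference bootstrap against which the holdout set is tested
`df.holdout`	Data frame output by `pbcm.di` or `pbcm.du`; the holdout bootstrap whose DeltaGoF values are classified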
`k`	Number of neighbours to consider in k-NN classification; may be a vector of integers
`ties`	Which way to break ties in k-NN classification (see `kNN.classification`)
`print_genargs`	Should the generator arguments of the holdout distribution be included in the output? (See the description of the returned data frame)
`verbose`	If `TRUE`, prints a progress bar and issues warnings

The function takes each DeltaGoF value from df.holdout, compares it against the DeltaGoF distributions in df, and assigns it to model 1 or model 2 by k-NN classification. By convention, model 2 is the null hypothesis and model 1 the alternative. A false positive, for instance, is therefore a case where model 2 generated the data but the decision was in favour of model 1.
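
As a rough illustration of the decision step described above, the following sketch classifies a single holdout DeltaGoF value by majority vote among its k nearest neighbours in two reference distributions. The helper `knn_decide` and the toy vectors are hypothetical and not part of pbcm; the package's own `kNN.classification` may differ in its details, for example in how it treats equidistant neighbours.

```r
# Hypothetical helper (not part of pbcm): classify one DeltaGoF value x
# by majority vote among its k nearest neighbours.
#   dgof1: reference DeltaGoF values obtained with model 1 as generator
#   dgof2: reference DeltaGoF values obtained with model 2 as generator
#   ties:  label returned when both models receive the same number of votes
knn_decide <- function(x, dgof1, dgof2, k, ties = "model2") {
  pool   <- c(dgof1, dgof2)
  labels <- c(rep("model1", length(dgof1)), rep("model2", length(dgof2)))
  # Indices of the k reference values closest to x (absolute distance)
  nearest <- order(abs(pool - x))[seq_len(k)]
  votes1  <- sum(labels[nearest] == "model1")
  votes2  <- k - votes1
  if (votes1 > votes2) {
    "model1"
  } else if (votes2 > votes1) {
    "model2"
  } else {
    ties
  }
}

# Toy reference distributions, made up for illustration
dgof1 <- c(-3, -2.5, -2)
dgof2 <- c(2, 2.5, 3)

knn_decide(-2.2, dgof1, dgof2, k = 3)
knn_decide(0, dgof1, dgof2, k = 2)  # one vote each, so `ties` decides
```

With an even k, the two models can receive equally many votes. In this sketch the `ties` label then settles the decision, mirroring the role the `ties` argument is documented to play.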

A data frame with the following columns:

k: Number of nearest neighbours
P: Number of positives
N: Number of negatives
TP: Number of true positives
FP: Number of false positives
TN: Number of true negatives
FN: Number of false negatives
alpha: Type I error (false positive) rate; equal to FP divided by N
beta: Type II error (false negative) rate; equal to FN divided by P

In addition to these columns, if print_genargs == TRUE, each argument that was passed via genargs1 and genargs2 to pbcm.di or pbcm.du to generate df.holdout is included as a column of its own.
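
A minimal sketch of how the count and rate columns relate to each other, assuming that P and N count the holdout cases generated by model 1 and model 2, respectively (this reading follows from the definitions of alpha and beta). The vectors `truth` and `decision` are made up for illustration and are not output of the package.

```r
# Toy data: true generator and k-NN decision for seven holdout cases.
# Model 1 is the "positive" class, model 2 the "negative" (null) class.
truth    <- c("model1", "model1", "model1", "model2", "model2", "model2", "model2")
decision <- c("model1", "model2", "model1", "model2", "model1", "model2", "model2")

P  <- sum(truth == "model1")                         # positives
N  <- sum(truth == "model2")                         # negatives
TP <- sum(truth == "model1" & decision == "model1")  # true positives
FN <- sum(truth == "model1" & decision == "model2")  # false negatives
FP <- sum(truth == "model2" & decision == "model1")  # false positives
TN <- sum(truth == "model2" & decision == "model2")  # true negatives

alpha <- FP / N  # Type I error (false positive) rate
beta  <- FN / P  # Type II error (false negative) rate

data.frame(P = P, N = N, TP = TP, FP = FP, TN = TN, FN = FN,
           alpha = alpha, beta = beta)
```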

Henri Kauhanen

kNN.classification, pbcm.di, pbcm.du

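library(pbcm)

# Seed chosen arbitrarily so that the example is reproducible
set.seed(1)

# Mock data: a noisy linear relationship on [0, 1]
x <- seq(from=0, to=1, length.out=100)
mockdata <- data.frame(x=x, y=x + rnorm(100, 0, 0.5))

# Fitting function: fits y = a*x^p for a fixed exponent p and returns
# the estimated coefficient together with the goodness of fit (deviance)
myfitfun <- function(data, p) {
  res <- nls(y~a*x^p, data, start=list(a=1.1))
  list(a=coef(res), GoF=deviance(res))
}

# Generator function: simulates new data from a fitted model
mygenfun <- function(model, p) { 
  x <- seq(from=0, to=1, length.out=100)
  y <- model$a*x^p + rnorm(100, 0, 0.5)
  data.frame(x=x, y=y)
}

# First bootstrap: the reference distributions (model 1: p=1, model 2: p=2)
pb1 <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun, genfun1=mygenfun,
        genfun2=mygenfun, reps=20, args1=list(p=1), args2=list(p=2), 
        genargs1=list(p=1), genargs2=list(p=2))

# Second, independent bootstrap with the same settings: the holdout set
pb2 <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun, genfun1=mygenfun,
        genfun2=mygenfun, reps=20, args1=list(p=1), args2=list(p=2), 
        genargs1=list(p=1), genargs2=list(p=2))

# One confusion matrix (one output row) for each k from 1 to 10
kNN.confusionmatrix(df=pb1, df.holdout=pb2, k=1:10)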