nearestNeighbors2: nearestNeighbors2

View source: R/npdrLearner.R

nearestNeighbors2    R Documentation

nearestNeighbors2

Description

Find the nearest neighbors of each instance in attr.mat2 (test) among the instances in attr.mat1 (train) using Relief neighborhood methods. Used by npdrLearner, a nearest-neighbor classifier. Input data should not include the phenotype column.

Usage

nearestNeighbors2(
  attr.mat1,
  attr.mat2,
  nbd.method = "multisurf",
  nbd.metric = "manhattan",
  sd.vec = NULL,
  sd.frac = 0.5,
  dopar.nn = FALSE,
  k = 0
)

Arguments

attr.mat1

m1 x p matrix of m1 instances and p attributes (training data)

attr.mat2

m2 x p matrix of m2 instances and p attributes (test data)

nbd.method

neighborhood method: 'multisurf' or 'surf' (no k) or 'relieff' (specify k)

nbd.metric

distance metric used in npdrDistances2 to compute the distance matrix between instances, default: 'manhattan' (numeric)

sd.vec

vector of standard deviations

sd.frac

multiplier of the standard deviation of the distances, which is subtracted from the mean distance to create the SURF or multiSURF radius. The multiSURF default "dead-band radius" is sd.frac=0.5: mean - sd/2

dopar.nn

whether or not the neighborhood is computed in parallel, default: FALSE

k

number of constant nearest hits/misses for 'relieff' (fixed k). The default k=0 means use the expected SURF theoretical k with sd.frac (0.5 by default) for the relieff neighborhood.
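To make the sd.frac and k arguments concrete, the following base-R sketch computes a dead-band radius (mean - sd.frac * sd) for each test instance and a SURF-style expected k. It is an illustration only, not the package's code: the toy data, the helper names manhattan_cross and expected_k, the choice to take the mean and sd over test-to-train distances, and the normal-approximation formula for k are all assumptions that do not appear in this page.

```r
set.seed(1)
train <- matrix(rnorm(20 * 4), nrow = 20)  # hypothetical m1 x p training matrix
test  <- matrix(rnorm(5 * 4),  nrow = 5)   # hypothetical m2 x p test matrix

# Hypothetical helper: Manhattan distance from every row of a to every row of b.
manhattan_cross <- function(a, b) {
  t(apply(a, 1, function(r) rowSums(abs(sweep(b, 2, r)))))
}

d <- manhattan_cross(test, train)          # m2 x m1 distance matrix

sd.frac <- 0.5
# Per-instance radius: mean distance minus sd.frac standard deviations.
radius <- rowMeans(d) - sd.frac * apply(d, 1, sd)

# Neighbors of each test instance: training rows that fall inside its radius.
nbrs <- lapply(seq_len(nrow(d)), function(i) which(d[i, ] < radius[i]))

# Assumed normal approximation for the expected neighborhood size:
# the fraction of instances closer than mean - sd.frac * sd, times (m - 1).
expected_k <- function(m, sd.frac = 0.5) {
  floor((m - 1) * (1 - pnorm(sd.frac)))
}
expected_k(100)
```

A larger sd.frac shrinks the radius and therefore the expected k, so both neighborhood styles become stricter as sd.frac grows.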

Value

list of the nearest neighbors in attr.mat1 (training instances) of each instance Ri in attr.mat2 (test instances)
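The shape of such a result can be pictured with a small base-R stand-in. This is a sketch under stated assumptions, not the function's implementation: knn_cross is a hypothetical helper, it uses a fixed k with the Manhattan metric, and the actual structure of the list returned by nearestNeighbors2 may differ.

```r
# Hypothetical stand-in for a fixed-k ("relieff"-style) neighbor search.
knn_cross <- function(attr.mat1, attr.mat2, k = 3) {
  attr.mat1 <- as.matrix(attr.mat1)
  attr.mat2 <- as.matrix(attr.mat2)
  lapply(seq_len(nrow(attr.mat2)), function(i) {
    # Manhattan distance from test row i to every training row
    d <- rowSums(abs(sweep(attr.mat1, 2, attr.mat2[i, ])))
    order(d)[seq_len(k)]  # indices of the k closest training rows
  })
}

train <- rbind(c(0, 0), c(1, 1), c(5, 5), c(6, 6))  # toy training data
test  <- rbind(c(0.2, 0.1), c(5.4, 5.5))            # toy test data
nn <- knn_cross(train, test, k = 2)
nn[[1]]  # training-row indices nearest to the first test instance
```

Each list element holds row indices into the training matrix, so a classifier can look up the training labels of those rows to vote on a prediction for the corresponding test instance.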

Examples

train_dat <- case.control.3sets$train
valid_dat <- case.control.3sets$validation
test.neighbors <- nearestNeighbors2(
  train_dat[, names(train_dat) != "class"],
  valid_dat[, names(valid_dat) != "class"], # no phenotype column
  nbd.method = "relieff",
  nbd.metric = "manhattan",
  sd.vec = NULL, sd.frac = 0.5,
  k = 0, # k = 0: use the expected SURF theoretical k for sd.frac
  dopar.nn = FALSE
)

insilico/glmSTIR documentation built on July 7, 2023, 12:29 a.m.