nearestNeighbors2: nearestNeighbors2
In insilico/glmSTIR: Nearest-neighbor Projected-Distance Regression

Description

Find the nearest neighbors of each instance in attr.mat2 (test) among the instances in attr.mat1 (train) using Relief neighborhood methods. Used by npdrLearner, the nearest-neighbor classifier. Input data should not include the phenotype column.

Usage

nearestNeighbors2(
  attr.mat1,
  attr.mat2,
  nbd.method = "multisurf",
  nbd.metric = "manhattan",
  sd.vec = NULL,
  sd.frac = 0.5,
  dopar.nn = FALSE,
  k = 0
)

Arguments

`attr.mat1`	m1 x p matrix of m1 instances and p attributes (training data)
`attr.mat2`	m2 x p matrix of m2 instances and p attributes (test data)
`nbd.method`	neighborhood method: 'multisurf' or 'surf' (no k) or 'relieff' (specify k)
`nbd.metric`	used in npdrDistances2 for distance matrix between instances, default: 'manhattan' (numeric)
`sd.vec`	vector of standard deviations
`sd.frac`	multiplier of the standard deviation of the distances; sd.frac times the standard deviation is subtracted from the mean distance to give the SURF or multiSURF radius. The multiSURF default "dead-band radius" is sd.frac=0.5: mean - sd/2
`dopar.nn`	whether the neighborhood is computed in parallel, default: FALSE
`k`	number of constant nearest hits/misses for 'relieff' (fixed k). The default k=0 means use the expected SURF theoretical k with sd.frac (0.5 by default) for relieff nbd.
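
The radius rule described for sd.frac can be sketched in base R. This is a minimal illustration, not the package's implementation: the function names manhattan_to_train and radius_neighbors and the toy matrices are hypothetical, and it assumes a per-instance radius (mean minus sd.frac times the standard deviation of that test instance's distances to the training instances), as in multiSURF.

```r
# Hypothetical toy data: 5 training and 2 test instances, 2 attributes.
train <- rbind(c(0, 0), c(1, 0), c(0, 2), c(5, 5), c(6, 5))
test  <- rbind(c(0, 0), c(5, 5))

# Manhattan distance from one test row x to every training row.
manhattan_to_train <- function(x, train_mat) {
  rowSums(abs(sweep(train_mat, 2, x)))
}

# Illustrative helper (assumed per-instance radius, multiSURF style):
# returns, for each test row, the training indices whose distance is
# below mean(d) - sd.frac * sd(d).
radius_neighbors <- function(train_mat, test_mat, sd.frac = 0.5) {
  lapply(seq_len(nrow(test_mat)), function(i) {
    d <- manhattan_to_train(test_mat[i, ], train_mat)
    radius <- mean(d) - sd.frac * sd(d)
    which(d < radius)
  })
}

nbrs <- radius_neighbors(train, test, sd.frac = 0.5)
```

With these toy matrices the first test instance keeps the three training rows clustered near the origin and the second keeps the two rows near (5, 5). A larger sd.frac shrinks the radius and so keeps fewer neighbors.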

Value

A list giving, for each test instance Ri in attr.mat2, its nearest neighbors among the training instances in attr.mat1.
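
To picture a result of this shape for the fixed-k ('relieff') case, the sketch below builds a list with one vector of training indices per test instance. It is an illustration under stated assumptions, not the package's code: surf_k and fixed_k_neighbors are hypothetical names, and the k = 0 fallback assumes the theoretical SURF k is the expected number of training instances inside the mean - sd.frac * sd radius when distances are roughly normal, that is floor((m - 1) * pnorm(-sd.frac)).

```r
# Assumed theoretical k for m training instances (hypothetical helper):
# expected count of instances closer than mean - sd.frac * sd under a
# normal approximation of the distance distribution.
surf_k <- function(m, sd.frac = 0.5) {
  floor((m - 1) * pnorm(-sd.frac))
}

# Illustrative helper: for each test row, the indices of the k closest
# training rows under Manhattan distance. k = 0 falls back to surf_k().
fixed_k_neighbors <- function(train_mat, test_mat, k = 0, sd.frac = 0.5) {
  if (k == 0) k <- surf_k(nrow(train_mat), sd.frac)
  lapply(seq_len(nrow(test_mat)), function(i) {
    d <- rowSums(abs(sweep(train_mat, 2, test_mat[i, ])))
    order(d)[seq_len(k)]
  })
}

# Hypothetical toy data: 5 training and 2 test instances, 2 attributes.
train <- rbind(c(0, 0), c(1, 0), c(0, 2), c(5, 5), c(6, 5))
test  <- rbind(c(0, 0), c(5, 5))

nbrs_k2 <- fixed_k_neighbors(train, test, k = 2)  # explicit k
nbrs_k0 <- fixed_k_neighbors(train, test, k = 0)  # assumed SURF k
```

Each list element corresponds to one row of the test matrix, and its entries index rows of the training matrix.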

Examples

train_dat <- case.control.3sets$train
valid_dat <- case.control.3sets$validation
test.neighbors <- nearestNeighbors2(
  train_dat[, names(train_dat) != "class"],
  valid_dat[, names(valid_dat) != "class"], # no phenotype column
  nbd.method = "relieff",
  nbd.metric = "manhattan",
  sd.vec = NULL, sd.frac = 0.5,
  k = 0, # k = 0: use the theoretical SURF k estimate based on sd.frac
  dopar.nn = FALSE
)

insilico/glmSTIR documentation built on July 7, 2023, 12:29 a.m.