nearestNeighborsSeparateHitMiss: nearestNeighborsSeparateHitMiss

View source: R/nearestNeighbors.R

nearestNeighborsSeparateHitMiss    R Documentation

nearestNeighborsSeparateHitMiss

Description

Find the nearest neighbors of each instance using the Relief neighborhood method given by nbd.method. The hit and miss distributions are treated separately to circumvent potential hit bias. The ReliefF version makes the hit/miss neighborhoods balanced; the SURF and MultiSURF versions are still imbalanced. Used for npdr (no hits or misses specified in neighbor function).
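To make the idea of separate hit and miss neighborhoods concrete, here is a minimal base-R sketch of the fixed-k (ReliefF-style) case. It is not the package's implementation: the helper name separate_hit_miss_sketch, the toy data, and the column names Ri_idx and NN_idx are assumptions made for illustration only.

```r
# Hypothetical helper (not part of the package): for each instance, pick the
# k nearest hits (same class) and the k nearest misses (different class).
separate_hit_miss_sketch <- function(attr.mat, pheno.vec, k = 2) {
  # Manhattan distance matrix between instances (rows of attr.mat)
  dist.mat <- as.matrix(dist(attr.mat, method = "manhattan"))
  m <- nrow(dist.mat)
  pairs.list <- lapply(seq_len(m), function(i) {
    others <- setdiff(seq_len(m), i)
    hits <- others[pheno.vec[others] == pheno.vec[i]]
    misses <- others[pheno.vec[others] != pheno.vec[i]]
    # Sort each group by distance to instance i and keep at most k of each
    near.hits <- head(hits[order(dist.mat[i, hits])], k)
    near.misses <- head(misses[order(dist.mat[i, misses])], k)
    nn <- c(near.hits, near.misses)
    # Column names are illustrative assumptions
    cbind(Ri_idx = rep(i, length(nn)), NN_idx = nn)
  })
  do.call(rbind, pairs.list)
}

set.seed(1)
toy.attr <- matrix(rnorm(40), nrow = 10, ncol = 4) # 10 instances, 4 attributes
toy.pheno <- rep(c(0, 1), each = 5)                # balanced two-class phenotype
toy.pairs <- separate_hit_miss_sketch(toy.attr, toy.pheno, k = 2)
head(toy.pairs)
```

Because hits and misses are ranked separately, every instance in this sketch contributes the same number of hit pairs and miss pairs, which is the balance property described above for the ReliefF version.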

Usage

nearestNeighborsSeparateHitMiss(
  attr.mat,
  pheno.vec,
  nbd.method = "relieff",
  nbd.metric = "manhattan",
  sd.frac = 0.5,
  k = 0,
  neighbor.sampling = "none",
  att_to_remove = c(),
  fast.dist = FALSE,
  dopar.nn = FALSE
)

Arguments

attr.mat

m x p matrix of m instances and p attributes

pheno.vec

vector of class values for m instances

nbd.method

neighborhood method: "multisurf" or "surf" (no k needed) or "relieff" (specify k)

nbd.metric

metric used in npdrDistances to compute the distance matrix between instances, default: "manhattan" (numeric)

sd.frac

multiplier of the standard deviation of the distances; the product is subtracted from the mean distance to create the SURF or multiSURF radius. The multiSURF default "dead-band radius" is sd.frac=0.5: mean - sd/2
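As a rough illustration of the radius arithmetic (mean - sd.frac * sd), the sketch below computes one global radius from all pairwise distances and one radius per instance from that instance's own distances. Treating the first as the SURF-style radius and the second as the multiSURF-style radius is an assumption made here for illustration; the variable names and toy data are hypothetical.

```r
# Illustrative only: radius = mean distance - sd.frac * sd of distances
set.seed(2)
toy.attr <- matrix(rnorm(60), nrow = 12, ncol = 5)
dist.mat <- as.matrix(dist(toy.attr, method = "manhattan"))
sd.frac <- 0.5

# Assumed SURF-style radius: a single value from all unique pairwise distances
pair.dists <- dist.mat[upper.tri(dist.mat)]
surf.radius <- mean(pair.dists) - sd.frac * sd(pair.dists)

# Assumed multiSURF-style radius: one value per instance, computed from the
# distances between that instance and every other instance
multisurf.radius <- sapply(seq_len(nrow(dist.mat)), function(i) {
  d.i <- dist.mat[i, -i]
  mean(d.i) - sd.frac * sd(d.i)
})

surf.radius
round(multisurf.radius, 3)
```

A larger sd.frac shrinks the radius, so fewer instances fall inside each neighborhood.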

k

fixed number of nearest hits/misses for "relieff" (fixed k). The default k=0 means use the theoretical expected SURF k, computed with sd.frac (0.5 by default), for the relieff neighborhood.
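This page does not state the formula behind the theoretical k. One normal-approximation form, assumed here rather than taken from this page, is k = floor((m - 1) / 2 * (1 - erf(sd.frac / sqrt(2)))) for m instances. The helper name expected_surf_k below is hypothetical.

```r
# Hypothetical helper; assumes the normal-approximation formula in the lead-in.
expected_surf_k <- function(m, sd.frac = 0.5) {
  # (1 - erf(sd.frac / sqrt(2))) / 2 is the standard normal upper-tail
  # probability, which base R gives as 1 - pnorm(sd.frac)
  floor((m - 1) * (1 - pnorm(sd.frac)))
}

expected_surf_k(100)              # m = 100 instances, default sd.frac = 0.5
expected_surf_k(100, sd.frac = 1) # a larger sd.frac gives a smaller k
```

Under this assumption, raising sd.frac tightens the implied radius and so lowers the number of neighbors expected inside it.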

neighbor.sampling

"none" or "unique" if you want to return only unique neighbor pairs

att_to_remove

attributes for removal (possible confounders) from the distance matrix calculation.

fast.dist

whether or not the distance is computed by the faster algorithm in the wordspace package, default FALSE

dopar.nn

whether or not the neighborhood is computed in parallel, default FALSE

Value

Ri_NN.idxmat, a matrix of instance (Ri) indices (first column) and the indices of their nearest neighbors (NN) (second column)
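A two-column pair matrix like this can list the same two instances twice, once as (i, j) and once as (j, i). The sketch below shows one way to keep each unordered pair once, which is how neighbor.sampling = "unique" is read here; that reading, the toy matrix, and the column names are assumptions, not the package's code.

```r
# Illustrative only: drop symmetric duplicates so (i, j) and (j, i) count once
toy.pairs <- cbind(Ri_idx = c(1, 2, 1, 3, 2), NN_idx = c(2, 1, 3, 1, 4))

# Build an order-independent key for each pair: smaller index first
pair.key <- paste(
  pmin(toy.pairs[, 1], toy.pairs[, 2]),
  pmax(toy.pairs[, 1], toy.pairs[, 2]),
  sep = "-"
)

# Keep the first occurrence of each unordered pair
unique.pairs <- toy.pairs[!duplicated(pair.key), , drop = FALSE]
unique.pairs
```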

Examples

# reliefF (fixed-k) neighborhood using default k equal to theoretical surf expected value
# One can change the theoretical value by changing sd.frac (default 0.5)
neighbor.pairs.idx <- nearestNeighborsSeparateHitMiss(
  predictors.mat, case.control.3sets$train$class, # need attributes and pheno
  nbd.method = "relieff", nbd.metric = "manhattan",
  sd.frac = .5, k = 0
)
head(neighbor.pairs.idx)

insilico/glmSTIR documentation built on July 7, 2023, 12:29 a.m.