npdr: npdr

View source: R/npdr.R

npdr    R Documentation

npdr

Description

Nearest-Neighbor Projected-Distance Regression (npdr): a generalized linear model (GLM) extension of STatistical Inference Relief (STIR). Computes the statistical significance of each attribute using logistic regression for case/control outcomes and linear regression for quantitative outcomes. NPDR allows categorical (e.g., SNP) or numeric (e.g., expression) predictor data types and supports covariate correction. The observations in each regression are projected-distance differences between pairs of neighboring instances.
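The core regression for a single attribute can be illustrated with a minimal base-R sketch. It assumes fixed-k Manhattan neighborhoods, a "numeric-abs" attribute diff and a "match-mismatch" phenotype diff (1 = miss, 0 = hit). It is not the package's implementation, and the simulated data and variable names (X, y, nbr.pairs) are hypothetical.

```r
# Minimal sketch of the NPDR idea for one attribute (hypothetical data; not npdr's code)
set.seed(42)
m <- 80; p <- 5; k <- 10
X <- matrix(rnorm(m * p), nrow = m,
            dimnames = list(NULL, paste0("var", 1:p)))
# case/control outcome driven mainly by var1
y <- factor(ifelse(X[, "var1"] + rnorm(m, sd = 0.5) > 0, "case", "ctrl"))

# Manhattan distances between all instances
D <- as.matrix(dist(X, method = "manhattan"))

# fixed-k neighbor pairs (i, j), excluding each instance itself
nbr.pairs <- do.call(rbind, lapply(seq_len(m), function(i) {
  cbind(i = i, j = setdiff(order(D[i, ]), i)[1:k])
}))

# projected-distance differences: one regression observation per neighbor pair
attr.diff  <- abs(X[nbr.pairs[, "i"], "var1"] - X[nbr.pairs[, "j"], "var1"])
pheno.diff <- as.integer(y[nbr.pairs[, "i"]] != y[nbr.pairs[, "j"]])  # 1 = miss

# logistic regression of the phenotype diff on the attribute diff
fit <- glm(pheno.diff ~ attr.diff, family = binomial)
coef(summary(fit))["attr.diff", c("Estimate", "Pr(>|z|)")]
```

Conceptually, a regression like this is run for every attribute and the resulting p-values are then adjusted with padj.method.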

Usage

npdr(
  outcome,
  dataset,
  regression.type = "binomial",
  attr.diff.type = "numeric-abs",
  nbd.method = "multisurf",
  nbd.metric = "manhattan",
  knn = 0,
  msurf.sd.frac = 0.5,
  covars = "none",
  covar.diff.type = "match-mismatch",
  padj.method = "bonferroni",
  verbose = FALSE,
  use.glmnet = FALSE,
  glmnet.alpha = 1,
  glmnet.lower = 0,
  glmnet.lam = "lambda.1se",
  rm.attr.from.dist = c(),
  neighbor.sampling = "none",
  separate.hitmiss.nbds = FALSE,
  corr.attr.names = NULL,
  fast.reg = FALSE,
  fast.dist = FALSE,
  dopar.nn = FALSE,
  dopar.reg = FALSE,
  unique.dof = FALSE,
  external.dist = NULL
)

Arguments

outcome

character name (or column index) of the outcome column in dataset, or a separate length-m outcome vector: numeric for linear regression, factor for logistic regression.

dataset

m x p matrix (or data frame) of m instances and p attributes. It may also include the outcome column, in which case outcome must be given as a column name (or index). Include attribute names as column names.

regression.type

"lm" (linear regression, quantitative outcome) or "binomial" (logistic regression, case/control outcome).

attr.diff.type

diff type for attributes: "numeric-abs" or "numeric-sqr" for numeric data, "allele-sharing" or "match-mismatch" for SNP data. For lm regression, the phenotype diff uses the same numeric diff as attr.diff.type; for glm-binomial, the phenotype diff is "match-mismatch". For correlation data (e.g., rs-fMRI), use "correlation-data": diffs between two variables (e.g., ROIs) are taken across all of their pairs of correlations, and attribute importances are reported for each overall variable (e.g., brain ROI), not for individual pairs.
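The diff types can be sketched as small functions on a pair of values. These definitions are assumptions for illustration (in particular, allele-sharing assumes SNPs coded as 0/1/2 minor-allele counts), and the package may additionally rescale numeric diffs; the function names are hypothetical.

```r
# Hypothetical diff functions illustrating the attr.diff.type options
diff_numeric_abs    <- function(a, b) abs(a - b)
diff_numeric_sqr    <- function(a, b) (a - b)^2
diff_match_mismatch <- function(a, b) as.numeric(a != b)
# assumes 0/1/2 genotype coding: fraction of the two alleles not shared
diff_allele_sharing <- function(a, b) abs(a - b) / 2

diff_numeric_abs(1.5, 4)                       # 2.5
diff_numeric_sqr(1.5, 4)                       # 6.25
diff_match_mismatch(c(0, 1, 2), c(0, 2, 2))    # 0 1 0
diff_allele_sharing(c(0, 1, 2), c(0, 2, 0))    # 0.0 0.5 1.0
```

Match-mismatch ignores how different two genotypes are, while allele-sharing treats a 0 vs 2 difference as twice a 0 vs 1 difference.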

nbd.method

neighborhood method: "multisurf" or "surf" (adaptive radius, no k) or "relieff" (fixed k; see knn). Passed to nearestNeighbors().

nbd.metric

metric used by npdrDistances to compute the distance matrix between instances; default "manhattan" (for numeric data). Passed to nearestNeighbors(). If "precomputed", the external.dist matrix must be supplied.
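A Manhattan distance matrix between instances can be computed in base R with stats::dist. This sketch uses hypothetical random data; a full symmetric matrix like D is the kind of input described for external.dist, though whether npdr expects particular row or column names is not stated here.

```r
# Sketch: Manhattan distance matrix between instances (hypothetical data)
set.seed(7)
X <- matrix(runif(20), nrow = 5)               # 5 instances, 4 attributes
D <- as.matrix(dist(X, method = "manhattan"))  # full symmetric 5 x 5 matrix

# entry (1, 2) is the sum of absolute attribute differences
all.equal(D[1, 2], sum(abs(X[1, ] - X[2, ])))  # TRUE
dim(D)                                         # 5 5
```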

knn

fixed number of nearest neighbors (hits/misses) for "relieff". Passed to nearestNeighbors(). The default knn = 0 uses the theoretical expected SURF k, computed with msurf.sd.frac (0.5 by default).
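The theoretical SURF k can be sketched under the assumption that distances from an instance to the others are roughly normal: the neighbors are those closer than mean - msurf.sd.frac * sd, which is a fraction pnorm(-msurf.sd.frac) of the other m - 1 instances. The function name knn_surf is hypothetical, and the package's exact formula may differ.

```r
# Expected SURF neighborhood size under a normal approximation of distances (sketch)
knn_surf <- function(m, sd.frac = 0.5) {
  # fraction of a normal distribution below mean - sd.frac * sd, times m - 1 others
  floor((m - 1) * pnorm(-sd.frac))
}

knn_surf(100)      # 30
knn_surf(200)      # 61
knn_surf(100, 1)   # 15
```

A larger msurf.sd.frac shrinks the radius and therefore k.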

msurf.sd.frac

multiplier of the standard deviation of the distances; this multiple of the SD is subtracted from the mean distance to set the neighborhood radius for SURF or multiSURF. The default msurf.sd.frac = 0.5 gives a radius of mean - sd/2. Passed to nearestNeighbors().
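A multiSURF-style neighborhood gives each instance its own radius, computed from that instance's distances to all the others. This base-R sketch uses hypothetical data and is not npdr's nearestNeighbors(); for SURF, a single radius from all pairwise distances would be used instead (assumed).

```r
# Sketch of multiSURF-style neighborhoods (hypothetical data; not npdr's code)
set.seed(3)
X <- matrix(rnorm(30 * 4), nrow = 30)
D <- as.matrix(dist(X, method = "manhattan"))
sd.frac <- 0.5

neighbors <- lapply(seq_len(nrow(D)), function(i) {
  d.i <- D[i, -i]                          # distances to the other instances
  radius <- mean(d.i) - sd.frac * sd(d.i)  # mean - sd/2 by default
  which(D[i, ] < radius & seq_len(nrow(D)) != i)
})
sapply(neighbors, length)  # neighborhood size varies by instance
```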

covars

optional vector or matrix of covariate columns for correction, or a separate data matrix of covariates. The default "none" applies no covariate correction.

covar.diff.type

string (or string vector) specifying diff type(s) for covariate(s) ("numeric-abs" for numeric or "match-mismatch" for categorical).
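One way covariate correction can enter an NPDR-style regression is as extra regressors built from covariate diffs. This sketch assumes exactly that, with a hypothetical match-mismatch diff for a categorical covariate (sex.diff) and simulated neighbor-pair data; it is not the package's exact code.

```r
# Sketch: covariate diff added as a regressor (hypothetical simulated pairs)
set.seed(11)
n.pairs <- 500
attr.diff  <- abs(rnorm(n.pairs))        # numeric-abs attribute diffs
sex.diff   <- rbinom(n.pairs, 1, 0.5)    # match-mismatch covariate diffs
pheno.diff <- rbinom(n.pairs, 1, plogis(-1 + attr.diff + 0.5 * sex.diff))

fit <- glm(pheno.diff ~ attr.diff + sex.diff, family = binomial)
coef(summary(fit))["attr.diff", ]        # attribute effect adjusted for the covariate
```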

padj.method

multiple-testing correction method passed to stats::p.adjust, e.g., "fdr" or "bonferroni" (default).
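The two named methods behave quite differently on the same raw p-values. This uses base R's stats::p.adjust directly on hypothetical p-values for three attributes.

```r
# Hypothetical raw attribute p-values
pvals <- c(var1 = 0.01, var2 = 0.02, var3 = 0.04)
p.adjust(pvals, method = "bonferroni")  # 0.03 0.06 0.12
p.adjust(pvals, method = "fdr")         # 0.03 0.03 0.04
```

Bonferroni multiplies each p-value by the number of tests, so only var1 stays below .05 here, while the Benjamini-Hochberg "fdr" adjustment keeps all three below .05.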

verbose

logical, whether to print out intermediate steps

use.glmnet

logical, whether to fit the penalized npdrNET model with glmnet

glmnet.alpha

elastic-net mixing parameter for npdrNET: alpha = 1 (lasso, L1; default), alpha = 0 (ridge, L2)

glmnet.lower

lower limit for npdrNET coefficients (glmnet lower.limits); default 0, i.e., non-negative coefficients

glmnet.lam

lambda for penalized feature selection. Options: "lambda.1se" (default), "lambda.min" or numeric.
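The three glmnet settings correspond to standard arguments of the glmnet package. This sketch calls cv.glmnet directly on simulated neighbor-pair diffs; it requires glmnet to be installed, and how npdr maps glmnet.alpha, glmnet.lower and glmnet.lam onto these calls internally is an assumption.

```r
# Sketch: lasso-penalized regression on hypothetical neighbor-pair diffs
library(glmnet)
set.seed(5)
n.pairs <- 400; p <- 6
attr.diffs <- matrix(abs(rnorm(n.pairs * p)), ncol = p,
                     dimnames = list(NULL, paste0("var", 1:p)))
pheno.diff <- rbinom(n.pairs, 1, plogis(-1 + 1.5 * attr.diffs[, "var1"]))

cv.fit <- cv.glmnet(attr.diffs, pheno.diff, family = "binomial",
                    alpha = 1,          # like glmnet.alpha: 1 = lasso, 0 = ridge
                    lower.limits = 0)   # like glmnet.lower: non-negative coefficients
coef(cv.fit, s = "lambda.1se")          # like glmnet.lam: "lambda.1se" or "lambda.min"
```

With lower.limits = 0, attributes can only be selected with a positive coefficient, i.e., a larger diff associated with a higher chance of a miss.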

rm.attr.from.dist

attributes (e.g., possible confounders) to exclude from the distance-matrix calculation. Passed to nearestNeighbors(). Default c() (none).

neighbor.sampling

"none" (default) or "unique" to keep only unique neighbor pairs. Passed to nearestNeighbors().
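Keeping only unique neighbor pairs amounts to treating (i, j) and (j, i) as the same pair. This base-R sketch shows that idea on hypothetical pairs; it is not the package's nearestNeighbors() code.

```r
# Sketch: collapse (i, j) and (j, i) into one unordered pair (hypothetical pairs)
nbr.pairs <- rbind(c(1, 2), c(2, 1), c(1, 3), c(3, 2), c(2, 3))
sorted <- t(apply(nbr.pairs, 1, sort))  # smaller index first in each row
unique.pairs <- unique(sorted)
unique.pairs                            # rows: (1, 2), (1, 3), (2, 3)
```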

separate.hitmiss.nbds

logical; for case/control data, find neighbors in the same (hit) and opposite (miss) classes separately (TRUE), or find the nearest neighborhood first and then assign hit/miss labels (FALSE). Uses the nearestNeighborsSeparateHitMiss function.
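Searching hits and misses separately guarantees each instance the same number of each, whereas a single combined neighborhood can contain mostly one class. This sketch assumes a fixed k per group, with hypothetical data and a hypothetical helper name; it is not nearestNeighborsSeparateHitMiss itself.

```r
# Sketch: k nearest hits and k nearest misses found separately (hypothetical data)
set.seed(9)
X <- matrix(rnorm(40 * 3), nrow = 40)
y <- rep(c("case", "ctrl"), each = 20)
D <- as.matrix(dist(X, method = "manhattan"))
k <- 5

nearest_hits_misses <- function(i) {
  others <- setdiff(seq_len(nrow(D)), i)
  hits   <- others[y[others] == y[i]]   # same class
  misses <- others[y[others] != y[i]]   # opposite class
  list(hits   = hits[order(D[i, hits])][1:k],
       misses = misses[order(D[i, misses])][1:k])
}
nbrs.1 <- nearest_hits_misses(1)
```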

corr.attr.names

character vector of p variable names that correspond to the variables used to create the p(p-1) correlation-data predictors. If not specified, integer (1...p) labels used.
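The count p(p-1) comes from ordered pairs of distinct variables. This sketch enumerates them with the default integer labels; the pair layout and data frame are hypothetical and do not reflect how npdr names its correlation-data predictors.

```r
# Sketch: the p(p - 1) ordered variable pairs behind correlation-data predictors
p <- 4
corr.attr.names <- seq_len(p)           # default integer labels 1...p
pairs.df <- expand.grid(from = corr.attr.names, to = corr.attr.names)
pairs.df <- pairs.df[pairs.df$from != pairs.df$to, ]
nrow(pairs.df)                          # 12 = p * (p - 1)
# each variable takes part in 2 * (p - 1) of these predictors
table(c(pairs.df$from, pairs.df$to))
```

This is why an importance reported for one overall variable (e.g., an ROI) pools information from several predictor columns.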

fast.reg

logical, whether regressions are run with speedlm or speedglm (speedglm package); default FALSE

fast.dist

logical, whether distances are computed with the faster algorithm from the wordspace package; default FALSE

dopar.nn

logical, whether neighborhoods are computed in parallel; default FALSE

dopar.reg

logical, whether regressions are run in parallel; default FALSE

unique.dof

logical, whether to use the number of unique neighbor pairs for the regression degrees of freedom. FALSE (default) lets R's stats functions determine the degrees of freedom.

external.dist

optional input distance matrix between samples. Used in conjunction with nbd.metric = "precomputed".

Value

npdr.stats.df: data frame of npdr results for each attribute, containing the adjusted p-value ($pval.adj, column 1; correction method set by padj.method), the raw p-value ($pval.attr, column 2), and the regression coefficient ($beta.attr, column 3).

Examples

# Data interface options.
# Specify the name ("qtrait") of the outcome and a dataset,
# which is a data frame that includes the outcome column.
# ReliefF fixed-k neighborhood: if k is not specified (or knn = 0),
# the theoretical SURF k is used (with msurf.sd.frac = 0.5).
library(npdr)
npdr.results.df <- npdr(
  "qtrait", qtrait.3sets$train,
  regression.type = "lm", nbd.method = "relieff", nbd.metric = "manhattan",
  attr.diff.type = "numeric-abs", covar.diff.type = "numeric-abs",
  msurf.sd.frac = 0.5, padj.method = "bonferroni")

# Specify the column index (101) of the outcome and a dataset,
# which is a data frame that includes the outcome column.
# ReliefF fixed-k neighborhood with a chosen k (knn = 10); alternatively, set msurf.sd.frac.
npdr.results.df <- npdr(
  101, case.control.3sets$train,
  regression.type = "binomial", nbd.method = "relieff", nbd.metric = "manhattan",
  attr.diff.type = "numeric-abs", covar.diff.type = "numeric-abs",
  knn = 10, padj.method = "bonferroni")

# If the outcome vector (pheno.vec) is separate from the attribute matrix.
# multiSURF neighborhood
train.df <- case.control.3sets$train
pheno.vec <- train.df$class
predictors.mat <- train.df[, names(train.df) != "class"]
npdr.results.df <- npdr(
  pheno.vec, predictors.mat,
  regression.type = "binomial", nbd.method = "multisurf", nbd.metric = "manhattan",
  attr.diff.type = "numeric-abs", covar.diff.type = "numeric-abs",
  msurf.sd.frac = 0.5, padj.method = "bonferroni")
# attributes with npdr adjusted p-value less than .05
npdr.positives <- row.names(npdr.results.df[npdr.results.df$pval.adj < .05, ])

insilico/npdr documentation built on July 6, 2023, 1:14 p.m.