npdr: npdr
In insilico/glmSTIR: Nearest-neighbor Projected-Distance Regression

View source: R/npdr.R

npdr	R Documentation

Description

Nearest-Neighbor Projected-Distance Regression (NPDR) is a generalized linear model (GLM) extension of STatistical Inference Relief (STIR). It computes attribute statistical significance with a logistic model for case/control outcomes and a linear model for quantitative outcomes. NPDR allows categorical (SNP) or numeric (expression) predictor data types, and it allows covariate correction. Observations in the model are projected-distance differences between neighbors.
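
To make the last sentence concrete, the following is a minimal base-R sketch of the idea for a single attribute, not the npdr implementation. The toy data and the names `x`, `y`, `pairs`, `diff.a1`, and `miss` are hypothetical, and a fixed k of 5 is assumed for the neighborhood. Each neighbor pair becomes one observation: the predictor is the attribute difference projected onto one attribute, and the outcome is the match/mismatch of the class labels.

```r
# Sketch only: toy data and all object names are hypothetical.
set.seed(1)
m <- 40
y <- rep(c(0, 1), each = m / 2)            # case/control outcome
x <- cbind(a1 = rnorm(m, mean = 2 * y),   # attribute associated with the outcome
           a2 = rnorm(m))                  # noise attribute

# Manhattan distance between instances, then the k nearest neighbors of each
d <- as.matrix(dist(x, method = "manhattan"))
diag(d) <- Inf                             # an instance is not its own neighbor
k <- 5
pairs <- do.call(rbind, lapply(seq_len(m), function(i) {
  cbind(i = i, j = order(d[i, ])[seq_len(k)])
}))

# One observation per neighbor pair: numeric-abs diff projected onto a1,
# and outcome diff coded as match (0, hit) or mismatch (1, miss)
diff.a1 <- abs(x[pairs[, "i"], "a1"] - x[pairs[, "j"], "a1"])
miss <- as.integer(y[pairs[, "i"]] != y[pairs[, "j"]])

# Logistic regression of hit/miss status on the projected difference
fit <- glm(miss ~ diff.a1, family = binomial)
beta.a1 <- coef(summary(fit))["diff.a1", "Estimate"]
```

In this sketch the coefficient and p-value of `diff.a1` play the role of the attribute's importance; repeating the regression for each attribute over the same neighbor pairs would give one statistic per attribute.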

Usage

npdr(
  outcome,
  dataset,
  regression.type = "binomial",
  attr.diff.type = "numeric-abs",
  nbd.method = "multisurf",
  nbd.metric = "manhattan",
  knn = 0,
  msurf.sd.frac = 0.5,
  covars = "none",
  covar.diff.type = "match-mismatch",
  padj.method = "bonferroni",
  verbose = FALSE,
  use.glmnet = FALSE,
  glmnet.alpha = 1,
  glmnet.lower = 0,
  glmnet.lam = "lambda.1se",
  rm.attr.from.dist = c(),
  neighbor.sampling = "none",
  separate.hitmiss.nbds = FALSE,
  corr.attr.names = NULL,
  fast.reg = FALSE,
  fast.dist = FALSE,
  dopar.nn = FALSE,
  dopar.reg = FALSE,
  unique.dof = FALSE,
  external.dist = NULL
)

Arguments

`outcome`	character name or length-m numeric outcome vector for linear regression, factor for logistic regression
`dataset`	m x p matrix of m instances and p attributes. May also include the outcome vector, in which case `outcome` should be given as a name. Include attribute names as colnames.
`regression.type`	(`"lm"` or `"binomial"`)
`attr.diff.type`	diff type for attributes (`"numeric-abs"` or `"numeric-sqr"` for numeric, `"allele-sharing"` or `"match-mismatch"` for SNP). Phenotype diff uses the same numeric diff as attr.diff.type for lm regression. For glm-binomial, phenotype diff is `"match-mismatch"`. For correlation data (e.g., rs-fMRI), use `"correlation-data"`; diffs between two variables (e.g., ROIs) are taken across all their pairs of correlations, and the attribute importances are given for the overall variable (e.g., brain ROI), not individual pairs.
`nbd.method`	neighborhood method `"multisurf"` or `"surf"` (no k) or `"relieff"` (specify k). Used by nearestNeighbors().
`nbd.metric`	used in npdrDistances for distance matrix between instances, default: `"manhattan"` (numeric). Used by nearestNeighbors(). For `"precomputed"`, must specify external.dist matrix.
`knn`	number of constant nearest hits/misses for `"relieff"` (fixed-k). Used by nearestNeighbors(). The default knn=0 means use the expected SURF theoretical k with msurf.sd.frac (0.5 by default).
`msurf.sd.frac`	multiplier of the standard deviation from the mean distances; subtracted from mean for SURF or multiSURF. The multiSURF default is msurf.sd.frac=0.5: mean - sd/2. Used by nearestNeighbors().
`covars`	optional vector or matrix of covariate columns for correction. Or separate data matrix of covariates.
`covar.diff.type`	string (or string vector) specifying diff type(s) for covariate(s) (`"numeric-abs"` for numeric or `"match-mismatch"` for categorical).
`padj.method`	for p.adjust (`"fdr"`, `"bonferroni"`, ...)
`verbose`	logical, whether to print out intermediate steps
`use.glmnet`	logical, whether glmnet is employed
`glmnet.alpha`	penalty mixture for npdrNET: default alpha=1 (lasso, L1); alpha=0 is ridge (L2)
`glmnet.lower`	lower limit for coefficients for npdrNET; the npdrNET default is lower.limits=0
`glmnet.lam`	lambda for penalized feature selection. Options: `"lambda.1se"` (default), `"lambda.min"` or numeric.
`rm.attr.from.dist`	attributes for removal (possible confounders) from the distance matrix calculation. Argument for nearestNeighbors. None by default (`c()`).
`neighbor.sampling`	`"none"` or `"unique"` if you want to use only unique neighbor pairs (used in nearestNeighbors)
`separate.hitmiss.nbds`	for case/control data, find neighbors for same (hit) and opposite (miss) classes separately (TRUE) or find nearest neighborhoods before assigning hit/miss groups (FALSE). Uses nearestNeighborsSeparateHitMiss function
`corr.attr.names`	character vector of p variable names that correspond to the variables used to create the p(p-1) correlation-data predictors. If not specified, integer (1...p) labels are used.
`fast.reg`	logical, whether regression is run with speedlm or speedglm, default FALSE
`fast.dist`	logical, whether distance is computed by a faster algorithm in wordspace, default FALSE
`dopar.nn`	logical, whether the neighborhood is computed in parallel, default FALSE
`dopar.reg`	logical, whether regression is run in parallel, default FALSE
`unique.dof`	use unique neighbor pairs for degrees of freedom. FALSE lets R stats determine regression degrees of freedom
`external.dist`	optional input distance matrix between samples. Used in conjunction with nbd.metric = `"precomputed"`.
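
The `nbd.metric` and `msurf.sd.frac` entries above describe a neighborhood radius of the form mean - msurf.sd.frac * sd. As a rough illustration only, the base-R sketch below computes such a radius per instance from a Manhattan distance matrix. The toy data and the names `x`, `d`, `radius`, and `neighbors` are hypothetical, and the per-instance calculation is an assumption about how the rule could be applied, not a copy of nearestNeighbors().

```r
# Sketch only: toy data; per-instance radius is an assumption for illustration.
set.seed(2)
x <- matrix(rnorm(20 * 3), nrow = 20)          # 20 instances, 3 attributes
d <- as.matrix(dist(x, method = "manhattan"))  # instance-by-instance distances
msurf.sd.frac <- 0.5

# Radius for instance i: mean of its distances to all others minus sd/2
radius <- sapply(seq_len(nrow(d)), function(i) {
  di <- d[i, -i]
  mean(di) - msurf.sd.frac * sd(di)
})

# Neighbors of instance i: the other instances closer than its radius
neighbors <- lapply(seq_len(nrow(d)), function(i) {
  setdiff(which(d[i, ] < radius[i]), i)
})
```

A larger `msurf.sd.frac` shrinks each radius and so yields fewer neighbors per instance; a value of 0 would use the mean distance itself as the cutoff.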

Value

npdr.stats.df: npdr adjusted p-value for each attribute, corrected by padj.method ($pval.adj [1]), raw p-value ($pval.attr [2]), and regression coefficient ($beta.attr [3])
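
As a small illustration of how the adjusted p-values relate to the raw ones, the sketch below applies `p.adjust` from base R to a hypothetical vector of raw p-values. The values and the names `pval.attr`, `pval.bonf`, `pval.fdr`, and `sig` are made up for the example and are not output from npdr.

```r
# Sketch only: hypothetical raw p-values for four attributes.
pval.attr <- c(a1 = 0.001, a2 = 0.02, a3 = 0.04, a4 = 0.5)

# Same correction methods that padj.method passes to p.adjust
pval.bonf <- p.adjust(pval.attr, method = "bonferroni")
pval.fdr  <- p.adjust(pval.attr, method = "fdr")

# Attributes that remain significant at 0.05 after Bonferroni correction
sig <- names(pval.bonf)[pval.bonf < 0.05]
```

Bonferroni multiplies each p-value by the number of tests (capped at 1), so it is never less strict than the `"fdr"` adjustment on the same input.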

Examples

# Data interface options.
# Specify the name ("qtrait") of the outcome and the dataset,
# which is a data frame including the outcome column.
# ReliefF fixed-k neighborhood; uses the SURF theoretical default k
# (with msurf.sd.frac = 0.5) if you do not specify knn or set knn = 0.
npdr.results.df <- npdr(
  "qtrait", qtrait.3sets$train,
  regression.type = "lm", nbd.method = "relieff", nbd.metric = "manhattan",
  attr.diff.type = "numeric-abs", covar.diff.type = "numeric-abs",
  msurf.sd.frac = 0.5, padj.method = "bonferroni")

# Specify the column index (101) of the outcome and the dataset,
# which is a data frame including the outcome column.
# ReliefF fixed-k neighborhood: choose a k (knn = 10), or choose msurf.sd.frac.
npdr.results.df <- npdr(
  101, case.control.3sets$train,
  regression.type = "binomial", nbd.method = "relieff", nbd.metric = "manhattan",
  attr.diff.type = "numeric-abs", covar.diff.type = "numeric-abs",
  knn = 10, padj.method = "bonferroni")

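# If the outcome vector (pheno.vec) is separate from the attribute matrix.
# multiSURF neighborhood.
pheno.vec <- case.control.3sets$train$class
# Attribute matrix: all columns of the training data except the outcome ("class")
predictors.mat <- case.control.3sets$train[, names(case.control.3sets$train) != "class"]
npdr.results.df <- npdr(
  pheno.vec, predictors.mat,
  regression.type = "binomial", nbd.method = "multisurf", nbd.metric = "manhattan",
  attr.diff.type = "numeric-abs", covar.diff.type = "numeric-abs",
  msurf.sd.frac = 0.5, padj.method = "bonferroni"
)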
# Attributes with npdr adjusted p-value less than 0.05
npdr.positives <- row.names(npdr.results.df[npdr.results.df$pval.adj < 0.05, ])

insilico/glmSTIR documentation built on July 7, 2023, 12:29 a.m.