npdr | R Documentation |
Nearest-Neighbor Projected-Distance Regression (NPDR), a generalized linear model (GLM) extension of STatistical Inference Relief (STIR). Computes attribute statistical significance with logistic regression for case/control outcomes and linear regression for quantitative outcomes. NPDR allows categorical (SNP) or numeric (expression) predictor data types and supports covariate correction. Observations in the model are projected-distance differences between neighbors.
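The idea that the model's observations are differences between neighbors can be illustrated with a minimal base R sketch. This is not the package's implementation: the simulated data, the fixed neighborhood size `k`, and the helper `fit.one` are assumptions made for illustration, and only a quantitative outcome with absolute differences is shown.

```r
# Illustrative sketch only; not the internals of the npdr package.
set.seed(1)
m <- 40                                   # instances
p <- 3                                    # attributes
X <- matrix(rnorm(m * p), m, p, dimnames = list(NULL, c("a1", "a2", "a3")))
y <- 2 * X[, "a1"] + rnorm(m, sd = 0.5)   # simulated outcome, driven by a1 only

k <- 5                                    # assumed fixed neighborhood size
D <- as.matrix(dist(X, method = "manhattan"))
diag(D) <- Inf                            # an instance is not its own neighbor

# Ordered neighbor pairs (i, j): j is one of the k nearest neighbors of i.
nbr.pairs <- do.call(rbind, lapply(seq_len(m), function(i) {
  cbind(i = i, j = order(D[i, ])[seq_len(k)])
}))

# Each neighbor pair contributes one observation: the outcome difference
# and, per attribute, the attribute difference projected onto that attribute.
y.diff <- abs(y[nbr.pairs[, "i"]] - y[nbr.pairs[, "j"]])

# Hypothetical helper: one linear regression per attribute.
fit.one <- function(a) {
  a.diff <- abs(X[nbr.pairs[, "i"], a] - X[nbr.pairs[, "j"], a])
  coef(summary(lm(y.diff ~ a.diff)))["a.diff", c("Estimate", "Pr(>|t|)")]
}
sketch.stats <- t(sapply(colnames(X), fit.one))
sketch.stats
```

Each row of `sketch.stats` holds the slope and raw p-value for one attribute. For a case/control outcome, the analogous sketch would regress a hit/miss indicator on the attribute difference with `glm(..., family = binomial)`.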
npdr(
outcome,
dataset,
regression.type = "binomial",
attr.diff.type = "numeric-abs",
nbd.method = "multisurf",
nbd.metric = "manhattan",
knn = 0,
msurf.sd.frac = 0.5,
covars = "none",
covar.diff.type = "match-mismatch",
padj.method = "bonferroni",
verbose = FALSE,
use.glmnet = FALSE,
glmnet.alpha = 1,
glmnet.lower = 0,
glmnet.lam = "lambda.1se",
rm.attr.from.dist = c(),
neighbor.sampling = "none",
separate.hitmiss.nbds = FALSE,
corr.attr.names = NULL,
fast.reg = FALSE,
fast.dist = FALSE,
dopar.nn = FALSE,
dopar.reg = FALSE,
unique.dof = FALSE,
external.dist = NULL
)
outcome |
character name (or column index) of the outcome column in dataset, or a length-m outcome vector: numeric for linear regression, factor for logistic regression |
dataset |
m x p matrix of m instances and p attributes. May also include the outcome column, in which case outcome should be that column's name. Include attribute names as colnames. |
regression.type |
( |
attr.diff.type |
diff type for attributes ( |
nbd.method |
neighborhood method |
nbd.metric |
used in npdrDistances for distance matrix between instances, default: |
knn |
number of constant nearest hits/misses for |
msurf.sd.frac |
multiplier of the standard deviation of the distances; the product is subtracted from the mean distance to set the SURF or multiSURF radius. The multiSURF default is msurf.sd.frac=0.5, i.e. mean - sd/2. Used by nearestNeighbors(). |
covars |
optional vector or matrix of covariate columns for correction, or a separate data matrix of covariates. |
covar.diff.type |
string (or string vector) specifying diff type(s) for covariate(s) ( |
padj.method |
for p.adjust ( |
verbose |
logical, whether to print out intermediate steps |
use.glmnet |
logical, whether glmnet is employed |
glmnet.alpha |
penalty mixture for npdrNET: the default alpha=1 is the lasso (L1) penalty; alpha=0 is the ridge (L2) penalty |
glmnet.lower |
lower limit for coefficients for npdrNET; the npdrNET default is lower.limits=0 |
glmnet.lam |
lambda for penalized feature selection. Options: |
rm.attr.from.dist |
attributes to remove (possible confounders) from the distance matrix calculation. Passed to nearestNeighbors. Default c() removes none. |
neighbor.sampling |
"none" or |
separate.hitmiss.nbds |
for case/control data, find neighbors for same (hit) and opposite (miss) classes separately (TRUE) or find nearest neighborhoods before assigning hit/miss groups (FALSE). Uses nearestNeighborsSeparateHitMiss function |
corr.attr.names |
character vector of p variable names that correspond to the variables used to create the p(p-1) correlation-data predictors. If not specified, integer (1...p) labels are used. |
fast.reg |
logical, whether regression is run with speedlm or speedglm, default FALSE |
fast.dist |
logical, whether distance is computed by the faster algorithm in the wordspace package, default FALSE |
dopar.nn |
logical, whether the neighborhood is computed in parallel, default FALSE |
dopar.reg |
logical, whether regression is run in parallel, default FALSE |
unique.dof |
use unique neighbor pairs for degrees of freedom. FALSE lets R stats determine regression degrees of freedom |
external.dist |
optional input distance matrix between samples. Used in conjunction with nbd.metric = |
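The msurf.sd.frac description in the arguments above (mean distance minus a fraction of the standard deviation) can be sketched in base R as follows. This is an illustration of the stated formula under assumed simulated data, not the package's nearestNeighbors() code; the variable names `radius`, `neighbors`, and `n.neighbors` are hypothetical.

```r
# Illustrative sketch of a multiSURF-style adaptive radius; not package code.
set.seed(2)
X <- matrix(rnorm(30 * 4), nrow = 30)          # 30 instances, 4 attributes
D <- as.matrix(dist(X, method = "manhattan"))  # pairwise distance matrix
msurf.sd.frac <- 0.5

# Per-instance radius: mean distance to the other instances minus
# msurf.sd.frac standard deviations of those distances (mean - sd/2 here).
radius <- sapply(seq_len(nrow(D)), function(i) {
  d.i <- D[i, -i]
  mean(d.i) - msurf.sd.frac * sd(d.i)
})

# Neighbors of instance i: every other instance closer than radius[i].
neighbors <- lapply(seq_len(nrow(D)), function(i) {
  setdiff(which(D[i, ] < radius[i]), i)
})
n.neighbors <- lengths(neighbors)
summary(n.neighbors)
```

Unlike a fixed-k neighborhood, the number of neighbors here varies by instance; a larger msurf.sd.frac shrinks the radius and yields fewer neighbors.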
npdr.stats.df: data frame with, for each attribute, the adjusted p-value ($pval.adj [1]), the raw p-value ($pval.attr [2]), and the regression coefficient ($beta.attr [3])
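Since padj.method is passed to p.adjust, the relationship between raw and adjusted p-values can be shown with base R alone. The p-values and attribute names below are made up for illustration and are not output from npdr.

```r
# Hypothetical raw p-values for four attributes (not real npdr output).
p.raw <- c(attr1 = 0.0004, attr2 = 0.012, attr3 = 0.03, attr4 = 0.6)

# Bonferroni multiplies each p-value by the number of tests (capped at 1).
p.bonf <- p.adjust(p.raw, method = "bonferroni")

# "fdr" is the Benjamini-Hochberg adjustment, which is less conservative.
p.fdr <- p.adjust(p.raw, method = "fdr")

# Attributes passing a 0.05 threshold under each adjustment.
sig.bonf <- names(p.bonf)[p.bonf < 0.05]
sig.fdr <- names(p.fdr)[p.fdr < 0.05]
```

The choice of method changes which attributes pass a given threshold, so the same threshold can select different attribute sets under different padj.method settings.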
# Data interface options.
# Specify name ("qtrait") of outcome and dataset,
# which is a data frame including the outcome column.
# ReliefF fixed-k neighborhood; k falls back to the SURF theoretical default
# (with msurf.sd.frac = 0.5) if knn is not specified or knn = 0.
npdr.results.df <- npdr(
"qtrait", qtrait.3sets$train,
regression.type = "lm", nbd.method = "relieff", nbd.metric = "manhattan",
attr.diff.type = "manhattan", covar.diff.type = "manhattan",
msurf.sd.frac = 0.5, padj.method = "bonferroni")
# Specify column index (101) of outcome and dataset,
# which is a data frame including the outcome column.
# ReliefF fixed-k neighborhood: choose k directly (knn = 10), or set msurf.sd.frac instead.
npdr.results.df <- npdr(
101, case.control.3sets$train,
regression.type = "binomial", nbd.method = "relieff", nbd.metric = "manhattan",
attr.diff.type = "manhattan", covar.diff.type = "manhattan",
knn = 10, padj.method = "bonferroni")
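# If the outcome vector (pheno.vec) is separate from the attribute matrix.
# multiSURF neighborhood.
pheno.vec <- case.control.3sets$train$class
# predictors.mat is not defined elsewhere in these examples; it is assumed here
# to be the training data without the class column.
predictors.mat <- case.control.3sets$train[, names(case.control.3sets$train) != "class"]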
npdr.results.df <- npdr(
pheno.vec, predictors.mat,
regression.type = "binomial", nbd.method = "multisurf", nbd.metric = "manhattan",
attr.diff.type = "manhattan", covar.diff.type = "manhattan",
msurf.sd.frac = 0.5, padj.method = "bonferroni"
)
# attributes with npdr adjusted p-value less than 0.05
npdr.positives <- row.names(npdr.results.df[npdr.results.df$pval.adj < .05, ])