| ntp | R Documentation | 
Nearest Template Prediction (NTP) based on predefined class templates.
ntp(
  emat,
  templates,
  nPerm = 1000,
  distance = "cosine",
  nCores = 1,
  seed = NULL,
  verbose = getOption("verbose"),
  doPlot = FALSE
)
| emat | a numeric matrix with features in rows and samples in columns. | 
| templates | a data frame with two columns; class (coerced to factor) and probe (coerced to character). | 
| nPerm | an integer, number of permutations for p-value estimation. | 
| distance | a character, one of "cosine", "pearson", "spearman" or "kendall". | 
| nCores | an integer specifying number of threads for parallelization. | 
| seed | an integer, for p-value reproducibility. | 
| verbose | logical, whether console messages are to be displayed. | 
| doPlot | logical, whether to produce prediction  | 
ntp implements the Nearest Template Prediction (NTP)
algorithm, largely as proposed by Yujin Hoshida (2010) (see below). For each
sample, distances to templates are calculated and class assigned based on
smallest distance. Distances are transformed from the sample-templates
correlations as follows:
d.class = \sqrt{1/2 \times (1 - cor(sample, templates))}
Template values are 1 for class features and 0 for non-class features (-1 if
there are only two classes). Prediction confidence is estimated by comparing
the observed distance with a null distribution of distances obtained from
permutation tests. Thus the lowest possible estimate of the p-value is 1/nPerm.
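The distance transform and the permutation-based confidence estimate can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the toy features, the helper names cosine and templateDist, and the permutation scheme (shuffling the sample's feature values and keeping the smallest null distance) are all hypothetical.

```r
# Sketch only: a base R illustration of the distance transform, not CMScaller code.
set.seed(1)

# Toy setup (hypothetical): 30 features, 3 classes with 10 marker features each.
features <- paste0("g", 1:30)
templates <- data.frame(
  probe = features,
  class = rep(c("A", "B", "C"), each = 10),
  stringsAsFactors = FALSE
)

# Template matrix: 1 for class features, 0 for non-class features.
tmat <- sapply(unique(templates$class), function(k) {
  as.numeric(features %in% templates$probe[templates$class == k])
})

# Cosine similarity between two numeric vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Distance to every template: d = sqrt(1/2 * (1 - cor)).
templateDist <- function(x, tmat) {
  apply(tmat, 2, function(tk) sqrt(0.5 * (1 - cosine(x, tk))))
}

# A centered sample with high values for the class "B" markers.
x <- rep(c(-1, 2, -1), each = 10)
d <- templateDist(x, tmat)
prediction <- names(which.min(d))  # nearest template

# Assumed permutation scheme: shuffle feature values, keep the smallest
# null distance, and floor the estimate at 1/nPerm.
nPerm <- 1000
nullDist <- replicate(nPerm, min(templateDist(sample(x), tmat)))
pValue <- max(1, sum(nullDist <= min(d))) / nPerm
```

Flooring the count at 1 is what keeps the estimate from dropping below 1/nPerm in this sketch.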
emat should be a row-wise centered and scaled matrix.
For large, balanced datasets, this may be achieved by applying
the ematAdjust function.
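For readers who want to see what row-wise centering and scaling means in practice, here is a minimal base R sketch. It is not a substitute for ematAdjust, whose additional processing is not described here; the toy matrix and the name ematScaled are hypothetical.

```r
# Sketch only: row-wise centering and scaling with base R.
set.seed(1)

# Toy expression matrix (hypothetical): 4 features in rows, 5 samples in columns.
emat <- matrix(
  rnorm(20, mean = 5, sd = 2),
  nrow = 4,
  dimnames = list(paste0("g", 1:4), paste0("s", 1:5))
)

# scale() works column-wise, so transpose, scale, and transpose back.
ematScaled <- t(scale(t(emat)))

# Each feature (row) now has mean 0 and standard deviation 1.
rowMeans(ematScaled)
apply(ematScaled, 1, sd)
```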
templates is a data.frame defining class templates. A class
template is a set of marker genes with higher expected expression in
samples belonging to the class compared to non-class samples. templates
must contain at least two columns named probe and class.
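A templates data frame of the shape described above might be built as follows. The probe and class names are made up for illustration; one probe is listed under two classes to show a marker that is not specific to a single class.

```r
# Sketch only: a hypothetical templates data frame with the two required columns.
templates <- data.frame(
  probe = c("geneA1", "geneA2", "geneB1", "geneB2", "geneShared", "geneShared"),
  class = c("classA", "classA", "classB", "classB", "classA", "classB"),
  stringsAsFactors = FALSE
)

# Apply the documented coercions explicitly: class to factor, probe to character.
templates$class <- factor(templates$class)
templates$probe <- as.character(templates$probe)

# Number of marker features per class.
table(templates$class)
```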
Compared to Hoshida (2010), the resulting p-value estimates are more
conservative (by a factor equaling the number of classes) and
the distances are a monotonic transformation of 1-cor (see
Details section above).
Hoshida (2010) does not explicitly state whether input should be log2-transformed or not, and the examples include both. Based on experience, this choice affects results only at the margins, but for high-quality datasets, normalized, untransformed inputs may yield a small increase in accuracy.
For further details on the NTP algorithm, please refer to the package vignette and Hoshida (2010).
Parallel processing is implemented through parallel::mclapply or
snow::parLapply for *nix and Windows systems, respectively.
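The platform split can be sketched as below. This is an assumption-laden illustration rather than the package's internals: it uses parLapply from the parallel package instead of snow, and work is a hypothetical stand-in for the per-sample computation.

```r
# Sketch only: choose a parallel back end by platform.
library(parallel)

nCores <- 2
work <- function(i) i^2  # hypothetical stand-in for per-sample work

if (.Platform$OS.type == "windows") {
  # Forking is unavailable on Windows, so use a socket cluster.
  cl <- makeCluster(nCores)
  res <- parLapply(cl, 1:4, work)
  stopCluster(cl)
} else {
  # Unix-like systems can fork worker processes directly.
  res <- mclapply(1:4, work, mc.cores = nCores)
}

unlist(res)
```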
a data frame with class predictions, template distances,
p-values and false discovery rate adjusted p-values
(p.adjust(method = "fdr")). Rownames equal emat
colnames.
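The false discovery rate adjustment mentioned above is the standard p.adjust call. A small sketch with made-up p-values and sample names, arranged in a data frame whose row names are the sample names:

```r
# Sketch only: FDR adjustment of hypothetical per-sample p-values.
p <- c(s1 = 0.001, s2 = 0.02, s3 = 0.04, s4 = 0.5)

# method = "fdr" is an alias for the Benjamini-Hochberg procedure.
fdr <- p.adjust(p, method = "fdr")

# Hypothetical result layout; the column names are illustrative.
res <- data.frame(p.value = p, FDR = fdr)
res
```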
features with missing values are discarded.
setting seed disables parallel processing to ensure p-value
reproducibility.
for two random uncorrelated vectors x, y \sim N(0,1),
E[d.xy] \approx 0.71 when distance is cosine.
internally, correlations instead of distances are calculated.
accepts reuse of features (marker not specific for one class only)
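The note on two uncorrelated vectors can be checked numerically: with a correlation near 0, the distance is close to sqrt(1/2), roughly 0.71. The following is a quick simulation sketch; the vector length, the number of replicates, and the helper name cosine are arbitrary choices.

```r
# Sketch only: Monte Carlo check of the expected distance between
# two independent standard normal vectors under the cosine measure.
set.seed(1)

cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

d <- replicate(2000, {
  x <- rnorm(500)
  y <- rnorm(500)
  sqrt(0.5 * (1 - cosine(x, y)))
})

# Compare the simulated mean with sqrt(1/2).
mean(d)
sqrt(0.5)
```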
Hoshida, Y. (2010). Nearest Template Prediction: A Single-Sample-Based Flexible Class Prediction with Confidence Assessment. PLoS ONE 5, e15543.
Eide PW, Bruun J, Lothe RA, Sveen A. (2017). CMScaller: an R package for consensus molecular subtyping of colorectal cancer pre-clinical models. doi: 10.1038/s41598-017-16747-x.
corCosine, cor