Calculates the DIPPS for the given subset. The argument descriptions are
generic as DIPPS can be applied to any binary (“occurrence”) data in which
each variable has two values (“occurrence” and “absence”). In the MSI
context, an occurrence is generally taken to be a peak, an observation is
generally taken to be a spectrum and a variable is generally taken to be
a mass range or peakgroup, possibly grouped via some clustering method such
as that offered by dbscan.
dipps(obs, var, subset)
obs | A vector identifying the observation from which an occurrence originated.
var | A vector identifying the variable of which an occurrence is a realisation.
subset | A vector identifying occurrences belonging to the subset of observations of interest.
obs, var, and subset must be of equal length, and can be taken from the
output of combine_peaklists with relative ease; see the example below. It is
also assumed that equal entries in obs have equal entries in subset, that is,
every occurrence from the same observation carries the same subset value.
TODO: I should add a check for that.
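The consistency assumption above is not checked by the function, so it may be worth checking before calling it. The following is a minimal sketch of such a check; `check_subset_consistent` is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of the dipps package): returns TRUE when
# every observation maps to exactly one value of subset.
check_subset_consistent <- function(obs, subset) {
  stopifnot(length(obs) == length(subset))
  # Count the distinct subset values seen within each observation.
  n.values <- tapply(subset, obs, function(s) length(unique(s)))
  all(n.values == 1)
}

# Toy inputs, invented for illustration.
ok  <- check_subset_consistent(c("a", "a", "b"), c(TRUE, TRUE, FALSE))
bad <- check_subset_consistent(c("a", "a", "b"), c(TRUE, FALSE, FALSE))
```

Here `ok` is TRUE because both occurrences from observation "a" agree, while `bad` is FALSE because they disagree.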
Note that if occurrence in each variable (separately) is treated as a binary classifier for membership in the subset, the DIPPS can be thought of as the informedness of that classifier, i.e. DIPPS = sensitivity + specificity - 1. Here sensitivity is the proportion of occurrence within the subset, and specificity is one minus the proportion of occurrence outside it.
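The identity can be checked on a small example. The sketch below uses invented toy data for a single variable across six observations; the names `in.subset` and `occurs` are illustrative only and do not correspond to the function's arguments.

```r
# Toy data, invented for illustration: one variable, six observations.
in.subset <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)  # subset membership
occurs    <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)  # occurrence of the variable

# Proportions of occurrence inside and outside the subset.
p.u <- mean(occurs[in.subset])
p.d <- mean(occurs[!in.subset])
d   <- p.u - p.d

# Treat occurrence as a classifier for subset membership.
sensitivity <- p.u      # true positive rate
specificity <- 1 - p.d  # true negative rate
informedness <- sensitivity + specificity - 1
```

Both `d` and `informedness` evaluate to 1/3 here, since the variable occurs in two of three subset observations and one of three non-subset observations.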
Successful completion will return a data.frame in which rows represent variables (as identified by var), ordered by decreasing DIPPS, with seven columns:
var.
p.u: proportions of occurrence in the subset == TRUE subset of observations.
p.d: proportions of occurrence in the subset == FALSE subset of observations.
d: p.u - p.d (DIPPS).
c.u: the cosine distance centroid of the subset == TRUE subset of observations.
cos: the cosine distance between c.u and the ‘template’ vector t which contains ones in each peakgroup with a DIPPS equal to or greater than the DIPPS of the peakgroup the corresponding row represents.
t: the ‘template’ vector for the heuristically chosen ‘optimal’ DIPPS cutoff, i.e. selecting a number of the highest DIPPS variables such that the cosine distance as described above is minimised, under the constraint that the DIPPS cutoff should be positive.
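The cutoff heuristic described for cos and t can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the values of `d` and `c.u` are invented, cosine distance is assumed to mean one minus cosine similarity, and the positivity constraint is assumed to mean d > 0.

```r
# Toy values, invented for illustration: DIPPS and centroid entry per variable.
d   <- c(0.8, 0.6, 0.3, 0.1, -0.2)
c.u <- c(0.9, 0.7, 0.5, 0.4, 0.1)

# Order variables by decreasing DIPPS, as in the returned data.frame.
ord <- order(d, decreasing = TRUE)
d   <- d[ord]
c.u <- c.u[ord]

# For each row k, build the template with ones wherever DIPPS >= d[k] and
# compute its cosine distance to c.u (assumed: 1 - cosine similarity).
cos.dist <- sapply(seq_along(d), function(k) {
  t.k <- as.numeric(d >= d[k])
  1 - sum(c.u * t.k) / (sqrt(sum(c.u^2)) * sqrt(sum(t.k^2)))
})

# Keep only cutoffs with positive DIPPS (assumed: d > 0), then pick the
# one that minimises the cosine distance.
candidates <- which(d > 0)
best  <- candidates[which.min(cos.dist[candidates])]
t.opt <- as.numeric(d >= d[best])
```

With these toy values the distance keeps falling until the fourth variable and rises once the negative-DIPPS variable is included, so the template selects the four variables with positive DIPPS.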
Winderbaum, L. J. et al. Feature extraction for proteomics imaging mass spectrometry data. The Annals of Applied Statistics. 2015;9(4):1973-1996. doi: 10.1214/15-AOAS870.
library(dipps)

# Locate the example data shipped with the package and combine its peaklists
i.path = system.file("extdata", "test1", package = "dipps")
n.empty = combine_peaklists(i.path)
o.name = basename(i.path)
df.spec = load_speclist(o.name)
df.peak = load_peaklist(o.name)
# Construct peakgroups
df.peak$group = dbscan(df.peak$m.z, eps = 0.1, mnpts = 1)
# Select a subset of spectra expected to be overexpressed. In this case
# spectra with Y-coordinate greater than or equal to 170.
df.spec$subset = df.spec$Y >= 170
# Attach the subset flag to each peak via the shared Acq column
df.peak = merge(df.peak, df.spec[, c("Acq", "subset")])
# Calculate DIPPS
df.dipps = dipps(df.peak$Acq, df.peak$group, df.peak$subset)