dipps: Difference in ProPortions Statistics (DIPPS)

Description

Calculates the DIPPS for the given subset. The argument descriptions are generic because DIPPS can be applied to any binary (“occurrence”) data in which each variable takes one of two values (“occurrence” or “absence”). In the MSI context, an occurrence is generally a peak, an observation is generally a spectrum, and a variable is generally a mass range or peakgroup, possibly formed by a clustering method such as that offered by dbscan.

Usage

dipps(obs, var, subset)

Arguments

obs

A vector identifying the observation from which an occurrence originated.

var

A vector identifying the variable of which an occurrence is a realisation.

subset

A vector identifying occurrences belonging to the subset of observations of interest.

Details

obs, var, and subset must be of equal length, and can be taken from the output of combine_peaklists with relative ease (see the example below). It is also assumed that equal entries in obs have equal entries in subset. TODO: add a check for this assumption.
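
The assumption that equal entries in obs carry equal entries in subset can be verified by the caller before invoking dipps. A minimal sketch of such a check follows; subset_is_consistent is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of the dipps package): returns TRUE when every
# observation has a single subset label across all of its occurrences.
subset_is_consistent <- function(obs, subset) {
  stopifnot(length(obs) == length(subset))
  # Count the distinct subset labels seen within each observation.
  labels_per_obs <- tapply(subset, obs, function(s) length(unique(s)))
  all(labels_per_obs == 1)
}

# Observation 1 is labelled TRUE throughout, observation 2 FALSE: consistent.
subset_is_consistent(c(1, 1, 2), c(TRUE, TRUE, FALSE))

# Observation 1 is labelled both TRUE and FALSE: inconsistent.
subset_is_consistent(c(1, 1, 2), c(TRUE, FALSE, FALSE))
```

Running a check of this kind on the vectors about to be passed as obs and subset would catch labelling mistakes introduced when merging spectrum-level information into peak-level data.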

Note that if occurrence in each variable (separately) is treated as a binary classifier for membership in the subset, the DIPPS can be thought of as the informedness of these classifiers, i.e. DIPPS = sensitivity + specificity - 1.
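
The relationship between the difference in proportions and informedness can be illustrated on a small toy data set. The sketch below is not the package's implementation. It assumes the long format described above (one entry per occurrence), uses made-up values, and computes both quantities by hand in base R to show that they coincide.

```r
# Toy occurrence data (hypothetical values): one entry per occurrence.
obs    <- c(1, 1, 2, 3, 3, 4)
var    <- c("a", "b", "a", "b", "c", "b")
subset <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# One subset label per observation.
obs_ids   <- unique(obs)
var_ids   <- unique(var)
in_subset <- subset[match(obs_ids, obs)]

# Observation-by-variable occurrence matrix (TRUE = occurrence).
occ <- matrix(FALSE, nrow = length(obs_ids), ncol = length(var_ids),
              dimnames = list(as.character(obs_ids), var_ids))
occ[cbind(match(obs, obs_ids), match(var, var_ids))] <- TRUE

# Difference in proportions of occurrence: subset minus its complement.
p_in  <- colMeans(occ[in_subset, , drop = FALSE])
p_out <- colMeans(occ[!in_subset, , drop = FALSE])
diff_in_props <- p_in - p_out

# Each variable viewed as a classifier for subset membership.
sensitivity  <- p_in                                     # occurrence within the subset
specificity  <- colMeans(!occ[!in_subset, , drop = FALSE])  # absence outside the subset
informedness <- sensitivity + specificity - 1

all.equal(diff_in_props, informedness)  # TRUE
diff_in_props                           # a = 1, b = -0.5, c = -0.5
```

Here variable a occurs in every subset observation and in none of the others, so it attains the maximum value of 1, while b and c occur more often outside the subset and receive negative values.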

Value

Successful completion will return a data.frame in which rows represent variables (as identified by var), ordered in decreasing order of DIPPS, and with seven columns:

See Also

combine_peaklists, dbscan

Winderbaum, L. J. et al. Feature extraction for proteomics imaging mass spectrometry data. The Annals of Applied Statistics. 2015;9(4):1973-1996. doi: 10.1214/15-AOAS870.

Examples

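library(dipps)

# Locate the example data shipped with the package
i.path = system.file("extdata", "test1", package = "dipps")

# Combine the peaklists found under i.path, then load the resulting
# spectrum-level and peak-level data
n.empty = combine_peaklists(i.path)
o.name = basename(i.path)
df.spec = load_speclist(o.name)
df.peak = load_peaklist(o.name)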

# Construct peakgroups
df.peak$group = dbscan(df.peak$m.z, eps = 0.1, mnpts = 1)

# Select the subset of spectra of interest. In this case, spectra with
# Y-coordinate greater than or equal to 170.
df.spec$subset = df.spec$Y >= 170

# Attach the spectrum-level subset label to each peak
df.peak = merge(df.peak, df.spec[, c("Acq", "subset")])

# Calculate DIPPS
df.dipps = dipps(df.peak$Acq, df.peak$group, df.peak$subset)

Armadilloa16/dipps documentation built on May 5, 2019, 7:06 a.m.