Dimodal: Detect modality in the spacing of data.

Dimodal    R Documentation

Detect modality in the spacing of data.

Description

Dimodal studies the modality of data using its spacing. Peaks, or local increases, in the spacing indicate that the data is multi-modal and locate the anti-modes. Flats, or stretches of consistent spacing, cover the modes. Dimodal finds these features either in the spacing after smoothing it with a low-pass filter, which supports discrete or heavily quantized data, or in the interval spacing. Several tests, using parametric models, runs, and bootstrap sampling, evaluate these features.

Usage

Dimodal(x, opt=Diopt())
## S3 method for class 'Dimodal'
print(x, feature=c('peaks', 'flats'), ...)
## S3 method for class 'Dimodal'
summary(object, feature=c('peaks', 'flats'), ...)
## S3 method for class 'Dimodal'
plot(x, show=c('lp', 'histogram', 'diw'),
     feature=c('peaks', 'flats'), opt=Diopt(), ...)

Arguments

x

for Dimodal the (numeric) data vector to analyze; for the methods an object of class "Dimodal"

object

an object of class "Dimodal"

opt

local version of options to guide analysis

feature

display only the indicated feature(s) from every analysis method that was run; for plots, mark only those features in the graph

show

plot the low-pass spacing, a histogram of the raw data, and/or the interval spacing, in separate graphs in the order given

...

extra arguments, ignored for all methods

Details

Changes in the spacing of data can indicate a change in its modality, and Dimodal is a general interface to feature detectors and tests to evaluate such changes. Spacing, the difference between consecutive order statistics or the delta after sorting the data, takes on a ‘U’ form, increasing rapidly in the tails and remaining stable in the center (for single-sided variates it forms half the U; uniform variates have constant spacing). The transition between modes is marked by local increases in the spacing, while the centers of modes see stable values. Dimodal therefore looks for local maxima or peaks in the spacing, or locally flat regions.
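As a minimal sketch of the idea, using only base R and not the package's own code, the raw spacing can be computed with diff on the sorted data. The sample sizes, seed, and window positions below are arbitrary choices for illustration.

```r
## Raw spacing: difference between consecutive order statistics.
set.seed(1)                       # arbitrary seed, for reproducibility only
x  <- rnorm(500)                  # unimodal, two-sided variate
Di <- diff(sort(x))               # length(x) - 1 non-negative values

## Compare the average spacing in the tails and in the center.  The tails
## should be much wider, which gives the 'U' form.
n      <- length(Di)
tails  <- c(Di[1:25], Di[(n - 24):n])
center <- Di[(n %/% 2 - 24):(n %/% 2 + 25)]
c(tails = mean(tails), center = mean(center))

## A two-component mixture: the spacing rises between the modes.
y    <- c(rnorm(250, mean = -3), rnorm(250, mean = 3))
Diy  <- diff(sort(y))
anti <- mean(Diy[240:260])        # around the transition between modes
mode <- mean(Diy[115:135])        # around the center of the lower mode
c(antimode = anti, mode = mode)
```

Plotting Di or Diy against its index shows how noisy the raw spacing is, which is why the smoothing described next is needed.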

The spacing, designated Di, is often very noisy, and may be quantized to a few values if the data is discrete or taken with limited precision. Smoothing is necessary, which Dimodal can do either by applying a low-pass (lp) filter or by taking the difference over more than one order statistic. The latter is called the interval spacing Diw and is generated as a difference with lag; it is equivalent to a running mean or rectangular filter of the raw spacing. The recommended low-pass filter is a Kaiser kernel, which offers good high-frequency suppression and main lobe width; other available filters are the Bartlett or triangular (synonyms), Hanning, Hamming, Gaussian or normal (synonyms), and Blackman. Filtering is done by convolving the data with the filter's kernel, rather than moving to the Fourier domain. Points at the start and finish that are only partially covered by the kernel or interval are set to NA, and attributes attached to the data give the valid range.

The two spacings are indexed differently. The low-pass kernel is centered, with partial overlaps at both ends. The interval spacing is defined as trailing from the upper index, which runs to the end of the data, so the partial overlap occurs only at the start. This will be seen in the position of the smoothed curves when plotting results, and the shift in indices needed to align the two schemes will be printed with the data summary. The raw values corresponding to a feature automatically compensate for the difference.
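The two smoothing schemes and their indexing can be sketched in base R. This is an illustration under assumptions, not the package's implementation: the interval width w, the kernel length m, and the use of a Hanning kernel built by hand (rather than the recommended Kaiser kernel) are all choices made here to keep the example short.

```r
set.seed(2)                          # arbitrary seed for reproducibility
x  <- sort(rnorm(200))               # sorted sample (order statistics)
Di <- diff(x)                        # raw spacing, 199 values

## Interval spacing: difference over w order statistics (lag w).
w   <- 10                            # illustrative interval width
Diw <- diff(x, lag = w)              # 190 values

## The same values from a trailing rectangular filter on the raw spacing.
## sides = 1 uses only the current and past points, so the first w - 1
## results are NA: the partial overlap is only at the start.
runsum <- as.numeric(stats::filter(Di, rep(1, w), sides = 1))
all.equal(runsum[w:length(Di)], Diw)

## Low-pass spacing: convolve with a centered, normalized Hanning kernel.
m    <- 11                           # illustrative (odd) kernel length
kern <- 0.5 - 0.5 * cos(2 * pi * seq_len(m) / (m + 1))
kern <- kern / sum(kern)
lp   <- as.numeric(stats::filter(Di, kern, sides = 2))

## Centered kernel: (m - 1) / 2 points are NA at each end.
range(which(!is.na(lp)))
```

Dividing runsum by w gives the running mean of the raw spacing. The different NA positions, only at the start for the trailing interval and at both ends for the centered kernel, are the source of the index shift between the two spacings.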

The feature detectors find.peaks and find.flats have separate help pages describing their algorithms and the parameters that control their analysis. These features are local and therefore not only indicate whether data may be multi-modal, but provide the location of the modes and the transitions between them.

Dimodal uses three main strategies to evaluate the features. First, the model tests are Dipeak.test and Diflat.test; critical values at a given significance level are also available. These models are based on simulations of the peak heights and flat lengths in a univariate null distribution and offer a parametric assessment of their significance. They are less conservative than other modality detectors. Second, the bootstrap test is Diexcurht.test. The bootstrap simulates the features by drawing from a pool of the differences of the spacing, estimating their probability without assuming any underlying distribution. Finally, the runs tests are Dinrun.test, Dirunlen.test, and Dipermht.test. Quantizing the filtered spacing into a few levels by taking the sign of the difference (in other words, whether the signal is increasing, decreasing, or constant) allows us to consider runs in the symbols. We can test how many runs there are, or how long the longest is, or whether a permutation of them recreates the feature.
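The quantization behind the runs tests can be illustrated with base R. The values in lp below are made up, and the sketch only builds the symbols and counts runs; it does not reproduce the statistics of Dinrun.test, Dirunlen.test, or Dipermht.test.

```r
## Made-up smoothed spacing, for illustration only.
lp <- c(1.0, 1.2, 1.5, 1.5, 1.4, 1.1, 1.3, 1.6, 1.9, 1.7)

## Sign of the difference: +1 increasing, -1 decreasing, 0 constant.
s <- sign(diff(lp))

## Run-length encoding groups consecutive identical symbols.
runs  <- rle(s)
nrun  <- length(runs$lengths)     # how many runs there are
maxrl <- max(runs$lengths)        # length of the longest run
c(nrun = nrun, longest = maxrl)
```

Here the symbols are 1, 1, 0, -1, -1, 1, 1, 1, -1, which form five runs with a longest run of three.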

A fourth strategy, using changepoint detectors on the raw spacing to detect transitions between modes and anti-modes, is not included in this version of Dimodal. See the package help page or DESCRIPTION file for the location of the full version.

The bootstrap test extends a peak to its support, defined by the "peak.fhsupp" option as a fraction of the peak's height. A value of 0.9 is enough to back the support away from minima placed in a long flat while not distorting the peak's width if the minima are well-defined. A value of 0.5 corresponds to Full Width at Half Maximum (FWHM), and 1.0 extends the peak to the minima.
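One possible reading of the support can be sketched as follows. The helper peak_support is hypothetical and not part of Dimodal; it assumes the height is measured separately from the peak down to each flanking minimum, and that the support ends where the curve first drops below the peak value less the given fraction of that height. The package may define the support differently.

```r
## Hypothetical helper, not part of Dimodal.  y is the smoothed spacing,
## ipk the index of the peak, ilo and ihi the flanking minima, and f the
## fraction of the height (the role played by "peak.fhsupp").
peak_support <- function(y, ipk, ilo, ihi, f = 0.9) {
  ## Cut levels on each side, measured down from the peak.
  lcut <- y[ipk] - f * (y[ipk] - y[ilo])
  rcut <- y[ipk] - f * (y[ipk] - y[ihi])
  ## Walk outward from the peak and find the first point below the cut.
  lbelow <- which(y[ipk:ilo] < lcut)
  rbelow <- which(y[ipk:ihi] < rcut)
  ## The support stops one point before that, or at the minimum if the
  ## curve never drops below the cut.
  first <- if (length(lbelow)) ipk - lbelow[1] + 2 else ilo
  last  <- if (length(rbelow)) ipk + rbelow[1] - 2 else ihi
  c(first = first, last = last)
}

## Made-up curve: a peak at index 8 with minima placed in long flats.
y <- c(1, 1, 1, 1, 1, 2, 4, 6, 4, 2, 1, 1, 1)
peak_support(y, ipk = 8, ilo = 1, ihi = 13, f = 0.9)   # backs away from flats
peak_support(y, ipk = 8, ilo = 1, ihi = 13, f = 0.5)   # half maximum
peak_support(y, ipk = 8, ilo = 1, ihi = 13, f = 1.0)   # out to the minima
```

With the minima placed at the far ends of the flats, a fraction of 1.0 spans indices 1 to 13, while 0.9 trims the support to indices 6 to 10, close to the visible width of the peak.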

The analysis of each feature is gathered into separate S3 class objects which support printing and marking plots. The generic functions on the Dimodal result route to these objects if they are selected by the feature argument. A plot may contain the filtered spacing or interval spacing plus a histogram of the raw data, with features annotated on each. It uses layout to create a row of the shown graphs, as specified by the show argument. The histogram annotations come from the first (leftmost) spacing shown.

The raw data must be numeric or integer. Non-finite values, including NA, will be dropped.
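Dimodal does this filtering itself. The base R sketch below only shows which values count as non-finite; the data vector is made up.

```r
## Made-up data with missing and infinite entries.
raw <- c(4.2, NA, 3.1, Inf, 5.0, NaN, -Inf, 2.7)

## is.finite is FALSE for NA, NaN, Inf and -Inf.
x <- raw[is.finite(raw)]
x
length(raw) - length(x)           # number of values dropped
```

The number of points in the spacing therefore follows the count of finite values, not the length of the original vector.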

Dimodal needs a complete list of options for the opt argument. Do not change options inside the call to Diopt that supplies opt, because Diopt then returns only the changed values. Use Diopt.local instead.

The option "analysis" controls which smoothed spacing to generate, one or both of 'lp' and 'diw'. If neither is specified the data will contain only the spacing and mid-quantile function, without any features or their analysis.

Dimodal uses options "lp.param" and "diw.param" to override the detector options for each method, and "lp.tests" and "diw.tests" to determine which feature tests to carry out. If these are empty lists then the data will contain the smoothed spacing but there will be no features. While generating the data it uses options "lp.kernel" and "lp.window" to set up the low-pass filter, and "diw.window" for the interval width. It uses "excur.ntop" when creating the base set of draws for excursion tests. Option "data.midq" determines the approximation method (the type argument to the midquantile function) used when converting indices in the spacing back to order statistics.

The default values of the detector options come from the development of the low-pass models. We do not know how different values will affect the models. The interval spacing is much rougher than the low-pass spacing, so it may need looser ripple and height parameters to find any flat or to reduce the number of peaks. The excursion tests will accommodate this.

Value

A list assigned to class "Dimodal" with elements

data

an object of class "Didata" with all data used in the analysis

lp.peaks

an object of class "Dipeak" capturing the local extrema in the low-pass spacing and their evaluation, with test results and raw data locations added to the features from find.peaks

lp.flats

an object of class "Diflat" capturing the local flats in the low-pass spacing and their evaluation, with test results and raw data locations added to the features from find.flats

diw.peaks

an object of class "Dipeak" capturing the local extrema in the interval spacing and their evaluation, with test results and raw data locations added to the features from find.peaks

diw.flats

an object of class "Diflat" capturing the local flats in the interval spacing and their evaluation, with test results and raw data locations added to the features from find.flats

opt

the list passed as the opt argument, per Diopt

These elements will have empty data structures if the analysis is not run.

Dimodal will automatically call shiftID.place on each detector's results and will summarize the tests, as described with each data class. Dimodal adds an attribute "source" to each of the features, with value LP, Diw, or Di.

See Also

Diopt for the parameters controlling the analysis.

find.peaks, find.flats for feature detection.

Dipeak.test, Diflat.test for parametric models to evaluate the features, Diexcurht.test for a bootstrap test of feature significance, Dinrun.test, Dirunlen.test for tests of runs (here for sequences in the sign of the difference in the interval spacing), and Dipermht.test for a permutation test of the runs making a feature.

Didata, Dipeak, Diflat for the data structures generated by the feature detectors and their evaluation.

center.diw to further shift the position of interval spacing features to the middle of the interval to align with low-pass features.

match.features to identify common peaks and flats in both spacings.

shiftID.place to move indices in either spacing to the original data grid and add the corresponding raw values.

midquantile for the mid-quantile mapping from index to raw data.

Examples

## The interval spacing is noisy with the default options, so require a
## larger peak height with a temporary value to Diopt.
oldopt <- Diopt(diw.param=list(peak.fht=0.125))
## Run the analysis.
m <- Dimodal(faithful$waiting)
## If printing the results, the interval spacing peaks have a probability
## just under 0.05 but fail the acceptance levels.
summary(m)
## Details about the peaks in both spacings.
print(m, feature="peaks")
## We find one peak in both spacings, but only the low-pass is significant.
match.features(m)
## Three plots side by side.  The limited resolution of the data is clear
## in the interval spacing.
dev.new(width=12, height=4) ; plot(m)
## Restore the old option values.  Diopt(NULL) returns to defaults.
oldopt <- Diopt(oldopt)

Dimodal documentation built on May 2, 2026, 1:06 a.m.