pmp: Pan-Matrix Profile

View source: R/pmp.R

pmp R Documentation

Pan-Matrix Profile

Description

Computes the Pan-Matrix Profile (PMP) for the given time series.

Usage

pmp(
  data,
  window_sizes = seq.int(from = 10, to = length(data)/2, length.out = 20),
  plot = FALSE,
  pmp_obj = NULL,
  n_workers = 1,
  verbose = getOption("tsmp.verbose", 2)
)

Arguments

data

a numeric matrix or vector.

window_sizes

a vector of the window sizes to evaluate. They are rounded down to integers and sorted. (Default is a sequence of 20 values from 10 to half the data length.)

plot

a logical. If TRUE, every new computation will be plotted. (Default is FALSE).

pmp_obj

a PMP object that may or may not contain an upper bound value and previously computed profiles. The function adds new profiles; it never replaces existing ones. (Default is NULL.)

n_workers

an integer. Number of workers for parallel computation. (Default is 1.)

verbose

an integer. See Details. (Default is 2.)
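The documented handling of window_sizes (a default grid of 20 values from 10 to half the series length, rounded down to integers and sorted) can be sketched in base R as follows. The helpers default_window_sizes() and normalize_window_sizes() are hypothetical: they mirror only the description above and are not taken from the package source.

```r
# Hypothetical helpers (not part of tsmp) mirroring the documented
# treatment of the `window_sizes` argument.

# Default grid: 20 values from 10 to half the length of the series.
default_window_sizes <- function(n) {
  seq.int(from = 10, to = n / 2, length.out = 20)
}

# Documented normalization: round down to integers, then sort ascending.
normalize_window_sizes <- function(window_sizes) {
  sort(floor(window_sizes))
}

# For a series of length 400 the grid is 10, 20, ..., 200.
ws <- normalize_window_sizes(default_window_sizes(400))
print(ws)

# Non-integer, unsorted input is floored and sorted: returns c(12, 20, 30).
print(normalize_window_sizes(c(30.7, 12.2, 20.9)))
```

Because the values are floored, a grid built from a length that is not a convenient multiple may contain uneven steps; the sketch does not remove duplicates, since the description above does not mention doing so.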

Details

The work closest in spirit to ours is VALMOD. The idea of VALMOD is to compute the MP for the shortest length of interest, then use the information gleaned from it to guide a search through longer subsequence lengths, exploiting lower bounds to prune some calculations. This works well for the first few longer subsequence lengths, but the lower bounds progressively weaken, making the pruning ineffective. As a result, in the five case studies its authors presented, the mean ratio U/L of the longest (U) to the shortest (L) subsequence length searched was just 1.24. In contrast, our termite example in Fig. 15 has a U/L ratio of 240, more than two orders of magnitude larger. VALMOD is therefore perhaps best seen as finding motifs with some tolerance for a user-specified query length that is slightly (~25%) too short, rather than as a true "motif-of-all-lengths" algorithm. Note also that apart from the shortest length, VALMOD gives only partial information for the other lengths, unlike the PMP, which contains exact distances for all subsequences of all lengths.

When only data is provided, the exploration uses the default window_sizes, a sequence of 20 values between 10 and half the data length, and the resulting object has upper_bound set to Inf. If an object is supplied through the pmp_obj argument, the function adds more information to it without changing any values already computed. verbose controls how much information is printed: 0 prints nothing, 1 prints text, 2 adds a progress bar, and 3 also plays a sound when finished.
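The rule that a supplied pmp_obj is extended but never overwritten can be illustrated with a minimal base-R sketch. It assumes, for illustration only, that profiles are stored in a named list keyed by window size (consistent with the as.numeric(names(pmp)) note below). The function add_profiles() and the stand-in fake_profile() are hypothetical; a real implementation would compute a matrix profile for each window.

```r
# Hypothetical sketch (not the tsmp implementation): profiles are kept in a
# named list whose names are window sizes. Missing window sizes are computed
# and appended; sizes that already exist are left untouched.
add_profiles <- function(profiles, window_sizes, compute_profile) {
  for (w in window_sizes) {
    key <- as.character(w)
    # `[[` on a list returns NULL for a name that is not present.
    if (is.null(profiles[[key]])) {
      profiles[[key]] <- compute_profile(w)
    }
  }
  profiles
}

# Stand-in for a real matrix profile computation (placeholder values only).
fake_profile <- function(w) rep(w, 3)

existing <- list(`20` = c(0.5, 0.4, 0.3))
updated <- add_profiles(existing, c(10, 20, 30), fake_profile)

print(names(updated))   # "20" "10" "30": new sizes appended, not re-sorted
print(updated[["20"]])  # the previously computed profile is unchanged
```

Keying by window size makes "add, never replace" a simple membership check, and it leaves the stored order unsorted, which any code reading the object must tolerate.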

Upper bound and window sizes

  1. upper_window is set to Inf on new objects.
     1.1. upper_window is also used for plotting and for discovery; it must not remove any existing data from the object.

  2. window_sizes is used for plotting; it must not remove any matrix profile stored inside the object.
     2.1. window_sizes tells the function which matrix profiles are stored; it may be updated with as.numeric(names(pmp)).

  3. The functions must be able to handle the data without requiring it to be sorted by window size, although sorting may prove useful later.
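The notes above can be illustrated with a small base-R sketch. The object layout (a list with pmp, window_sizes and upper_window fields) and the helper sizes_to_plot() are assumptions made for illustration, not the structure documented for tsmp; the sketch also assumes that "used for plotting" means upper_window filters which stored windows are shown.

```r
# Hypothetical object layout, assumed for illustration only: profiles named
# by window size (stored unsorted), plus an upper_window value.
obj <- list(
  pmp = list(`30` = c(0.9, 0.8), `10` = c(0.2, 0.1), `20` = c(0.5, 0.4)),
  upper_window = Inf
)

# Note 2.1: recover the stored window sizes from the profile names.
obj$window_sizes <- as.numeric(names(obj$pmp))

# Notes 1.1 and 3: select what to plot without modifying the stored
# profiles; only the selection is sorted, never the object itself.
sizes_to_plot <- function(obj) {
  sort(obj$window_sizes[obj$window_sizes <= obj$upper_window])
}

print(sizes_to_plot(obj))  # 10 20 30 (upper_window is Inf)

obj$upper_window <- 25
print(sizes_to_plot(obj))  # 10 20; the profile for 30 is still stored
print(names(obj$pmp))      # "30" "10" "20"
```

Filtering a view instead of deleting entries keeps the guarantee that lowering upper_window never discards computed profiles.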

Value

Returns a PMP object.

Examples


library(tsmp)

# Just compute, using the default window sizes
pan <- pmp(mp_gait_data)

# Compute the upper bound first, then add new profiles to the same object
pan <- pmp_upper_bound(mp_gait_data)
pan <- pmp(mp_gait_data, pmp_obj = pan)


tsmp documentation built on Aug. 21, 2022, 1:13 a.m.