seq_transform: Find and classify functional outliers using...

View source: R/sequential_transformations.R

seq_transform R Documentation

Find and classify functional outliers using sequential transformations

Description

This method finds and classifies outliers using the sequential transformations proposed in Algorithm 1 of Dai et al. (2020) <doi:10.1016/j.csda.2020.106960>. A sequence of transformations is applied to the functional data; after each transformation, a functional boxplot is applied to the transformed data and the outliers it flags are recorded. Several of the transformations described in Dai et al. (2020) are supported, including vertical alignment ("T1(X)(t)"), normalization ("T2(X)(t)"), one order of differencing ("D1(X)(t)" and "D2(X)(t)") and point-wise outlyingness ("O(X)(t)"). The feature alignment transformation based on warping/curve registration is not yet supported.

Usage

seq_transform(
  dts,
  sequence = c("T0", "T1", "T2"),
  depth_method = c("mbd", "tvd", "extremal", "dirout", "linfinity", "bd", "erld", "dq"),
  save_data = FALSE,
  emp_factor = 1.5,
  central_region = 0.5,
  erld_type = NULL,
  dq_quantiles = NULL,
  n_projections = 200L,
  seed = NULL
)

Arguments

dts

A matrix for univariate functional data (of size n observations by p domain points) or a 3-dimensional array for multivariate functional data (of size n observations by p domain points by d dimensions). Only the outlyingness transformation ("O(X)(t)") supports multivariate functional data, so the sequence of transformations must always start with outlyingness ("O(X)(t)") whenever multivariate functional data is passed to dts.

sequence

A character vector, usually of length between 1 and 6, containing any of the strings "T0", "D0", "T1", "T2", "D1", "D2" and "O" (in any order). This sequence of strings specifies the sequence of transformations to be applied to the data; their meanings are as follows:

"T0" and "D0"

Functional boxplot applied on raw data (no transformation is applied).

"T1"

Apply vertical alignment on data, i.e. subtract from each curve its expectation over the domain of evaluation.

"T2"

Apply normalization on data, i.e. divide each curve by its L-2 norm.

"D1" and "D2"

Apply one order of differencing on data.

"O"

Find the pointwise outlyingness of data. For multivariate functional data, this transformation replaces the multivariate functional data with a univariate functional data of pointwise outlyingness.

Examples of sequences of transformations include: "T0", c("T0", "T1", "D1"), c("T0", "T1", "T2"), c("T0", "D1", "D2") and c("T0", "T1", "T2", "D1", "D2"). See Details for their meaning.
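
As a rough illustration of what the univariate transformations listed above do, the sketch below applies them to a small matrix in base R. It is not the package's implementation: the helper names (vertical_align, l2_normalize, first_difference) are hypothetical, the expectation over the domain is assumed to be the row mean, and the L-2 norm is assumed to be the Euclidean norm of the evaluated points.

```r
# Toy data: 3 curves (rows) evaluated at 5 domain points (columns).
x <- rbind(c(1, 2, 3, 4, 5),
           c(2, 2, 2, 2, 2),
           c(5, 3, 1, 3, 5))

# "T1": subtract each curve's mean over the domain (assumed: row mean).
vertical_align <- function(m) m - rowMeans(m)

# "T2": divide each curve by its L-2 norm (assumed: Euclidean norm of the row).
l2_normalize <- function(m) m / sqrt(rowSums(m^2))

# "D1"/"D2": one order of lag-1 differencing along the domain.
first_difference <- function(m) {
  m[, -1, drop = FALSE] - m[, -ncol(m), drop = FALSE]
}

t1 <- vertical_align(x)        # every row now has mean 0
t2 <- l2_normalize(t1[-2, ])   # row 2 is all zeros after T1, so it is left out
d1 <- first_difference(x)      # 3 x 4 matrix of successive differences
```

Note that differencing shortens the domain by one point, and that a constant curve becomes identically zero after vertical alignment, so it cannot be normalized in this simple sketch.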

depth_method

A character value specifying the depth/outlyingness method to use in the functional boxplot applied after each stage of transformation. Note that the same depth/outlyingness method is used in the functional boxplot applied after each transformation in the sequence. The following methods are currently supported:

"mbd"

The modified band depth with bands defined by 2 functions. Uses the algorithm of Sun et al. (2012).

"tvd"

The total variation depth of Huang and Sun (2019).

"extremal"

The extremal depth of Narisetty and Nair (2016).

"dirout"

Uses the robust distance of the mean and variation of directional outlyingness (dir_out) defined in Dai and Genton (2018). Since this method measures the outlyingness of a function, the negative of the computed robust distance is used in ordering the functions.

"linfinity"

The L-infinity depth defined in Long and Huang (2015) is used in ordering functions.

"bd"

Uses the band depth with bands defined by 2 functions according to the algorithm of Sun et al. (2012).

"erld"

Uses the extreme rank length depth defined in Myllymäki et al. (2017) and mentioned in Dai et al. (2020).

"dq"

Uses the directional quantile (DQ) defined in Myllymäki et al. (2017) and mentioned in Dai et al. (2020). Since DQ is a measure of outlyingness, the negative of the DQ values is used in ordering the functions.

save_data

A logical. If TRUE, the intermediate transformed data are returned in a list.

emp_factor

The empirical factor for functional boxplot. Defaults to 1.5.

central_region

A value between 0 and 1 indicating the central region probability for functional_boxplot. Defaults to 0.5.

erld_type

If depth_method = "erld", the type of ordering to use in computing the extreme rank length depth (ERLD). Can be one of "two_sided", "one_sided_left" or "one_sided_right". A "two_sided" ordering is used by default if erld_type is not specified and depth_method = "erld". The "one_sided_right" ERLD is especially useful for ordering functions of outlyingness (the output of the "O" transformation) since it considers only large values as extreme. See extreme_rank_length for details.

dq_quantiles

If depth_method = "dq", a numeric vector of length 2 specifying the probabilities of the lower and upper quantiles. Defaults to c(0.025, 0.975), i.e. the lower and upper 2.5% quantiles. See directional_quantile for details.

n_projections

An integer indicating the number of random projections to use in computing the point-wise outlyingness if a 3-d array is specified in dts (i.e. multivariate functional data) and the transformation "O" is part of the sequence of transformations passed to sequence. Defaults to 200L.

seed

The random seed to set when generating the random directions in the computation of the point-wise outlyingness. Defaults to NULL, in which case a seed is not set.
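
To make the roles of emp_factor and central_region more concrete, the following base R sketch shows the usual functional boxplot rule: the deepest central_region fraction of curves defines a pointwise envelope, the envelope is inflated by emp_factor times its width, and any curve leaving the inflated band is flagged. This is only a sketch under assumptions: flag_outliers is a hypothetical helper, the depth values are made up, and details such as the rounding of the number of central curves may differ from what functional_boxplot actually does.

```r
# Hypothetical helper: flag curves that leave the inflated central envelope.
flag_outliers <- function(dts, depth, emp_factor = 1.5, central_region = 0.5) {
  n <- nrow(dts)
  # Indices of the deepest curves making up the central region
  # (assumption: the count is rounded up).
  n_central <- ceiling(n * central_region)
  central <- order(depth, decreasing = TRUE)[seq_len(n_central)]
  # Pointwise envelope of the central region.
  upper <- apply(dts[central, , drop = FALSE], 2, max)
  lower <- apply(dts[central, , drop = FALSE], 2, min)
  # Inflate the envelope by emp_factor times its pointwise width.
  width <- upper - lower
  upper_fence <- upper + emp_factor * width
  lower_fence <- lower - emp_factor * width
  # A curve is flagged if it leaves the fences at any domain point.
  outside <- t(t(dts) > upper_fence | t(dts) < lower_fence)
  which(rowSums(outside) > 0)
}

# Toy data: 5 curves on 3 domain points; the last curve has a spike.
dts <- rbind(c(0, 0, 0), c(1, 1, 1), c(2, 2, 2), c(3, 3, 3), c(2, 20, 2))
depth <- c(0.2, 0.4, 0.5, 0.3, 0.1)  # made-up depth values for illustration
flagged <- flag_outliers(dts, depth)  # index of the spiked curve
```

A larger emp_factor widens the fences and flags fewer curves, while a larger central_region bases the envelope on more curves.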

Details

This function implements outlier detection using the sequential transformations described in Algorithm 1 of Dai et al. (2020) <doi:10.1016/j.csda.2020.106960>. The transformations are applied consecutively, with the functional boxplot applied to the transformed data after each one. The following example sequences (and their meanings), suggested in Dai et al. (2020), can be passed to the sequence argument.

"T0"

Apply functional boxplot on raw data (no transformation is applied).

c("T0", "T1", "D1")

Apply functional boxplot on raw data, then apply vertical alignment on data followed by applying functional boxplot again. Finally apply one order of differencing on the vertically aligned data and apply functional boxplot again.

c("T0", "T1", "T2")

Apply functional boxplot on raw data, then apply vertical alignment on data followed by applying functional boxplot again. Finally apply normalization using L-2 norm on the vertically aligned data and apply functional boxplot again.

c("T0", "D1", "D2")

Apply functional boxplot on raw data, then apply one order of differencing on the data followed by applying functional boxplot again. Finally apply another order of differencing on the differenced data and apply functional boxplot again. Note that this sequence of transformations can equivalently be specified by c("T0", "D1", "D1"), c("T0", "D2", "D2") or c("T0", "D2", "D1"), since "D1" and "D2" do the same thing: each applies one order of lag-1 differencing to the data.

"O"

Find the pointwise outlyingness of the multivariate or univariate functional data and then apply functional boxplot on the resulting univariate functional data of pointwise outlyingness. Care must be taken to specify a one-sided ordering function (i.e. the "one_sided_right" extreme rank length depth) in the functional boxplot used on the data of point-wise outlyingness, because only large values of point-wise outlyingness should be considered extreme.

For multivariate functional data (when a 3-d array is supplied to dts), the sequence of transformations must always begin with "O". This replaces the multivariate data with univariate data of point-wise outlyingness, which the functional boxplot can then process; the functional_boxplot function only supports univariate functional data.

If repeated transformations are used in the sequence (e.g. when sequence = c("T0", "D1", "D1")), a warning is thrown and the labels of the output lists are changed (e.g. for sequence = c("T0", "D1", "D1"), the labels of the output lists become "T0", "D1_1", "D1_2", so that outliers are accessed with output$outliers$D1_1 and output$outliers$D1_2). See examples for more.
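
The relabelling of repeated transformations can be pictured with the short base R sketch below. The helper label_sequence is hypothetical and only mimics the naming pattern described above; it is not taken from the package source.

```r
# Hypothetical helper: suffix repeated labels with their order of occurrence.
label_sequence <- function(sequence) {
  counts <- table(sequence)
  repeated <- names(counts)[counts > 1]
  labels <- sequence
  for (s in repeated) {
    idx <- which(sequence == s)
    labels[idx] <- paste0(s, "_", seq_along(idx))
  }
  labels
}

label_sequence(c("T0", "D1", "D1"))  # "T0" "D1_1" "D1_2"
label_sequence(c("T0", "T1", "T2"))  # unchanged, no repeats
```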

Value

A list containing two lists is returned. The contents of the returned list are:

outliers

A named list of length length(sequence) containing the index of outliers found after each transformation. The names of the elements of this list are the sequence strings supplied to sequence and the outliers found after each stage of transformation are not necessarily mutually exclusive.

transformed_data

If save_data = TRUE a named list of length length(sequence) containing the transformed matrix after each transformation. The names of the elements of this list are the sequence strings supplied to sequence. NULL otherwise (if save_data = FALSE).
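
Because the outliers found at different stages may overlap, it can be convenient to combine them. The base R sketch below works on a hand-made list that only mimics the structure of the outliers element; the indices are made up for illustration and do not come from real output.

```r
# Hypothetical list with the same shape as seq_transform(...)$outliers.
outliers <- list(T0 = c(3L, 17L), D1 = c(17L, 42L), D2 = integer(0))

# Union of all curves flagged at any stage of the sequence.
all_outliers <- sort(unique(unlist(outliers, use.names = FALSE)))

# For each flagged curve, the transformations that flagged it.
flagged_by <- lapply(all_outliers, function(i) {
  names(outliers)[vapply(outliers, function(o) i %in% o, logical(1))]
})
names(flagged_by) <- all_outliers
```

Here curve 17 appears under both "T0" and "D1", which shows why the per-stage results should not simply be concatenated when counting outliers.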

Examples

library(fdaoutlier)

# same as running a functional boxplot
dt1 <- simulation_model1()
seqobj <- seq_transform(dt1$data, sequence = "T0", depth_method = "mbd")
seqobj$outliers$T0
functional_boxplot(dt1$data, depth_method = "mbd")$outliers

# more sequences
dt4 <- simulation_model4()
seqobj <- seq_transform(dt4$data, sequence = c("T0", "D1", "D2"), depth_method = "mbd")
seqobj$outliers$T0 # outliers found in raw data
seqobj$outliers$D1 # outliers found after differencing data the first time
seqobj$outliers$D2 # outliers found after differencing the data the second time

# saving transformed data
seqobj <- seq_transform(dt4$data, sequence = c("T0", "D1", "D2"),
 depth_method = "mbd", save_data = TRUE)
seqobj$outliers$T0 # outliers found in raw data
head(seqobj$transformed_data$T0)  # the raw data
head(seqobj$transformed_data$D1) # the first order differenced data
head(seqobj$transformed_data$D2) # the 2nd order differenced data

# double transforms e.g. c("T0", "D1", "D1")
seqobj <- seq_transform(dt4$data, sequence = c("T0", "D1", "D1"),
 depth_method = "mbd", save_data = TRUE) # throws warning
seqobj$outliers$T0 # outliers found in raw data
seqobj$outliers$D1_1 # found after differencing data the first time
seqobj$outliers$D1_2 # found after differencing data the second time
head(seqobj$transformed_data$T0)  # the raw data
head(seqobj$transformed_data$D1_1) # the first order differenced data
head(seqobj$transformed_data$D1_2) # the 2nd order differenced data

# multivariate data
dtm <- array(0, dim = c(dim(dt1$data), 2))
dtm[,,1] <- dt1$data
dtm[,,2] <- dt1$data
seqobj <- seq_transform(dtm, sequence = "O", depth_method = "erld",
 erld_type = "one_sided_right", save_data = TRUE)
seqobj$outliers$O # multivariate outliers
head(seqobj$transformed_data$O) # univariate outlyingness data


fdaoutlier documentation built on Oct. 1, 2023, 1:06 a.m.