outliergram: Outliergram for univariate functional data sets

Description

This function computes, and optionally displays, the outliergram of a univariate functional dataset. The inflation factor F can optionally be adjusted so that a target true positive rate of outliers is attained under an assumption of Gaussianity.

Usage

outliergram(
  fData,
  MBD_data = NULL,
  MEI_data = NULL,
  p_check = 0.05,
  Fvalue = 1.5,
  adjust = FALSE,
  display = TRUE,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  ...
)

Arguments

fData

the univariate functional dataset whose outliergram has to be determined.

MBD_data

a vector containing the MBD for each element of the dataset. If missing, MBDs are computed.

MEI_data

a vector containing the MEI for each element of the dataset. If not provided, MEIs are computed.

p_check

proportion of observations with either low or high MEI to be checked for outliers in the secondary step (shift towards the center of the dataset). Default is 0.05.

Fvalue

the F value to be used in the procedure that finds the shape outliers by looking at the lower parabolic limit in the outliergram. Default is 1.5. You can also leave the default value and, by providing the parameter adjust, specify that you want Fvalue to be adjusted for the dataset provided in fData.

adjust

either FALSE if you would like the default value for the inflation factor, F = 1.5, to be used, or a list specifying the parameters required by the adjustment.

  • "N_trials": the number of repetitions of the adjustment procedure based on the simulation of a gaussian population of functional data, each one producing an adjusted value of F, which will lead to the averaged adjusted value \bar{F}. Default is 20;

  • "trial_size": the number of elements in the gaussian population of functional data that will be simulated at each repetition of the adjustment procedure. Default is 5 * fData$N;

  • "TPR": the True Positive Rate of outliers, i.e. the proportion of observations in a dataset without shape outliers that are nevertheless flagged as outliers. Default is 2 * pnorm( 4 * qnorm( 0.25 ) ), about 0.007;

  • "F_min": the minimum value of F, defining the left boundary for the optimization problem aimed at finding, for a given dataset of simulated gaussian data associated to fData, the optimal value of F. Default is 0.5;

  • "F_max": the maximum value of F, defining the right boundary for the optimization problem aimed at finding, for a given dataset of simulated gaussian data associated to fData, the optimal value of F. Default is 20;

  • "tol": the tolerance to be used in the optimization problem aimed at finding, for a given dataset of simulated gaussian data associated to fData, the optimal value of F. Default is 1e-3;

  • "maxiter": the maximum number of iterations to solve the optimization problem aimed at finding, for a given dataset of simulated gaussian data associated to fData, the optimal value of F. Default is 100;

  • "VERBOSE": a parameter controlling the verbosity of the adjustment process.
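As an illustration of the list described above, the following sketch builds an adjust list with every field set to its documented default. The value N = 204 is a hypothetical sample size standing in for fData$N, and the value given to VERBOSE is an assumption, since no default is documented for it. The default TPR is the share of Gaussian observations falling outside the usual boxplot fences with F = 1.5, because Q1 - 1.5 * IQR equals 4 * qnorm(0.25) for a standard normal.

```r
# Hypothetical number of curves, standing in for fData$N
N <- 204

adjust <- list(
  N_trials   = 20,
  trial_size = 5 * N,
  TPR        = 2 * pnorm(4 * qnorm(0.25)),
  F_min      = 0.5,
  F_max      = 20,
  tol        = 1e-3,
  maxiter    = 100,
  VERBOSE    = FALSE  # assumed value; no default is documented
)

# Default TPR expressed as a percentage (a little under 1%)
round(100 * adjust$TPR, 2)
```

Only the fields that differ from their defaults need to be supplied in practice, as the Examples section does with N_trials, trial_size, TPR and VERBOSE.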

display

either a logical value indicating whether you want the outliergram to be displayed, or the number of the graphical device where you want the outliergram to be displayed.

xlab

a list of two labels to use on the x axis when displaying the functional dataset and the outliergram;

ylab

a list of two labels to use on the y axis when displaying the functional dataset and the outliergram;

main

a list of two titles to be used on the plot of the functional dataset and the outliergram;

...

additional graphical parameters to be used only in the plot of the functional dataset.
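The lower parabolic limit mentioned under Fvalue can be made concrete with a small base-R sketch. It follows the construction in Arribas-Gil and Romo (2014) as understood here, applied to a toy matrix of random curves without ties. The variable names and the rank-based formulas for MBD and MEI are assumptions made for illustration, not the package's code, and the secondary step governed by p_check is omitted. For n curves, each pair (MEI, MBD) lies on or below the parabola a0 + a1 * MEI + a2 * n^2 * MEI^2, with a0 = a2 = -2 / (n * (n - 1)) and a1 = 2 * (n + 1) / (n - 1). Curves whose vertical distance d to the parabola exceeds Q3(d) + Fvalue * IQR(d) are flagged as shape outliers.

```r
set.seed(1)

n <- 50   # number of toy curves (one per row)
P <- 30   # number of grid points
X <- matrix(rnorm(n * P), nrow = n)

# Rank of each curve among all curves, separately at each grid point (n x P)
R <- apply(X, 2, rank)

# Modified epigraph index: average share of curves lying on or above the curve
MEI <- rowMeans((n - R + 1) / n)

# Modified band depth over bands defined by pairs of curves, via ranks
MBD <- rowMeans(((R - 1) * (n - R) + (n - 1)) / choose(n, 2))

# Parabolic upper bound on MBD as a function of MEI
a0 <- -2 / (n * (n - 1))
a2 <- a0
a1 <- 2 * (n + 1) / (n - 1)
parabola <- a0 + a1 * MEI + a2 * n^2 * MEI^2

# Vertical distance to the parabola; non-negative up to rounding error
d <- parabola - MBD

# Boxplot-type rule on d with inflation factor Fvalue
Fvalue <- 1.5
q <- quantile(d, c(0.25, 0.75), names = FALSE)
threshold <- q[2] + Fvalue * (q[2] - q[1])
shape_outliers <- which(d > threshold)
```

A larger Fvalue raises the threshold on d, so fewer curves are flagged.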

Value

Even when used graphically to plot the outliergram, the function returns a list containing:

Adjustment

When the adjustment option is selected, the value of F is optimized for the univariate functional dataset provided with fData. In practice, a synthetic population without outliers (of size adjust$trial_size, with the same centerline as fData and the same covariance, robustly estimated from the data) is simulated adjust$N_trials times, and each time an optimized value F_i is computed so that a given proportion (adjust$TPR) of observations is flagged as outliers. The final value of F for the outliergram is the average of F_1, F_2, …, F_{N_trials}. In each trial, the optimization problem is solved using stats::uniroot (Brent's method).
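The procedure can be sketched on a univariate stand-in using base R only. This is a simplified illustration under stated assumptions, not the package's implementation: rnorm replaces the simulation of Gaussian functional data, the ordinary boxplot fence replaces the outliergram rule, and the helper flagged_share is hypothetical. The root-finding step uses stats::uniroot with the documented defaults for F_min, F_max, tol and maxiter.

```r
set.seed(42)

N_trials   <- 20
trial_size <- 1000
TPR        <- 2 * pnorm(4 * qnorm(0.25))
F_min      <- 0.5
F_max      <- 20
tol        <- 1e-3
maxiter    <- 100

# Hypothetical helper: share of a sample flagged by the boxplot rule
# with inflation factor Fv
flagged_share <- function(Fv, x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  mean(x < q[1] - Fv * iqr | x > q[2] + Fv * iqr)
}

# One optimized F_i per simulated outlier-free Gaussian sample
F_trials <- replicate(N_trials, {
  x <- rnorm(trial_size)
  uniroot(
    function(Fv) flagged_share(Fv, x) - TPR,
    interval = c(F_min, F_max),
    tol = tol,
    maxiter = maxiter
  )$root
})

# Averaged adjusted value of F
F_bar <- mean(F_trials)
```

The flagged share is a step function of Fv, so uniroot locates the point where it crosses TPR rather than an exact zero. With the default TPR, the averaged value in this toy setting should land near the classical 1.5.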

References

Arribas-Gil, A., and Romo, J. (2014). Shape outlier detection and visualization for functional data: the outliergram, Biostatistics, 15(4), 603-619.

See Also

fData, MEI, MBD, fbplot

Examples

library(roahd)

set.seed(1618)

N <- 200
P <- 200
N_extra <- 4

grid <- seq(0, 1, length.out = P)

Cov <- exp_cov_function(grid, alpha = 0.2, beta = 0.8)

Data <- generate_gauss_fdata(
  N = N,
  centerline = sin(4 * pi * grid),
  Cov = Cov
)

Data_extra <- array(0, dim = c(N_extra, P))

Data_extra[1, ] <- generate_gauss_fdata(
  N = 1,
  centerline = sin(4 * pi * grid + pi / 2),
  Cov = Cov
)

Data_extra[2, ] <- generate_gauss_fdata(
  N = 1,
  centerline = sin(4 * pi * grid - pi / 2),
  Cov = Cov
)

Data_extra[3, ] <- generate_gauss_fdata(
  N = 1,
  centerline = sin(4 * pi * grid + pi / 3),
  Cov = Cov
)

Data_extra[4, ] <- generate_gauss_fdata(
  N = 1,
  centerline = sin(4 * pi * grid - pi / 3),
  Cov = Cov
)

Data <- rbind(Data, Data_extra)

fD <- fData(grid, Data)

# Outliergram with default Fvalue = 1.5
outliergram(fD, display = TRUE)

# Outliergram with Fvalue enforced to 2.5
outliergram(fD, Fvalue = 2.5, display = TRUE)


# Outliergram with estimated Fvalue to ensure TPR of 1%
outliergram(
  fData = fD,
  adjust = list(
    N_trials = 10,
    trial_size = 5 * nrow(Data),
    TPR = 0.01,
    VERBOSE = FALSE
  ),
  display = TRUE
)

ntarabelloni/roahd documentation built on Feb. 10, 2022, 1:41 a.m.