q_gg_marg_DZ: Marginal Generalized Gamma Method for Outlier Detection


View source: R/Main_Functions.R

Description

This function implements the marginal generalized gamma method for outlier detection among replicated data. It first fits each replicate (X_1 and X_2) to a generalized gamma distribution by maximum likelihood (MLE), using the parameterization in the R package flexsurv given by Kotz and Johnson (1970). It also fits the difference Delta = X_1 - X_2 (written D below) between the two replicates to an Asymmetric Laplace Distribution by MLE. It then determines whether this Laplace distribution for Delta is asymmetric or symmetric and whether it has a significant displacement parameter.

For each data point, let Z = sqrt(2) * abs(X_1 - X_2) / (X_1 + X_2). Among the points outside a central band (defined using the Laplace parameters fitted to Delta), we use the generalized gamma parameters fitted to the entire X_1 and X_2 vectors (see the paper in the citation) to determine the marginal probability that Z takes a value at least as large as its observed value. We obtain this probability by numerically integrating the marginal PDF of Z with the function adaptIntegrate from the package cubature. Points inside the central band are assigned the probability 1.
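To make the quantities D and Z and the central band concrete, here is a minimal sketch of how they could be computed from the two replicates once the Laplace parameters are known. The helper name central_band_sketch is hypothetical and not part of the package. It assumes the Kotz-Kozubowski-Podgorski parameterization of the Asymmetric Laplace Distribution (location theta, scale sigma, asymmetry kappa), whose mean is theta + sigma / sqrt(2) * (1 / kappa - kappa) and whose variance is sigma^2 / 2 * (1 / kappa^2 + kappa^2); the package may parameterize it differently.

```r
# Hypothetical helper (not the package's code): compute D and Z for each
# replicate pair and flag points inside the central band, ASSUMING the
# Kotz-Kozubowski-Podgorski Asymmetric Laplace parameterization
# (location theta, scale sigma, asymmetry kappa).
central_band_sketch <- function(X_1, X_2, theta, sigma, kappa, k = 1) {
  stopifnot(length(X_1) == length(X_2), all(X_1 > 0), all(X_2 > 0))
  D <- X_1 - X_2                                # signed difference
  Z <- sqrt(2) * abs(X_1 - X_2) / (X_1 + X_2)   # lies in [0, sqrt(2)) for positive data
  # Mean and standard deviation of AL(theta, sigma, kappa) under this parameterization
  ald_mean <- theta + sigma / sqrt(2) * (1 / kappa - kappa)
  ald_sd   <- sigma / sqrt(2) * sqrt(1 / kappa^2 + kappa^2)
  # Central band: within k standard deviations of the mean
  in_band <- abs(D - ald_mean) <= k * ald_sd
  data.frame(D = D, Z = Z, in_band = in_band)
}

res <- central_band_sketch(c(2, 5, 1), c(1.5, 1, 1), theta = 0, sigma = 1, kappa = 1)
```

With theta = 0, sigma = 1 and kappa = 1 the distribution is symmetric with mean 0 and standard deviation 1, so the point with D = 4 falls outside the band while D = 0.5 and D = 0 fall inside.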

Usage

q_gg_marg_DZ(X_1, X_2, p_theta = 0.05, p_kappa = 0.05, k = 1,
  n_cores = detectCores() - 1)

Arguments

X_1

The first (independent) replicate of the data. A vector of positive real numbers of the same length as X_2.

X_2

The second (independent) replicate of the data. A vector of positive real numbers of the same length as X_1.

p_theta

We use the (1-p_theta)*100% two-sided confidence interval for the displacement parameter theta of the Asymmetric Laplace Distribution fitted to the difference Delta = X_1 - X_2 to determine whether Delta has a significant translation. If this interval contains 0, we set theta = 0. By default, p_theta = 0.05.

p_kappa

We use the (1-p_kappa)*100% two-sided confidence interval for log(kappa), where kappa is the asymmetry parameter of the Asymmetric Laplace Distribution fitted to Delta. If this interval contains 0, we set log(kappa) = 0 (that is, kappa = 1) and use a Symmetric Laplace Distribution for Delta. By default, p_kappa = 0.05.

k

The number of standard deviations around the center (mean) of the Asymmetric Laplace Distribution for Delta that defines the "central band." By default, k = 1.

n_cores

This function performs a numerical integration for each data point. To speed this up, the computation runs in parallel (using the package parallel), which requires specifying the number of cores (n_cores) to use. By default, all but one of the machine's cores are used, leaving one free for other processes.
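The p_theta and p_kappa rules can be sketched as follows. This is a hypothetical helper (simplify_ald_params is not a package function) that ASSUMES approximately normal MLEs with known standard errors, so that Wald intervals estimate ± qnorm(1 - p/2) * se serve as the two-sided confidence intervals; the package may construct its intervals differently.

```r
# Hypothetical helper (not the package's code): simplify the fitted
# Asymmetric Laplace parameters using two-sided confidence intervals,
# ASSUMING Wald intervals built from MLE standard errors.
simplify_ald_params <- function(theta_hat, se_theta, log_kappa_hat, se_log_kappa,
                                p_theta = 0.05, p_kappa = 0.05) {
  ci_theta     <- theta_hat + c(-1, 1) * qnorm(1 - p_theta / 2) * se_theta
  ci_log_kappa <- log_kappa_hat + c(-1, 1) * qnorm(1 - p_kappa / 2) * se_log_kappa
  # Interval for theta contains 0: no significant displacement, so theta = 0
  theta <- if (ci_theta[1] <= 0 && ci_theta[2] >= 0) 0 else theta_hat
  # Interval for log(kappa) contains 0: log(kappa) = 0, i.e. kappa = 1 (symmetric)
  kappa <- if (ci_log_kappa[1] <= 0 && ci_log_kappa[2] >= 0) 1 else exp(log_kappa_hat)
  list(theta = theta, kappa = kappa)
}

simplify_ald_params(theta_hat = 0.1, se_theta = 0.1, log_kappa_hat = 0.5, se_log_kappa = 0.1)
```

In this call the interval for theta, roughly (-0.096, 0.296), contains 0, so theta is set to 0; the interval for log(kappa), roughly (0.304, 0.696), excludes 0, so the fitted asymmetry exp(0.5) is kept.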

Value

A numeric vector of the same length as the input vectors X_1 and X_2. With D = X_1 - X_2 and Z = sqrt(2) * abs(X_1 - X_2) / (X_1 + X_2), let (d, z) be the observed values for a data point (X_1, X_2); for positive data, 0 <= Z < sqrt(2). The returned value is the marginal probability q = P(z <= Z <= sqrt(2)) if (d, z) lies outside the central band, and 1 if it lies inside the central band.
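The following sketch shows how the returned vector could be assembled once a marginal density for Z is available. Everything here is an assumption for illustration: q_from_density is a hypothetical helper, the density dens_Z is supplied by the caller rather than derived from the generalized gamma fits, and stats::integrate is swapped in for the one-dimensional integral where the package uses cubature::adaptIntegrate. The per-point work is distributed with parallel::mclapply, mirroring the role of n_cores.

```r
library(parallel)

# Hypothetical sketch (not the package's code): given a vectorized marginal
# density dens_Z on [0, sqrt(2)], compute q = P(z <= Z <= sqrt(2)) for each
# point outside the central band and assign 1 to points inside it.
q_from_density <- function(z, in_band, dens_Z, n_cores = 1) {
  stopifnot(length(z) == length(in_band))
  tail_prob <- function(i) {
    if (in_band[i]) return(1)   # central band: assigned value 1
    integrate(dens_Z, lower = z[i], upper = sqrt(2))$value
  }
  # mclapply forks worker processes; Windows supports only mc.cores = 1
  unlist(mclapply(seq_along(z), tail_prob, mc.cores = n_cores))
}

# Toy check with a uniform density on [0, sqrt(2)], where q = 1 - z / sqrt(2)
dens_unif <- function(t) rep(1 / sqrt(2), length(t))
q_from_density(c(0.5, sqrt(2) / 2, 0.3), c(FALSE, FALSE, TRUE), dens_unif)
```

The uniform density is only a stand-in that makes the tail probability easy to verify by hand; it is not the distribution of Z under the method.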

Examples

# Assume X_1 and X_2 are positive data vectors of the same length (the replicates)
library(replicateOutliers)
data(Sim_GG)
df <- data.frame(X_1 = Sim_GG$X_1, X_2 = Sim_GG$X_2)
# q_gg_marg_DZ computes D and Z internally
# df$q_gg_m <- q_gg_marg_DZ(df$X_1, df$X_2)  # Computationally intensive: only run this on a cluster!

matthew-seth-smith/replicateOutliers documentation built on Jan. 24, 2020, 9:34 p.m.