q_gg_joint_DZ: Joint Generalized Gamma Method for Outlier Detection


View source: R/Main_Functions.R

Description

This function implements the joint generalized gamma method for outlier detection in replicated data. It first fits a generalized gamma distribution to each replicate (X_1 and X_2) by maximum likelihood, using the parameterization in the R package flexsurv, given by Kotz and Johnson (1970). It then uses the fitted parameters and the joint probability density function derived in the paper cited for this package to calculate, for each data point, the joint probability that D and Z take values more extreme than their observed values (d and z, respectively). That is, it numerically integrates the joint PDF (using the function adaptIntegrate in the package cubature) to find P(D <= d, z <= Z <= sqrt(2)) if d < 0, or P(D >= d, z <= Z <= sqrt(2)) otherwise. The upper limit for Z is sqrt(2) because 0 <= Z <= sqrt(2) always holds.
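The choice of integration region described above can be sketched in base R. This is only an illustration: `tail_region` is a hypothetical helper, not a function from this package, and how the package itself handles the unbounded limit for D is not stated here.

```r
# Hypothetical helper (not part of the package): integration limits for one
# observed point (d, z). Limits are ordered as (D, Z).
tail_region <- function(d, z) {
  stopifnot(z >= 0, z <= sqrt(2))
  if (d < 0) {
    # Lower tail in D: P(D <= d, z <= Z <= sqrt(2))
    list(lower = c(-Inf, z), upper = c(d, sqrt(2)))
  } else {
    # Upper tail in D: P(D >= d, z <= Z <= sqrt(2))
    list(lower = c(d, z), upper = c(Inf, sqrt(2)))
  }
}

region_neg <- tail_region(-0.4, 0.3)  # d < 0
region_pos <- tail_region(1.2, 0.8)   # d >= 0
region_neg
region_pos
```

In both cases the Z limits run from the observed z up to sqrt(2); only the direction of the D tail changes with the sign of d.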

Usage

q_gg_joint_DZ(X_1, X_2, n_cores = detectCores() - 1)

Arguments

X_1

The first (independent) replicate of the data. A vector of positive real numbers.

X_2

The second (independent) replicate of the data. A vector of positive real numbers, of the same length as X_1.

n_cores

The number of cores to use. The function numerically integrates the joint PDF separately for each data point. To speed this up, the integrations are run in parallel (using the package parallel), which requires specifying how many cores on the computer to use. The default is all but one of the machine's cores, leaving one free for other processes.
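The per-point parallel pattern described for n_cores can be sketched as follows. This is a minimal sketch under stated assumptions: `tail_prob` is a toy one-dimensional stand-in (a standard normal tail integrated with base R's integrate), not the package's joint PDF, and the use of makeCluster and parSapply is an assumption, since this page does not say which functions from parallel the package calls.

```r
library(parallel)

# Toy stand-in for the per-point integration (NOT the package's joint PDF):
# upper-tail probability of a standard normal, by numerical integration.
tail_prob <- function(d) integrate(dnorm, lower = d, upper = Inf)$value

d_vals <- c(-1, 0, 0.5, 2)

# Mirror the documented default of "all but one core".
# detectCores() can return NA, so fall back to a single core in that case.
n_cores <- detectCores() - 1
if (is.na(n_cores) || n_cores < 1) n_cores <- 1
n_cores <- min(n_cores, 2)  # cap at 2 workers for this small demo

cl <- makeCluster(n_cores)
q_par <- parSapply(cl, d_vals, tail_prob)  # one integration per data point
stopCluster(cl)

q_par
```

Because each data point is integrated independently, the work splits cleanly across workers. The cost of starting the cluster only pays off when each integration is expensive, as it is for a two-dimensional joint PDF.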

Value

A numeric vector of the same length as the inputs X_1 and X_2. With D = X_1 - X_2 and Z = sqrt(2) * abs(X_1 - X_2) / (X_1 + X_2), and with (d, z) denoting the observed values for each (X_1, X_2) data point, each element is the outlier probability q = P(D <= d, z <= Z <= sqrt(2)) if d < theta, or q = P(D >= d, z <= Z <= sqrt(2)) if d >= theta.
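The definitions of D and Z above translate directly into base R. The data values in this short sketch are made up for illustration and are not taken from the package.

```r
# Made-up positive replicate values, for illustration only
X_1 <- c(1.0, 2.5, 4.0, 0.8)
X_2 <- c(1.2, 2.5, 1.0, 3.2)

D <- X_1 - X_2
Z <- sqrt(2) * abs(X_1 - X_2) / (X_1 + X_2)

# For positive inputs, abs(X_1 - X_2) < X_1 + X_2, so Z stays below sqrt(2)
stopifnot(all(Z >= 0), all(Z < sqrt(2)))

data.frame(X_1, X_2, D, Z)
```

Identical replicate values give D = 0 and Z = 0, while Z approaches sqrt(2) as one replicate becomes much larger than the other.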

Examples

# X_1 and X_2 must be positive data vectors of the same length; they are the two replicates
library(replicateOutliers)
data(Sim_GG)
df <- data.frame(X_1 = Sim_GG$X_1, X_2 = Sim_GG$X_2)
# q_gg_joint_DZ calculates D and Z internally from X_1 and X_2
# Left commented out: only run this on a cluster!
# df$q_gg_j <- q_gg_joint_DZ(df$X_1, df$X_2)

matthew-seth-smith/replicateOutliers documentation built on Jan. 24, 2020, 9:34 p.m.