sim_jaccard_emd_2: Simulate random data removal via alternative Earth Mover's...

View source: R/jaccard.R

sim_jaccard_emd_2    R Documentation

Simulate random data removal via alternative Earth Mover's Distance cognate cluster assignment approach

Description

Simulate random data removal from a time series data list for a given removal amount and number of simulations, and determine the Jaccard index for all clusters via the Earth Mover's Distance cognate cluster assignment approach.

Usage

sim_jaccard_emd_2(
  plist,
  parameter,
  removal,
  n_simu,
  method,
  n_clust,
  maxIter,
  normalize
)

Arguments

plist

Object of type list storing patient time series data (also see function: patient_list)

parameter

Parameter of interest in time series data list

removal

Amount of random data to remove before determining the Jaccard index

n_simu

Number of simulations

method

Clustering method (also see function: clust_matrix)

n_clust

Number of clusters (also see function: clust_matrix)

maxIter

Maximum iterations to determine Earth Mover's Distances (also see function: emd_matrix); default is 5,000 for this function

normalize

Indicates whether the chosen parameter should be normalized (TRUE by default)

Details

This method is a novel approach that may complement sim_jaccard_global and serve as an alternative to sim_jaccard_emd. First, clustering is performed on the complete data without removal; the resulting clusters serve as Gold Standard clusters. Subsequently, random data is removed from the time series data. Each leaky data distribution is then compared via Earth Mover's Distance to the distribution of each member of each Gold Standard cluster. The leaky distribution is assigned to the Gold Standard cluster to which it exhibits the lowest average Earth Mover's Distance. This process is repeated until every leaky time series data distribution is assigned to a cluster. Afterwards, the Jaccard indices are calculated for each cluster by comparing its members under complete data with its members under leaky data.
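The assignment step described above can be illustrated with a minimal base R sketch. It is not the package's implementation: emd_1d and assign_cluster are hypothetical helper names, the toy data is made up, and the Earth Mover's Distance is approximated for one-dimensional samples as the area between the two empirical distribution functions rather than by the routine the package uses internally.

```r
# Hypothetical helper: 1-D Earth Mover's Distance between two numeric samples,
# computed as the area between their empirical CDFs (a stand-in for the
# package's own EMD routine, which is not shown here).
emd_1d <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  if (length(grid) < 2) return(0)
  Fx <- ecdf(x)(grid)
  Fy <- ecdf(y)(grid)
  # Both CDFs are step functions, constant between consecutive grid points
  sum(abs(Fx - Fy)[-length(grid)] * diff(grid))
}

# Hypothetical helper: assign one leaky distribution to the Gold Standard
# cluster with the lowest average EMD to that cluster's members.
assign_cluster <- function(leaky, gold_clusters) {
  avg_emd <- vapply(gold_clusters, function(members) {
    mean(vapply(members, function(m) emd_1d(leaky, m), numeric(1)))
  }, numeric(1))
  which.min(avg_emd)
}

# Toy data (made up for illustration): two clusters with two members each
gold_clusters <- list(
  list(c(1, 2, 3), c(2, 3, 4)),
  list(c(10, 11, 12), c(11, 12, 13))
)
leaky <- c(2, 3)  # a member's series after random removal
assigned <- assign_cluster(leaky, gold_clusters)
```

In this toy case the leaky series lies much closer to the members of the first cluster, so it is assigned there. Repeating the call for every leaky series yields the cluster memberships that are later compared against the Gold Standard.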

Value

Object of type matrix storing the resulting Jaccard indices for the indicated amount of random data removal for all clusters
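For orientation, a Jaccard index for a single cluster is the number of shared members divided by the number of members in the union of both sets. The sketch below shows this in base R; jaccard_index and the patient IDs are hypothetical names used only for illustration, and the layout of the returned matrix is not assumed here.

```r
# Hypothetical helper: Jaccard index of two membership sets
jaccard_index <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Made-up member IDs of one cluster under complete and leaky data
gold_members  <- c("ID_1", "ID_2", "ID_3")
leaky_members <- c("ID_2", "ID_3", "ID_4")

# Two shared members out of four distinct members in total
j <- jaccard_index(gold_members, leaky_members)
```

A value of 1 means the cluster kept exactly the same members after data removal, while values near 0 indicate that membership changed almost entirely.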

References

Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. A metric for distributions with applications to image databases. In Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), pages 59–66. IEEE, 1998.

Examples

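# Load the demo patient time series data from GitHub
plist <- patient_list(
"https://raw.githubusercontent.com/MrMaximumMax/FBCanalysis/master/demo/phys/data.csv",
GitHub = TRUE)
# Run 10 simulations with a removal amount of 0.05 on parameter "PEF"
output <- sim_jaccard_emd_2(plist, parameter = "PEF", removal = 0.05, n_simu = 10,
  method = "hierarchical", n_clust = 2, maxIter = 100)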


MrMaximumMax/FBCanalysis documentation built on June 23, 2022, 8:21 p.m.