validate_get_twcv: Check if color data are valid and get TWCV


View source: R/validate_get_twcv.R

Description

Checks whether the passed color data are valid, i.e. plentiful and varied enough according to the passed validation criteria. This function is normally only used indirectly, through 'Participant$check_valid_get_twcv()' or 'ParticipantGroup$get_valid_twcv()'.

Usage

validate_get_twcv(
  color_matrix,
  dbscan_eps = 20,
  dbscan_min_pts = 4,
  max_var_tight_cluster = 150,
  max_prop_single_tight_cluster = 0.6,
  safe_num_clusters = 3,
  safe_twcv = 250
)
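
As a rough illustration of a direct call, the sketch below builds a small made-up color matrix and passes it to the function with the default values shown above. The data, the three centers and the spread are invented for the example, and the call is only attempted if the 'synr' package is installed. It also assumes the function can be reached as 'synr::validate_get_twcv'.

```r
# Illustrative data only: 12 points in an unspecified 3D color space,
# scattered around three made-up centers (4 points per center).
set.seed(1)
centers <- rbind(c(20, 20, 20), c(60, 10, 80), c(90, 70, 30))
color_matrix <- centers[rep(1:3, each = 4), ] +
  matrix(rnorm(36, sd = 3), ncol = 3)

# Only attempt the call if synr is available.
if (requireNamespace("synr", quietly = TRUE)) {
  res <- synr::validate_get_twcv(
    color_matrix,
    dbscan_eps = 20,
    dbscan_min_pts = 4,
    max_var_tight_cluster = 150,
    max_prop_single_tight_cluster = 0.6,
    safe_num_clusters = 3,
    safe_twcv = 250
  )
  # Inspect the components described under Value:
  # valid, reason_invalid, twcv, num_clusters
  str(res)
}
```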

Arguments

color_matrix

An n-by-3 numerical matrix where each row corresponds to a single point in 3D color space.

dbscan_eps

One-element numerical vector: radius of the 'epsilon neighborhood' when applying DBSCAN clustering.

dbscan_min_pts

One-element numerical vector: minimum number of points required in the epsilon neighborhood for core points (including the core point itself).

max_var_tight_cluster

One-element numerical vector: maximum variance for a cluster to be considered 'tight-knit'.

max_prop_single_tight_cluster

One-element numerical vector: maximum proportion of points allowed to be within a 'tight-knit' cluster (if this threshold is exceeded, the data are categorized as invalid).

safe_num_clusters

One-element numerical vector: minimum number of clusters that guarantees validity if points are 'non-tight-knit'.

safe_twcv

One-element numerical vector: minimum total within-cluster variance (TWCV) score that guarantees validity if points are 'non-tight-knit'.

Value

A list with components

valid

One-element logical vector

reason_invalid

One-element character vector, empty if valid is TRUE

twcv

One-element numeric vector indicating TWCV (NA if it cannot be calculated)

num_clusters

One-element numeric vector indicating the number of identified clusters that count toward the tally compared with 'safe_num_clusters' (NA if it cannot be calculated)

Details

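This function relies heavily on the DBSCAN algorithm, as implemented in the R package 'dbscan', for clustering color points. For further information on the 'dbscan_eps' and 'dbscan_min_pts' parameters, as well as on DBSCAN itself, please see the 'dbscan' documentation. Once clustering is done, the passed validation criteria are applied as described under Arguments. If the proportion of points within a 'tight-knit' cluster (one whose variance does not exceed 'max_var_tight_cluster') exceeds 'max_prop_single_tight_cluster', the data are categorized as invalid. Otherwise, the data are valid if the points form at least 'safe_num_clusters' clusters or if the TWCV is at least 'safe_twcv'.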

Note that this means data can be classified as valid either by having at least 'safe_num_clusters' clusters, or by having fewer clusters whose points are spaced relatively far apart within those clusters.
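
The decision rule can be sketched in a few lines of base R. This is not the package's implementation: 'sketch_is_valid' and its first argument 'max_prop_in_tight_cluster' are hypothetical names, the inputs are assumed to be already-computed summaries of a clustering, and the sketch assumes that data meeting neither 'safe' threshold are treated as invalid.

```r
# Hypothetical helper (not part of synr): applies the thresholds from the
# Arguments section to already-computed summaries of a clustering.
sketch_is_valid <- function(max_prop_in_tight_cluster, num_clusters, twcv,
                            max_prop_single_tight_cluster = 0.6,
                            safe_num_clusters = 3,
                            safe_twcv = 250) {
  # Too many points packed into one tight-knit cluster: invalid.
  if (max_prop_in_tight_cluster > max_prop_single_tight_cluster) {
    return(FALSE)
  }
  # Otherwise, meeting either 'safe' threshold is enough.
  # Assumption: data meeting neither threshold are treated as invalid.
  num_clusters >= safe_num_clusters || twcv >= safe_twcv
}

sketch_is_valid(0.8, num_clusters = 4, twcv = 900)  # FALSE: one tight cluster dominates
sketch_is_valid(0.3, num_clusters = 3, twcv = 100)  # TRUE: enough clusters
sketch_is_valid(0.3, num_clusters = 1, twcv = 400)  # TRUE: few clusters, but spread out
sketch_is_valid(0.3, num_clusters = 1, twcv = 100)  # FALSE under the stated assumption
```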

The DBSCAN 'noise' cluster only counts towards the cluster tally (compared with 'safe_num_clusters') if it includes at least 'dbscan_min_pts' points. Points in the noise cluster are, however, always included in other calculations, e.g. total within-cluster variance (TWCV).
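
The tally rule for the noise cluster might look as follows. 'count_tallied_clusters' is a hypothetical helper, not a synr function, and the sketch assumes cluster labels in the form returned by the 'dbscan' package, where 0 marks noise points.

```r
# Hypothetical helper (not part of synr): tally clusters from DBSCAN labels.
# Assumes the 'dbscan' package convention that label 0 marks noise points.
count_tallied_clusters <- function(cluster_ids, dbscan_min_pts = 4) {
  n_regular <- length(unique(cluster_ids[cluster_ids != 0]))
  n_noise <- sum(cluster_ids == 0)
  # The noise 'cluster' adds one to the tally only if it is large enough.
  n_regular + as.integer(n_noise >= dbscan_min_pts)
}

# Two regular clusters plus 2 noise points: the noise is not counted.
count_tallied_clusters(c(1, 1, 1, 1, 2, 2, 2, 2, 0, 0))        # 2
# Two regular clusters plus 4 noise points: the noise counts as a cluster.
count_tallied_clusters(c(1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0))  # 3
```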

See Also

point_3d_variance for single-cluster variance, total_within_cluster_variance for TWCV.


synr documentation built on Nov. 25, 2021, 1:06 a.m.