add_patterns: A function to add patterns to a verisr dataframe

View source: R/patterns2021.R

Description

This function works by scoring each incident against the skmeans (spherical k-means) cluster centroids. Note: it can be rather slow on large data sets.
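A minimal sketch of what scoring against spherical k-means centroids can look like. It assumes the incidents are already a 0/1 (or logical) matrix whose enumeration columns match those of the centroid matrix, computes the cosine distance to each centroid, and then keeps every centroid whose distance is within the relative threshold of the nearest one (one reading of the threshold argument described under Arguments). The helpers cosine_distance() and keep_clusters() are hypothetical and are not the verisr implementation.

```r
# Hypothetical sketch, not verisr code.
# x: one row per incident; centroids: one row per centroid; same columns.
cosine_distance <- function(x, centroids) {
  x <- x * 1                                    # coerce logical enumerations to 0/1
  sim <- (x %*% t(centroids)) /
    outer(sqrt(rowSums(x^2)), sqrt(rowSums(centroids^2)))
  # Clamp tiny negative values caused by floating-point rounding.
  # Incidents with no enumerations set give NaN (zero-length vector).
  pmax(1 - sim, 0)                              # incidents x centroids
}

# Assumed rule: keep centroid k when (d_k - d_min) / d_min <= threshold.
# Multiplying through by d_min avoids NaN when an incident matches exactly.
keep_clusters <- function(dist_row, threshold = 0.1) {
  d_min <- min(dist_row)
  names(dist_row)[(dist_row - d_min) <= threshold * d_min]
}

# Usage with toy data (two incidents, two named centroids).
x <- rbind(c(1, 0, 1), c(0, 1, 0))
centroids <- rbind(a = c(1, 0, 1), b = c(0, 1, 1))
d <- cosine_distance(x, centroids)
kept <- lapply(seq_len(nrow(d)), function(i) keep_clusters(d[i, ]))
```

Writing the threshold test as a difference compared with threshold * d_min, rather than as a ratio, keeps the rule well defined when an incident coincides with a centroid.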

Usage

add_patterns(
  veris,
  centroids = NULL,
  prefix = "pattern",
  replace = TRUE,
  clusters = FALSE,
  threshold = 0.1,
  veris_update_f = verisr::pattern_current_to_1.3.5
)

Arguments

veris

A verisr data.table or data.frame-like veris object.

centroids

A matrix of skmeans centroids with one row per centroid. If NULL (the default), the 2021 DBIR pattern centroids are used.

prefix

The prefix of the column names used for the pattern columns.

replace

Whether to remove previously existing columns with the same prefix before adding the patterns.

clusters

If TRUE, the clusters are added to the returned veris object as 'cluster.X' columns, each containing the cosine distance from the incident to cluster X.

threshold

The threshold on the ratio of the difference between cluster-to-incident distances to the smallest cluster-to-incident distance. Defaults to 0.1 (i.e. the difference must be within 1/10th of the smallest distance). In the 2020 data, this results in two percent of clusters being kept.

veris_update_f

A function applied to the centroids and the veris object to handle changes made to VERIS after the clusters were defined. It must take a veris object and a centroids matrix and return a list containing a veris object and a centroids matrix. Because VERIS adds, removes, and changes enumerations each year, this function modifies the data and centroids (currently based on VERIS 1.3.5) to be compatible with the current version of VERIS.
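As an illustration of the contract described for veris_update_f, here is a hypothetical custom update function that renames one enumeration column in the centroids matrix to the name used in the current data. The argument order (veris, centroids), the element names of the returned list, and the column names old_enum and new_enum are all assumptions made for this sketch; compare with the default, verisr::pattern_current_to_1.3.5, before relying on any of them.

```r
# Hypothetical update function; argument order and list names are assumptions.
rename_enum_update <- function(veris, centroids,
                               old = "old_enum", new = "new_enum") {
  # Rename a centroid column defined under the old schema so that it
  # matches the (made-up) column name used in the current data.
  cols <- colnames(centroids)
  cols[cols == old] <- new
  colnames(centroids) <- cols
  # Return both objects, as the documentation requires.
  list(veris = veris, centroids = centroids)
}

# Toy centroids matrix with one column to rename.
cm <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("old_enum", "x")))
out <- rename_enum_update(data.frame(id = 1:2), cm)
```

Such a function could be supplied as add_patterns(vz, veris_update_f = rename_enum_update) for some veris object vz. Since this replaces the default, any updates the default performs would have to be reproduced in the custom function.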

Value

A veris object with the pattern columns added.
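A minimal sketch, on a plain data.frame, of how prefix, replace, and clusters could shape the returned columns. The helper append_pattern_columns(), the "prefix.Pattern" naming of the pattern columns, and their logical type are assumptions for illustration (the 'cluster.X' naming comes from the clusters argument); this is not verisr code and does not use data.table semantics.

```r
# Hypothetical helper, not verisr code.
# patterns: incidents x patterns logical matrix (assumed);
# dist: optional incidents x centroids distance matrix.
append_pattern_columns <- function(df, patterns, dist = NULL,
                                   prefix = "pattern", replace = TRUE) {
  if (replace) {
    # replace = TRUE: drop columns left over from a previous run
    df <- df[, !startsWith(names(df), paste0(prefix, ".")), drop = FALSE]
  }
  # One column per pattern, named "<prefix>.<pattern>"
  for (p in colnames(patterns)) {
    df[[paste0(prefix, ".", p)]] <- patterns[, p]
  }
  # clusters = TRUE: one "cluster.<X>" column per centroid
  if (!is.null(dist)) {
    for (k in colnames(dist)) {
      df[[paste0("cluster.", k)]] <- dist[, k]
    }
  }
  df
}

df <- data.frame(id = 1:2, pattern.Old = c(TRUE, FALSE))
patterns <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2,
                   dimnames = list(NULL, c("A", "B")))
dist <- matrix(c(0, 1, 0.5, 0.29), nrow = 2,
               dimnames = list(NULL, c("a", "b")))
out <- append_pattern_columns(df, patterns, dist)
```

With replace = FALSE, the stale pattern.Old column would be kept alongside the new pattern columns.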


vz-risk/verisr documentation built on Aug. 5, 2023, 4:34 a.m.