hourdiff_range_thresholds: Inspect effects of thresholds on matches over time

View source: R/util.r

hourdiff_range_thresholds R Documentation

Inspect effects of thresholds on matches over time

Description

If it can be assumed that matches should only occur within a given time range (e.g., event data should match news items only after the event occurred), a low-effort validation is to check whether the matches actually fall within this time range. This function plots the percentage of matches within a given time range (hourdiff) for different thresholds of the weight column. The plot can be used to determine a good threshold.
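The idea can be illustrated with a minimal sketch in base R. It does not use the package's implementation: the data frame `edges`, its simulated values, and the helper `pct_in_range` are hypothetical, and treating the range bounds as inclusive is an assumption. The sketch computes, for each weight threshold, the percentage of remaining matches whose hourdiff lies in the expected range, and plots the result.

```r
# Hypothetical edge data (not produced by RNewsflow): one row per match,
# with a similarity weight and the time difference in hours.
set.seed(1)
n <- 1000
edges <- data.frame(
  weight   = runif(n),
  hourdiff = runif(n, min = -24, max = 24)
)
# Simulate that strong matches always fall after the event (hourdiff >= 0).
strong <- edges$weight > 0.6
edges$hourdiff[strong] <- abs(edges$hourdiff[strong])

# Percentage of matches with weight >= threshold whose hourdiff lies inside
# hourdiff_range. Inclusive bounds are an assumption of this sketch.
pct_in_range <- function(edges, thresholds, hourdiff_range = c(0, Inf)) {
  sapply(thresholds, function(th) {
    kept <- edges[edges$weight >= th, , drop = FALSE]
    if (nrow(kept) == 0) return(NA_real_)
    in_range <- kept$hourdiff >= hourdiff_range[1] &
      kept$hourdiff <= hourdiff_range[2]
    100 * mean(in_range)
  })
}

thresholds <- seq(0, 0.95, length.out = 20)
pct <- pct_in_range(edges, thresholds)

# A curve that rises and then levels off suggests a threshold above which
# most remaining matches fall in the expected time range.
plot(thresholds, pct, type = "b", ylim = c(0, 100),
     xlab = "weight threshold",
     ylab = "% of matches within hourdiff range")
```

In this simulated data the percentage reaches 100 once the threshold exceeds 0.6, because every stronger match was placed after the event by construction. With real data the curve is usually less clean, and the choice of threshold remains a judgment call.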

Usage

hourdiff_range_thresholds(
  g,
  breaks = 20,
  hourdiff_range = c(0, Inf),
  min_weight = NA,
  min_hourdiff = NA,
  max_hourdiff = NA
)

Arguments

g

The output of newsflow.compare (either as "igraph" or "edgelist")

breaks

The number of breaks for the weight threshold

hourdiff_range

The time period (hourdiff range) in which matches are expected to occur, given as a vector of two values: c(lower, upper).

min_weight

Optionally, filter out all values below the given weight.

min_hourdiff

The lowest possible hourdiff value. This is used to estimate noise. If not specified, it is estimated from the data.

max_hourdiff

The highest possible hourdiff value.

Value

Nothing is returned; the function is called only to draw the plot.


kasperwelbers/RNewsflow documentation built on April 8, 2024, 4:39 p.m.