If it can be assumed that matches should only occur within a given time range (e.g., event data should match news items after the event occurred), a low-effort validation can be obtained by checking whether the matches indeed occur only within this time range. This function plots the percentage of matches within a given time range (hourdiff) for different thresholds of the weight column. This can be used to determine a good threshold.
time_based_validity(g, total_hourdiff, expected_hourdiff,
  min_weight = NA, lambda = log(2)/24, breaks = 100,
  hist_breaks = NA)
g | The edgelist output of newsflow.compare (use the argument: return_as = "edgelist"). It has to come directly from newsflow.compare (i.e., without intermediate operations such as subsetting), because this function requires certain attributes that are removed if g is changed. Also, the margin_attr argument in newsflow.compare has to be TRUE (the default).
total_hourdiff | The range of the hourdiff value in g. This should be the same as the hour.window in newsflow.compare (if g has not been subsetted afterwards).
expected_hourdiff | A vector of length 2 that indicates the range (including endpoints) in which you expect matches to occur, based on reasonable assumptions about the data. For matching events to news articles, a very reasonable assumption is that matches occur 'after' the event took place, and a reasonable second assumption is that matches occur 'within a limited amount of time' after the event.
min_weight | Filter out all matches below the given weight.
breaks | The number of breaks for the weight threshold.
hist_breaks | The number of breaks on the histogram.
A plot. The plot data is also returned and can be assigned to a variable.
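The idea behind this validation can be illustrated with a minimal base R sketch. This is not the package's implementation: the edgelist is simulated rather than produced by newsflow.compare, and the names true_match, thresholds, pct_in_range and validity are hypothetical. It only assumes that an edgelist has an hourdiff and a weight column, as described above, and it ignores the min_weight and lambda arguments.

```r
# Simulated edgelist (hypothetical data, NOT the output of newsflow.compare).
# True matches get higher weights and fall 0 to 36 hours after the event;
# false matches get lower weights and are spread over the whole window.
set.seed(1)
n <- 2000
true_match <- runif(n) < 0.3
g <- data.frame(
  hourdiff = ifelse(true_match, runif(n, 0, 36), runif(n, -72, 72)),
  weight   = ifelse(true_match, runif(n, 0.4, 1), runif(n, 0, 0.6))
)

# Range (endpoints included) in which matches are expected to occur
expected_hourdiff <- c(0, 36)

# Candidate thresholds for the weight column
thresholds <- seq(0, 0.9, by = 0.1)

# For each threshold, keep the matches with weight >= threshold and compute
# the percentage whose hourdiff lies within the expected range
pct_in_range <- sapply(thresholds, function(th) {
  kept <- g[g$weight >= th, ]
  in_range <- kept$hourdiff >= expected_hourdiff[1] &
              kept$hourdiff <= expected_hourdiff[2]
  100 * mean(in_range)
})

validity <- data.frame(threshold = thresholds, pct_in_range = pct_in_range)

# A threshold is a good candidate where the curve levels off near 100
plot(validity$threshold, validity$pct_in_range, type = "l",
     xlab = "weight threshold", ylab = "% of matches within expected hourdiff")
```

In this simulated setting the percentage rises with the threshold, because raising the threshold removes the low-weight false matches that fall outside the expected range. With real data, the point where the curve flattens suggests a reasonable threshold.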