mean_dscore (R Documentation)
This function calculates the mean divergence score for one or more variables, grouped by an event identifier. The divergence score measures how much the values of a variable disagree across the event reports that describe the same event.
mean_dscore(data, group_var, variables, normalize = FALSE, plot = FALSE)
data: A data frame containing event-report-level data.
group_var: A character string naming the column that uniquely identifies events (e.g., "event_id").
variables: A character vector of column names for which to compute divergence scores.
normalize: Logical. If TRUE, each score is divided by the total number of unique values observed for that variable across the dataset.
plot: Logical. If TRUE, the function returns a ggplot object visualizing the scores instead of a tibble.
For each variable and event, the function counts the number of unique values reported, subtracts one, and averages the result across all events. An event whose reports all agree (or that has only a single report) therefore contributes zero, so higher scores indicate more inconsistency across sources. Optionally, the scores can be normalized by the total number of unique values observed for each variable across the dataset. The result is a long-format tibble with one row per variable, showing which variables are most sensitive to aggregation. Setting plot = TRUE returns a plot of the scores instead.
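The computation described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's own code: the function name `mean_dscore_sketch` and the output column names `variable` and `score` are hypothetical, it returns a plain data frame rather than a tibble, it omits plotting, and the normalization step assumes the mean score is divided by the number of distinct values in the whole column.

```r
# Hypothetical sketch of the divergence-score computation; not the package's implementation.
mean_dscore_sketch <- function(data, group_var, variables, normalize = FALSE) {
  scores <- vapply(variables, function(v) {
    # Divergence per event: distinct values among its reports, minus one
    # (unique() treats NA as a value of its own)
    per_event <- tapply(data[[v]], data[[group_var]],
                        function(x) length(unique(x)) - 1)
    score <- mean(per_event)
    if (normalize) {
      # Assumed normalization: divide by the number of distinct values
      # of this variable across the whole dataset
      score <- score / length(unique(data[[v]]))
    }
    score
  }, numeric(1))
  # Long format: one row per variable (column names are hypothetical)
  data.frame(variable = variables, score = unname(scores),
             stringsAsFactors = FALSE)
}

df <- data.frame(
  event_id = c(1, 1, 2, 2, 3),
  country = c("US", "US", "UK", "UK", "CA"),
  actor1 = c("Actor A", "Actor B", "Actor B", "Actor C", "Actor D"),
  deaths_best = c(10, 20, 5, 15, 10)
)
vars <- c("country", "actor1", "deaths_best")
mean_dscore_sketch(df, "event_id", vars)                    # scores 0, 2/3, 2/3
mean_dscore_sketch(df, "event_id", vars, normalize = TRUE)  # scores 0, 1/6, 1/6
```

Because every per-event score for a variable is divided by the same constant, normalizing the mean gives the same result as normalizing each event's score first and then averaging.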
Either a tibble or a ggplot object, depending on the value of plot.
If plot = FALSE, returns a tibble with two columns:
The name of each variable.
The mean divergence score, or the normalized score if normalize = TRUE.
If plot = TRUE, returns a lollipop-style ggplot object showing the divergence score for each variable.
# Toy data: events 1 and 2 each have two reports, event 3 has one
df <- data.frame(
  event_id = c(1, 1, 2, 2, 3),
  country = c("US", "US", "UK", "UK", "CA"),
  actor1 = c("Actor A", "Actor B", "Actor B", "Actor C", "Actor D"),
  deaths_best = c(10, 20, 5, 15, 10)
)
# Normalized scores, returned as a lollipop plot
mean_dscore(df, "event_id", c("country", "actor1", "deaths_best"),
            normalize = TRUE, plot = TRUE)
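Working the example data through by hand, following the steps in the Details paragraph, gives the values the plot should display. The notation is introduced here for illustration only: $U_{e,v}$ is the set of distinct values of variable $v$ among the reports of event $e$, $U_v$ is the set of distinct values of $v$ in the whole dataset, and $E = 3$ is the number of events. The normalization line assumes the mean is divided by $\lvert U_v \rvert$, as the Details paragraph describes; the package's printed output may differ in rounding or row order.

```latex
\begin{aligned}
d_{e,v} &= \lvert U_{e,v} \rvert - 1, \qquad
D_v = \frac{1}{E}\sum_{e=1}^{E} d_{e,v}, \qquad
\tilde{D}_v = \frac{D_v}{\lvert U_v \rvert} \\
D_{\text{country}} &= \tfrac{1}{3}(0 + 0 + 0) = 0, \qquad
\tilde{D}_{\text{country}} = \tfrac{0}{3} = 0 \\
D_{\text{actor1}} &= \tfrac{1}{3}(1 + 1 + 0) = \tfrac{2}{3}, \qquad
\tilde{D}_{\text{actor1}} = \tfrac{2/3}{4} = \tfrac{1}{6} \\
D_{\text{deaths\_best}} &= \tfrac{1}{3}(1 + 1 + 0) = \tfrac{2}{3}, \qquad
\tilde{D}_{\text{deaths\_best}} = \tfrac{2/3}{4} = \tfrac{1}{6}
\end{aligned}
```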