View source: R/merge_duplicate_alerts.R
Each record in the ProMED and HealthMap data feeds is (in principle) associated with a unique alert-id. Occasionally, multiple rows share the same alert-id, and we want to merge them into a single row in a meaningful way. For metadata associated with the records, e.g., the URL, it is useful to retain all values, especially if these columns are not used in the downstream analysis. For other columns, e.g., longitude and latitude, we expect the values to be the same across the set of records; if they are not, we retain just one of them. Finally, for numeric columns (particularly cases) we want a summary statistic such as the median or mean. This function merges the records, with the user choosing how each column is merged, i.e., whether all values or only one of them is retained. The only exception is the column called cases, which is always summarised using the function supplied in the rule argument.
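The three merging strategies described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: merge_sketch and the alerts data frame are hypothetical names, and the sketch pastes every value of a keep_all column (including repeats), which may differ from what merge_duplicate_alerts actually does.

```r
# Hypothetical helper illustrating the idea; NOT the package's own code.
merge_sketch <- function(df, keep_all, keep_first,
                         use_rule = c("cases"),
                         rule = stats::median,
                         sep = " / ") {
  # Columns in keep_all: paste every value into one string per column.
  all_vals <- lapply(df[keep_all], function(x) paste(x, collapse = sep))
  # Columns in keep_first: retain only the value from the first row.
  first_vals <- lapply(df[keep_first], function(x) x[1])
  # Columns in use_rule: reduce each to a single number with rule.
  rule_vals <- lapply(df[use_rule], rule)
  # Combine the named pieces into a one-row data frame.
  data.frame(c(all_vals, first_vals, rule_vals), stringsAsFactors = FALSE)
}

# Made-up rows that share one alert-id.
alerts <- data.frame(
  alert_id = rep("a", 3),
  url = c("u1", "u2", "u3"),
  latitude = c(1.4, 1.5, 1.4),
  cases = c(3, 7, 9)
)

merged <- merge_sketch(
  alerts,
  keep_all = "url",
  keep_first = c("alert_id", "latitude")
)
```

In this sketch, merged has one row: url holds all three values joined by sep, alert_id and latitude come from the first row, and cases is the median of the three counts.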
merge_duplicate_alerts(
df,
keep_all,
keep_first,
use_rule = c("cases"),
rule = stats::median,
sep = " / "
)
df
data frame containing duplicate alerts. Must contain a column called cases. All columns except cases will be merged by collapsing their content into a single string for each column. The column cases will be merged according to the rule argument, e.g., median will return the median of cases.
keep_all
character vector. Names of columns for which the values in all rows should be retained.
keep_first
character vector. Names of columns for which only the first value should be retained.
use_rule
character vector. Names of columns that should be summarised using rule. These should all be numeric.
rule
any valid R function that accepts a numeric vector and returns a single number. Defaults to stats::median.
sep
separator used to paste multiple values from a column into a single string.
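As an illustration of what can be passed as rule, the snippet below shows a few functions that accept a numeric vector and return a single number. It assumes, without confirmation from the source, that rule is called with the column as its only argument; under that assumption, ignoring missing values requires a small wrapper. The names median_no_na and max_no_na are hypothetical.

```r
# Made-up case counts, one of them missing.
cases <- c(3, 7, NA, 9)

# The default rule propagates missing values.
default_result <- stats::median(cases)   # NA

# Hypothetical wrappers that drop missing values before summarising.
median_no_na <- function(x) stats::median(x, na.rm = TRUE)
max_no_na <- function(x) max(x, na.rm = TRUE)

median_result <- median_no_na(cases)     # 7
max_result <- max_no_na(cases)           # 9
```

A wrapper such as median_no_na could then be supplied as rule = median_no_na in place of the default.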
data.frame with a single row
Sangeeta Bhatia
## Made-up data
made_up <- data.frame(
country = rep("singapore", 3),
cases = c(3, 7, 9),
alert_id = rep(letters[1], 3),
longitude = c(103.8, 103.8, 103.8),
latitude = c(1.4, 1.5, 1.4)
)
## Alert-ids in this data.frame are duplicated, so merge the rows into one.
merged <- merge_duplicate_alerts(
made_up,
keep_all = c("country", "alert_id"),
keep_first = c("longitude", "latitude"))