merge_duplicate_alerts: Merge rows with duplicate alerts


View source: R/merge_duplicate_alerts.R

Description

Each record in the ProMED and HealthMap data feeds is, in principle, associated with a unique alert-id. Occasionally, multiple rows share the same alert-id, and these rows need to be merged into a single row in a meaningful way. For metadata columns such as the URL, it is useful to retain all values, especially if these columns are not used in the downstream analysis. For other columns, such as longitude and latitude, the values are expected to be identical across the set of records; if they are not, only one of them is retained. Finally, numeric columns (particularly cases) are reduced to a summary statistic such as the median or mean. This function merges the records, with the user choosing how each column is merged, i.e., whether all values or only one of them should be retained. The exception is the columns named in use_rule (by default only cases), which are always summarised using the function given in the rule argument.
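The three merge strategies described above can be illustrated with a minimal base R sketch. This is not the promedr implementation: merge_sketch and the dups data frame are hypothetical names introduced only for illustration, and details such as whether repeated values are pasted once or several times are assumptions.

```r
# Hypothetical helper illustrating the three strategies; NOT the promedr code.
merge_sketch <- function(df, keep_all, keep_first, use_rule = "cases",
                         rule = stats::median, sep = " / ") {
  # Retain all values: paste every row of each column into one string.
  all_vals <- lapply(df[keep_all], function(x) paste(x, collapse = sep))
  # Retain one value: take the first row of each column.
  first_vals <- lapply(df[keep_first], function(x) x[1])
  # Summarise numeric columns with the supplied function.
  summarised <- lapply(df[use_rule], rule)
  as.data.frame(c(all_vals, first_vals, summarised), stringsAsFactors = FALSE)
}

# Made-up duplicate alerts (assumed column names).
dups <- data.frame(
  alert_id = c("a", "a", "a"),
  url = c("u1", "u2", "u3"),
  latitude = c(1.4, 1.5, 1.4),
  cases = c(3, 7, 9),
  stringsAsFactors = FALSE
)

out <- merge_sketch(
  dups,
  keep_all = "url",
  keep_first = c("alert_id", "latitude")
)
```

In this sketch, out has a single row: url holds all three values joined by the separator, latitude keeps the first value, and cases holds the median.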

Usage

merge_duplicate_alerts(
  df,
  keep_all,
  keep_first,
  use_rule = c("cases"),
  rule = stats::median,
  sep = " / "
)

Arguments

df

data frame containing duplicate alerts. Must contain a column called cases. Columns listed in keep_all are merged by collapsing their contents into a single string per column. Columns listed in use_rule (by default cases) are merged according to the rule argument; e.g., median returns the median of cases.

keep_all

character vector. Names of columns for which values in all rows should be retained.

keep_first

character vector. Names of columns for which only the first value should be retained.

use_rule

character vector. Names of columns that should be summarised using rule. These should all be numeric. Defaults to "cases".

rule

any valid R function that accepts a numeric vector and returns a single number. Defaults to stats::median.

sep

separator used when pasting multiple values from a column into a single string. Defaults to " / ".
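As a rough illustration of what may be passed as rule, the sketch below applies a few candidate functions to a made-up numeric vector. The vector and the names in the rules list are assumptions for illustration only. Whether merge_duplicate_alerts handles missing values itself is not stated here, so wrapping the function with na.rm = TRUE is shown as one possible precaution.

```r
# Made-up case counts, including a missing value.
cases <- c(3, 7, NA, 9)

# Candidate functions: each accepts a numeric vector and returns one number.
rules <- list(
  median = stats::median,
  mean = mean,
  median_na_rm = function(x) stats::median(x, na.rm = TRUE),
  max_na_rm = function(x) max(x, na.rm = TRUE)
)

# Apply each candidate to the same vector to compare results.
results <- vapply(rules, function(f) f(cases), numeric(1))
```

The plain median and mean return NA when any value is missing, while the wrapped versions ignore the missing value.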

Value

data.frame with a single row

Author(s)

Sangeeta Bhatia

Examples

## Made-up data: three rows sharing the same alert-id
made_up <- data.frame(
  country = rep("singapore", 3),
  cases = c(3, 7, 9),
  alert_id = rep(letters[1], 3),
  longitude = c(103.8, 103.8, 103.8),
  latitude = c(1.4, 1.5, 1.4)
)
## The alert-ids in this data.frame are duplicated, so merge the rows.
## cases is summarised with the default rule (median).
merged <- merge_duplicate_alerts(
  made_up,
  keep_all = c("country", "alert_id"),
  keep_first = c("longitude", "latitude")
)
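Because the function returns a single row, a data frame with several distinct alert-ids would presumably be split by alert-id and merged group by group. The sketch below shows one way to do this in base R. The alerts data frame is made up, and merge_one is a hypothetical stand-in used so the sketch runs without promedr installed; in practice the call to merge_one would be replaced by a call to merge_duplicate_alerts with suitable keep_all and keep_first arguments.

```r
# Made-up feed with duplicated alert-ids "a" and "c".
alerts <- data.frame(
  alert_id = c("a", "a", "b", "c", "c"),
  country = c("singapore", "singapore", "india", "kenya", "kenya"),
  cases = c(3, 7, 2, 10, 20),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for merge_duplicate_alerts(): keeps the first
# alert_id and country, and takes the median of cases.
merge_one <- function(df) {
  data.frame(
    alert_id = df$alert_id[1],
    country = df$country[1],
    cases = stats::median(df$cases),
    stringsAsFactors = FALSE
  )
}

# Split into one data frame per alert-id, merge each, and stack the rows.
groups <- split(alerts, alerts$alert_id)
merged_all <- do.call(rbind, lapply(groups, merge_one))
rownames(merged_all) <- NULL
```

The result has one row per alert-id, with rows that were already unique passing through unchanged.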

sangeetabhatia03/promedr documentation built on March 12, 2020, 7:25 a.m.