early_warning_cluster: Early Warning for Disease Clusters

View source: R/early_warning.R

early_warning_cluster    R Documentation

Description

Detect disease clusters with early_warning_cluster(). Use has_clusters() to get TRUE or FALSE based on its output, or use format() to format the result.

Usage

early_warning_cluster(
  df,
  column_date = NULL,
  column_patientid = NULL,
  based_on_historic_maximum = FALSE,
  period_length_months = 12,
  minimum_cases = 5,
  minimum_days = 0,
  minimum_case_days = 2,
  minimum_case_fraction_in_period = 0.02,
  threshold_percentile = 97.5,
  remove_outliers = TRUE,
  remove_outliers_coefficient = 1.5,
  moving_average_days = 7,
  moving_average_side = "left",
  case_free_days = 14,
  ...
)

n_clusters(x)

has_clusters(x, n = 1)

has_ongoing_cluster(x, dates = Sys.Date() - 1)

has_cluster_before(x, date)

has_cluster_after(x, date)

Arguments

df

Data set: This must consist of only positive results. The minimal data set should include a date column and a patient column. Do not summarize on patient IDs; this will be handled automatically.

column_date

Name of the column to use for dates. If left blank, the first date column will be used.

column_patientid

Name of the column to use for patient IDs. If left blank, the first column with a name matching "patient|patid" will be used.

based_on_historic_maximum

A logical to indicate whether the cluster detection should be based on the maximum of previous years. The default is FALSE, which uses all historic data points.

period_length_months

Number of months per period.

minimum_cases

Minimum number of cases that a cluster requires to be considered a cluster.

minimum_days

Minimum number of days that a cluster requires to be considered a cluster.

minimum_case_days

Minimum number of days with cases that a cluster requires to be considered a cluster.

minimum_case_fraction_in_period

Minimum fraction of cluster cases in a period that a cluster requires to be considered a cluster.

threshold_percentile

Percentile used to determine the threshold. Defaults to 97.5.

remove_outliers

A logical to indicate whether outliers should be removed before determining the threshold.

remove_outliers_coefficient

Coefficient used for outlier determination. Defaults to 1.5.

moving_average_days

Number of days to set in moving_average(). Defaults to a whole week (7).

moving_average_side

Side of days to set in moving_average(). Defaults to "left" for retrospective analysis.

case_free_days

Number of case-free days to set in get_episode(). Defaults to 14.

...

Not used at the moment.

x

Output of early_warning_cluster().

n

Number of clusters. Defaults to 1.

dates

Date(s) to test for falling within any of the clusters. Defaults to yesterday.

date

Date to test whether there are any clusters until this date (has_cluster_before()) or since this date (has_cluster_after()).
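
As a rough illustration of how several of the arguments above could interact, the base R sketch below smooths daily counts with a trailing moving average, drops outliers, derives a percentile threshold, and groups the flagged days into episodes. It is not the certestats implementation and does not call moving_average() or get_episode(). The trailing window for "left", the Tukey-style outlier fence, the 0-100 percentile scale, and the gap-based episode grouping are all assumptions made for this sketch, and the object names are hypothetical.

```r
# Hypothetical daily case counts; NOT the certestats internals
set.seed(1)
days  <- seq(as.Date("2022-01-01"), by = "1 day", length.out = 120)
daily <- rpois(length(days), lambda = 2)
daily[60:66] <- daily[60:66] + 10  # inject an artificial surge

# moving_average_days = 7, moving_average_side = "left":
# assumed here to mean a trailing window (current day plus the 6 days before)
moving_average_days <- 7
smoothed <- as.numeric(stats::filter(daily,
                                     rep(1 / moving_average_days, moving_average_days),
                                     sides = 1))

# remove_outliers = TRUE, remove_outliers_coefficient = 1.5:
# assumed here to be Tukey's rule (quartiles -/+ coefficient * IQR)
coefficient <- 1.5
q     <- stats::quantile(smoothed, c(0.25, 0.75), na.rm = TRUE)
fence <- coefficient * (q[2] - q[1])
keep  <- !is.na(smoothed) & smoothed >= q[1] - fence & smoothed <= q[2] + fence

# threshold_percentile = 97.5, assumed to be on a 0-100 scale
threshold_percentile <- 97.5
threshold <- stats::quantile(smoothed[keep], threshold_percentile / 100)

# Days above the threshold, grouped into episodes: a new episode starts
# when more than case_free_days lie between two consecutive flagged days
case_free_days <- 14
flagged <- days[!is.na(smoothed) & smoothed > threshold]
episode <- cumsum(c(TRUE, diff(flagged) > case_free_days))
split(flagged, episode)
```

In this sketch the injected surge is excluded from the threshold calculation as an outlier and then stands out against the threshold derived from the remaining days.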

Details

A (disease) cluster is defined as an unusually large aggregation of disease events in time or space (ATSDR, 2008). Clusters are common, particularly in large populations. From a statistical standpoint, it is nearly inevitable that some clusters of chronic diseases will emerge within various communities, be it schools, church groups, social circles, or neighborhoods. Initially, these clusters are often perceived as products of specific, predictable processes rather than as random occurrences in a particular location, akin to a coin toss.
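
The coin-toss point can be made concrete with a small simulation. The base R sketch below scatters cases uniformly at random over eight years, so there is no real outbreak by construction, and then compares the busiest 30-day window with the average window. The window length and the number of cases are arbitrary choices for this illustration.

```r
set.seed(2024)
days  <- seq(as.Date("2015-01-01"), as.Date("2022-12-31"), by = "1 day")
cases <- sample(days, size = 300, replace = TRUE)  # purely random case dates

# Daily counts, including days without any case
daily <- as.integer(table(factor(as.character(cases),
                                 levels = as.character(days))))

# Number of cases in every trailing 30-day window
window    <- 30
in_window <- as.numeric(stats::filter(daily, rep(1, window), sides = 1))
in_window <- in_window[!is.na(in_window)]

mean(in_window)  # average window, roughly 300 * 30 / length(days)
max(in_window)   # busiest window, produced by chance alone
```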

Whether a (suspected) cluster corresponds to an actual increase of disease in the area needs to be assessed by an epidemiologist or biostatistician (ATSDR, 2008).

The function has_ongoing_cluster() returns a logical vector with the same length as dates, so dates can have any length.
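
This one-result-per-date behaviour can be pictured with a small base R sketch. It does not use the package: clusters is a hypothetical table of cluster start and end dates, and ongoing() is a stand-in written for this illustration, not the implementation of has_ongoing_cluster().

```r
# Hypothetical cluster periods (not produced by early_warning_cluster())
clusters <- data.frame(first_day = as.Date(c("2022-05-20", "2022-09-01")),
                       last_day  = as.Date(c("2022-06-05", "2022-09-10")))

# Stand-in helper: returns one TRUE/FALSE per date, TRUE if the date
# falls within any of the cluster periods
ongoing <- function(clusters, dates) {
  dates <- as.Date(dates)
  vapply(seq_along(dates),
         function(i) any(dates[i] >= clusters$first_day &
                           dates[i] <= clusters$last_day),
         logical(1))
}

ongoing(clusters, c("2022-06-01", "2022-06-20"))
# [1]  TRUE FALSE
```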

See Also

early_warning_biomarker()

Examples

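# Simulated positive results: 300 random dates for 26 possible patients
set.seed(123)  # make the random example reproducible
cases <- data.frame(date = sample(seq(as.Date("2015-01-01"),
                                      as.Date("2022-12-31"),
                                      "1 day"),
                                  size = 300),
                    patient = sample(LETTERS, size = 300, replace = TRUE))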

# -----------------------------------------------------------

# threshold_percentile is assumed to be on a 0-100 scale, like its default of 97.5
check <- early_warning_cluster(cases, threshold_percentile = 99)

has_clusters(check)
check


check2 <- early_warning_cluster(cases,
                                minimum_cases = 1,
                                threshold_percentile = 75)

check2
check2 |> format()

check2 |> n_clusters()
check2 |> has_clusters()
check2 |> has_clusters(n = 15)

check2 |> has_ongoing_cluster("2022-06-01")
check2 |> has_ongoing_cluster(c("2022-06-01", "2022-06-20"))
check2 |> has_cluster_before("2022-06-01")
check2 |> has_cluster_after("2022-06-01")

check2 |> unclass()

certe-medical-epidemiology/certestats documentation built on Nov. 9, 2024, 8:15 p.m.