cp_mean: Mean-Shift Changepoint


Description

Tests device-event time series for changepoints using the mean-shift changepoint method originally described in Xu et al. (2015).

Usage

cp_mean(df, ...)

## S3 method for class 'mds_ts'
cp_mean(df, ts_event = c(Count = "nA"), analysis_of = NA, ...)

## Default S3 method:
cp_mean(
  df,
  analysis_of = NA,
  eval_period = NULL,
  alpha = 0.05,
  cp_max = 100,
  min_seglen = 6,
  epochs = NULL,
  bootstrap_iter = 1000,
  replace = T,
  zero_rate = 1/3,
  ...
)

Arguments

df

Required input data frame of class mds_ts or, for generic usage, any data frame with the following columns:

time

Unique times of class Date

event

Either the event count or rate of class numeric

...

Further arguments passed onto cp_mean methods

ts_event

Required if df is of class mds_ts. Named string indicating the variable corresponding to the event count or rate. Rate must be calculated in a separate column in df as it is not calculated by default. The name of the string is an English description of what was analyzed.

Default: c("Count"="nA") corresponding to the event count column in mds_ts objects. Name is generated from mds_ts metadata.

Example: c("Rate of Bone Filler Events in Canada"="rate")

analysis_of

Optional string indicating the English description of what was analyzed. If specified, this will override the name of the ts_event string parameter.

Default: NA indicates no English description for plain df data frames, or ts_event English description for df data frames of class mds_ts.

Example: "Rate of bone cement leakage"

eval_period

Optional positive integer indicating the number of unique times counting in reverse chronological order to assess. This will be used to establish the process mean and moving range.

Default: NULL considers all times in df.

alpha

Alpha or Type-I error rate for detection of a changepoint, in the range (0, 1).

Default: 0.05 detects a changepoint at an alpha level of 0.05 or 5%.

cp_max

Maximum number of changepoints detectable. This supersedes the theoretical max set by epochs.

Default: 100 detects up to a maximum of 100 changepoints.

min_seglen

Minimum required length of consecutive measurements without a changepoint in order to test for an additional changepoint within.

Default: 6 requires a minimum of 6 consecutive measurements.

epochs

Maximum number of epochs allowed in the iterative search for changepoints, where 2^epochs is the theoretical max changepoints findable. Within each epoch, all measurement segments with a minimum of min_seglen measurements are tested for a changepoint until no additional changepoints are found.

Default: NULL estimates max epochs from the number of observations or measurements in df and min_seglen.

bootstrap_iter

Number of bootstrap iterations for constructing the null distribution of means. Lowest recommended is 1000. Increasing iterations also increases p-value precision.

Default: 1000 uses 1000 bootstrap iterations.

replace

When sampling for the bootstrap, perform sampling with or without replacement. Unless your df contains many measurements, and definitely more than bootstrap_iter, it makes the most sense to set this to TRUE.

Default: T constructs bootstrap samples with replacement.

zero_rate

Maximum proportion of zero-valued events in df (after restriction by eval_period) that is allowed for this algorithm to run. Because the mean-shift changepoint method does not perform well on time series with many 0 values, a value >0 is recommended.

Default: 1/3 requires no more than 1/3 zeros in events in df in order to run.
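
As a rough illustration of how eval_period and zero_rate interact, the sketch below restricts a data frame to its most recent times and then checks the proportion of zero events. The helper check_zero_rate() is hypothetical and not part of mdsstat; treating "no more than" as <= and dropping NA events are assumptions.

```r
# Hypothetical helper (not an mdsstat function): apply eval_period, then
# compare the proportion of zero events against zero_rate.
check_zero_rate <- function(df, eval_period = NULL, zero_rate = 1/3) {
  df <- df[order(df$time), ]
  if (!is.null(eval_period)) {
    # Keep only the most recent eval_period times
    df <- utils::tail(df, eval_period)
  }
  # Assumption: NA events are ignored when computing the proportion
  prop_zero <- mean(df$event == 0, na.rm = TRUE)
  list(n = nrow(df), prop_zero = prop_zero, ok = prop_zero <= zero_rate)
}

toy <- data.frame(
  time  = as.Date("2020-01-01") + 0:11,
  event = c(0, 0, 0, 0, 0, 3, 4, 0, 5, 6, 2, 7)
)
all_times   <- check_zero_rate(toy)                   # uses all 12 times
recent_only <- check_zero_rate(toy, eval_period = 6)  # uses the last 6 times
```

Here the full series has too many zeros to pass the default zero_rate, while the six most recent times contain only one zero and pass.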

Details

Function cp_mean() is an implementation of the mean-shift changepoint method originally proposed by Xu et al. (2015), based on testing the mean-centered absolute cumulative sum against a bootstrap null distribution. This algorithm defines a signal as any changepoint found within the most recent n=min_seglen measurements of df.
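
The idea of testing a mean-centered absolute cumulative sum against a bootstrap null distribution can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: cusum_stat() and test_one_cp() are hypothetical names, and the exact statistic and p-value definition used by cp_mean() may differ.

```r
# Hypothetical helpers (not mdsstat functions).
# Test statistic: largest absolute value of the mean-centered cumulative sum.
cusum_stat <- function(x) {
  max(abs(cumsum(x - mean(x))))
}

test_one_cp <- function(x, bootstrap_iter = 1000, replace = TRUE, alpha = 0.05) {
  observed <- cusum_stat(x)
  # Null distribution: the same statistic on reordered/resampled series,
  # which destroys any ordering-dependent shift in the mean
  null <- replicate(
    bootstrap_iter,
    cusum_stat(sample(x, length(x), replace = replace))
  )
  # Assumption: p-value is the share of null statistics at least as extreme
  p <- mean(null >= observed)
  list(
    statistic   = observed,
    p           = p,
    changepoint = which.max(abs(cumsum(x - mean(x)))),  # last index before the shift
    detected    = p < alpha
  )
}

set.seed(1)
x <- c(rnorm(20, 100, 5), rnorm(20, 130, 5))  # mean shifts after observation 20
res <- test_one_cp(x)
```

With bootstrap_iter iterations the p-value can only take values in steps of 1/bootstrap_iter, which is why more iterations give finer p-value precision.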

The parameters in this implementation can be interpreted as follows. Changepoints are detected at an alpha level based on n=bootstrap_iter bootstrap iterations (with or without replacement using replace) of the input time series df. A minimum of n=min_seglen consecutive measurements without a changepoint is required to test for an additional changepoint. Both epochs and cp_max constrain the maximum possible number of changepoints detectable: within each epoch, each segment of consecutive measurements at least n=min_seglen measurements long is tested for a changepoint, until no additional changepoints are found. The theoretical maximum is 2^epochs changepoints, and cp_max supersedes that maximum.
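
The epoch-based search and the signal rule described above might look roughly like the following base R sketch. All function names are hypothetical, the fixed default epochs = 5 is an assumption (cp_mean() estimates it when epochs is NULL), and the way cp_max truncates the result is a simplification rather than the package's actual behavior.

```r
# Hypothetical helpers (not mdsstat functions).
cusum_stat <- function(x) max(abs(cumsum(x - mean(x))))

# Returns the within-segment changepoint index, or NA if none is detected
find_cp <- function(x, alpha, bootstrap_iter, replace) {
  observed <- cusum_stat(x)
  null <- replicate(bootstrap_iter,
                    cusum_stat(sample(x, length(x), replace = replace)))
  if (mean(null >= observed) < alpha) {
    which.max(abs(cumsum(x - mean(x))))
  } else {
    NA_integer_
  }
}

search_cps <- function(x, alpha = 0.05, cp_max = 100, min_seglen = 6,
                       epochs = 5, bootstrap_iter = 1000, replace = TRUE) {
  cps <- integer(0)
  for (epoch in seq_len(epochs)) {
    # Segment k covers indices (bounds[k] + 1) to bounds[k + 1]
    bounds <- c(0, sort(cps), length(x))
    found <- integer(0)
    for (k in seq_len(length(bounds) - 1)) {
      idx <- (bounds[k] + 1):bounds[k + 1]
      if (length(idx) < min_seglen) next  # segment too short to test
      cp <- find_cp(x[idx], alpha, bootstrap_iter, replace)
      if (!is.na(cp) && cp < length(idx)) found <- c(found, bounds[k] + cp)
    }
    if (length(found) == 0) break  # no new changepoints in this epoch
    # Assumption: cp_max is enforced by simple truncation
    cps <- utils::head(c(cps, found), cp_max)
    if (length(cps) >= cp_max) break
  }
  list(
    changepoints = sort(cps),
    # Signal: any changepoint within the most recent min_seglen measurements
    signal = any(cps > length(x) - min_seglen)
  )
}

set.seed(2)
x <- c(rnorm(30, 100, 5), rnorm(30, 140, 5))  # mean shifts after observation 30
out <- search_cps(x, bootstrap_iter = 500)
```

Each epoch can at most double the number of segments, which is where the theoretical bound of 2^epochs changepoints comes from.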

Value

A named list of class mdsstat_test, as follows:

test_name

Name of the test run

analysis_of

English description of what was analyzed

status

Named boolean of whether the test was run. The name contains the run status.

result

A standardized list of test run results: statistic for the test statistic, lcl and ucl for the 95% confidence bounds, p for the p-value, signal status, and signal_threshold.

params

The test parameters

data

The data on which the test was run

Methods (by class)

References

Xu, Zhiheng, et al. "Signal detection using change point analysis in postmarket surveillance." Pharmacoepidemiology and Drug Safety 24.6 (2015): 663-668.

Examples

library(mdsstat)

# Basic example: a plain data frame with time and event columns
set.seed(1)  # make the simulated counts reproducible
data <- data.frame(time = 1:25, event = as.integer(stats::rnorm(25, 100, 25)))
a1 <- cp_mean(data)

# Example using an mds_ts object
a2 <- cp_mean(mds_ts[[3]])

# Example using a derived rate as the "event"
data <- mds_ts[[3]]
data$rate <- ifelse(is.na(data$nA), 0, data$nA) / data$exposure
a3 <- cp_mean(data, ts_event = c(Rate = "rate"))

mdsstat documentation built on March 13, 2020, 2:58 a.m.