beast: Bayesian changepoint detection and time series decomposition...

View source: R/beast.R

beastR Documentation

Bayesian changepoint detection and time series decomposition into trend, seasonality, and abrupt changes

Description

beast is a high-level interface to the BEAST algorithm, a Bayesian model averaging approach for decomposing regular time series or other 1D sequential data into individual components, including abrupt changes, trends, and periodic/seasonal variations. BEAST is useful for changepoint detection (e.g., breakpoints, joinpoints, or structural breaks), nonlinear trend analysis, time series decomposition, and time series segmentation.

Figure: Nile.jpg

Usage

beast(
  y,
  start          = 1,
  deltat         = 1,
  season         = c("harmonic", "svd", "dummy", "none"),
  period         = NULL,
  scp.minmax     = c(0, 10), sorder.minmax   = c(0, 5),
  tcp.minmax     = c(0, 10), torder.minmax   = c(0, 1),
  sseg.min       = NULL,     sseg.leftmargin = NULL, sseg.rightmargin = NULL,
  tseg.min       = NULL,     tseg.leftmargin = NULL, tseg.rightmargin = NULL,
  s.complexfct   = 0.0,
  t.complexfct   = 0.0,
  method         = c("bayes", "bic", "aic", "aicc", "hic",
                     "bic0.25", "bic0.5", "bic1.5", "bic2"),
  detrend        = FALSE,
  deseasonalize  = FALSE,
  mcmc.seed      = 0,
  mcmc.burnin    = 200,
  mcmc.chains    = 3,
  mcmc.thin      = 5,
  mcmc.samples   = 8000,
  precValue      = 1.5,
  precPriorType  = c("componentwise", "uniform", "constant", "orderwise"),
  hasOutlier     = FALSE,
  ocp.minmax     = c(0, 10),
  print.param    = TRUE,
  print.progress = TRUE,
  print.warning  = TRUE,
  quiet          = FALSE,
  dump.ci        = FALSE,
  dump.mcmc      = FALSE,
  gui            = FALSE,
  ...
)

Arguments

y

y is a numeric vector representing an evenly spaced regular time series. Missing values such as NA and NaN are allowed.

  • If y is irregular or unordered in time (e.g., daily data spanning multiple years with leap years: 365 points in some years, 366 in others), use beast.irreg instead.

  • If y is a matrix or 3D array consisting of multiple regular or irregular time series (e.g., stacked images), use beast123 instead.

  • If y is an object of class "ts", "xts", or "zoo", its time attributes (i.e., start, end, frequency) are used to infer start, deltat, period, and season. User-specified values for these arguments are then ignored to honor the time attributes of y. For example, if y is a ts object with frequency = 1, season = "none" is always assumed; if frequency > 1 (i.e., some periodic component is present in y) but season = "none" is supplied, it is silently replaced by season="harmonic".

If y is provided as a list of multiple time series, the multivariate version of the BEAST algorithm is invoked to decompose multiple series and detect common changepoints jointly. This feature is experimental and under active development. See ohio for a working example.

start

start is numeric (default 1.0) or Date; the time of the first data point of y. It can be specified as:

  • a scalar (e.g., 2021.0644);

  • a numeric vector c(year, month, day) (e.g., c(2021, 1, 24));

  • an R Date object (e.g., as.Date("2021-01-24")).

deltat

deltat is numeric (default 1.0) or character string; the time interval between consecutive data points. Its unit must be consistent with start.

  • If start is numeric, the unit is arbitrary and is not interpreted by BEAST (e.g., 2021.3 can be be of any unit, a time, a distance, etc.).

  • If start is a c(year, month, day) vector or a Date, deltat is interpreted in units of years. For example, for a monthly series with start = c(2021, 1, 24), start is internally converted to a fractional year 2021 + (24 - 0.5)/365 = 2021.0644 and deltat = 1/12 can be used to specify a 1-month interval (1/12 yr).

  • Alternatively, deltat can be a character string encoding both the value and the unit, such as "7 days", "7d", "1/2 months", "1 mn", "1 year", or "1y".

season

Character (default "harmonic"). Specifies whether y has a periodic component and how it is modeled:

  • "none": y is treated as trend-only; no periodic components are modeled. Seasonal parameters (e.g., sorder.minmax, scp.minmax, sseg.min) are ignored.

  • "harmonic": y has a periodic/seasonal component. The term season is used broadly for any periodic variation in y. The period of the seasonal component is not a BEAST model parameter; it is treated as known and must be supplied via period. For "harmonic", the seasonal component is modeled as a harmonic curve (a combination of sines and cosines).

  • "dummy": As for "harmonic", except that the periodic/seasonal component is modeled as a non-parametric curve. The harmonic order argument sorder.minmax is then irrelevant and ignored.

  • "svd" (experimental): As for "harmonic", but with the seasonal component expressed as a linear combination of basis functions derived from a singular value decomposition (SVD). These basis functions can be more parsimonious than pure harmonic bases and may improve detection of subtle seasonal changepoints.

period

Numeric or character string. The period of the seasonal/periodic component in y. It is needed only when a periodic component is present (e.g., season = "harmonic", "svd", or "dummy") and is ignored for trend-only data (season = "none"). The period must be in the same time unit as deltat, and it should satisfy period = deltat * freq, where freq is the number of data points per period.

  • In earlier versions, a separate freq argument was used; it is now obsolete and replaced by period. For backward compatibility, freq is still accepted via ..., but period takes precedence if both are provided.

  • If period is missing, BEAST attempts to guess a plausible period via autocorrelation. This guess can be unreliable, so users are strongly encouraged to specify period explicitly whenever a periodic component is present.

  • If period <= 0, season = "none" is assumed and a trend-only model is fitted.

  • When start or deltat is specified using dates, period may also be provided as a string, such as "1 year", "12 months", "365d", or "366 days".

scp.minmax

A numeric vector of length 2 (integers >= 0); the minimum and maximum numbers of seasonal changepoints (scp) allowed in segmenting the seasonal component. Used only when y has a seasonal component (i.e., season = "harmonic", "svd", or "dummy") and ignored for trend-only data. scp.minmax[2] couldn't be smaller than scp.minmax[1].

  • If scp.minmax[1] == scp.minmax[2], BEAST assumes a fixed number of seasonal changepoints and does not infer the posterior distribution over the number of changepoints, though it still estimates the probabilities and locations of most likely changepoints over time.

  • If scp.minmax[1] == scp.minmax[2] == 0, both the min and max numbers of scp are set to 0. That is, no seasonal changepoints are allowed; a single global seasonal model is used, but the harmonic order may still be inferred if sorder.minmax[1] != sorder.minmax[2].

sorder.minmax

A numeric vector of length 2 (integers >= 1); the minimum and maximum harmonic orders considered for the seasonal component. Used only when season = "harmonic" or "svd" and ignored for trend-only data or when season = "dummy".

  • If sorder.minmax[1] == sorder.minmax[2], BEAST assumes a fixed harmonic order and does not infer the posterior distribution over harmonic orders.

tcp.minmax

A numeric vector of length 2 (integers >= 0); the minimum and maximum numbers of trend changepoints (tcp) allowed in segmenting the trend component.

  • If tcp.minmax[1] == tcp.minmax[2], BEAST assumes a fixed number of trend changepoints and does not infer the posterior distribution over the number of changepoints, though it still estimates changepoint probabilities and locations.

  • If tcp.minmax[1] == tcp.minmax[2] == 0, both the min and max numbers are set to 0. That is, no trend changepoints are allowed, and a single global polynomial trend is fitted. The polynomial order may still be inferred if torder.minmax[1] != torder.minmax[2].

torder.minmax

A numeric vector of length 2 (integers >= 0); the minimum and maximum polynomial orders considered for the trend component. If missing, tcp.minmax defaults to c(0,1). Order 0 corresponds to a constant (flat) trend; order 1 corresponds to a linear trend.

  • If torder.minmax[1] == torder.minmax[2], BEAST assumes a fixed polynomial order and does not infer the posterior distribution over polynomial orders.

sseg.min

An integer (> 0); the minimum segment length, in number of time steps, allowed between neighboring seasonal changepoints. When fitting a piecewise seasonal model, two seasonal changepoints cannot be closer than sseg.min time intervals. sseg.min is unitless (counts of time steps); the corresponding time window in the underlying units is sseg.min * deltat. If NULL, a default is chosen internally (typically based on the the period).

sseg.leftmargin

Integer (>= 0); the number of leftmost observations excluded from seasonal changepoint detection. No seasonal changepoints are allowed in the initial window of length sseg.leftmargin. This is specified in time steps; the corresponding time window is sseg.leftmargin * deltat. If NULL, it defaults to sseg.min.

sseg.rightmargin

Integer (>= 0); the number of rightmost observations excluded from seasonal changepoint detection. No seasonal changepoints are allowed in the final window of length sseg.rightmargin. This is specified in time steps; the corresponding time window is sseg.rightmargin * deltat. If NULL, it defaults to sseg.min.

tseg.min

Integer (> 0); the minimum segment length, in number of time steps, allowed between neighboring trend changepoints. When fitting a piecewise polynomial trend, two trend changepoints cannot be closer than tseg.min time intervals. tseg.min is unitless (counts of time steps); the corresponding time window is tseg.min * deltat. If NULL, a default is chosen internally (often in reference to the presence or absence of a cyclic component).

tseg.leftmargin

Integer (>= 0); the number of leftmost observations excluded from trend changepoint detection. No trend changepoints are allowed in the initial window of length tseg.leftmargin. This is specified in time steps; the corresponding time window is tseg.leftmargin * deltat. If NULL, it defaults to tseg.min.

tseg.rightmargin

Integer (>= 0); the number of rightmost observations excluded from trend changepoint detection. No trend changepoints are allowed in the final window of length tseg.rightmargin. This is specified in time steps; the corresponding time window is tseg.rightmargin * deltat. If NULL, it defaults to tseg.min.

s.complexfct

Numeric (default 0.0); a hyperprior parameter–newly added in Version 1.02–controlling the complexity of the seasonal curve (i.e., the number of seasonal changepoints). A prior of the form p(k) \propto \exp[\lambda (k+1)] is placed on the number of seasonal changepoints k, where \lambda is seasonComplexityFactor. Setting \lambda = 0 yields a non-informative prior p(k) \propto 1.0 where all model dimensions are equally likely a priori. Users may tune seasonComplexityFactor in the range [-20, 20] or an even wider range: negative values (e.g., \lambda = -15.9) favor fewer changepoints (simpler seasonal curves), whereas positive values (e.g., \lambda = 5.76) favor more changepoints (more complex curves).

t.complexfct

Numeric (default 0.0); analogous to s.complexfct, but for the trend component and the number of trend changepoints.

method

Character (default "bayes"). Specifies how model posterior probabilities are formulated or approximated:

  • "bayes": Full Bayesian formulation as described in Zhao et al. (2019).

  • "bic": Approximate posterior probabilities using the Bayesian Information Criterion (BIC), \mathrm{BIC} = n \log(\mathrm{SSE}) + k \log(n), where k is the number of parameters and n the number of observations.

  • "aic": Approximate posterior probabilities using Akaike's Information Criterion (AIC), \mathrm{AIC} = n \log(\mathrm{SSE}) + 2k.

  • "aicc": Approximate posterior probabilities using the corrected AIC (AICc), \mathrm{AICc} = \mathrm{AIC} + \frac{2k^2 + 2k}{n - k - 1}.

  • "hic": Approximate posterior probabilities using the Hannan-Quinn Information Criterion (HIC), \mathrm{HIC} = n \log(\mathrm{SSE}) + 2k \log(\log(n)).

  • "bic0.25": BIC-type approximation adopted from Kim et al. (2016) \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.jspi.2015.09.008")} with reduced complexity penalty, \mathrm{BIC}_{0.25} = n \log(\mathrm{SSE}) + 0.25 k \log(n).

  • "bic0.5": As above but with penalty factor 0.5.

  • "bic1.5": As above but with penalty factor 1.5.

  • "bic2": As above but with penalty factor 2.0.

detrend

Logical; if TRUE, a global trend is first fitted and removed from the series before running BEAST, and then added back to the BEAST result at the end.

deseasonalize

Logical; if TRUE, a global seasonal model is first fitted and removed before running BEAST, and then added back to the BEAST result. Ignored if season = "none" (trend-only data).

mcmc.seed

Integer (>= 0); seed for the random number generator used in the MCMC. If mcmc.seed = 0, an arbitrary seed is used and results may vary across runs. Using the same non-zero seed makes results reproducible on the same machine, though differences across platforms/CPUs may still arise due to differences in random number libraries or CPU instruction sets.

mcmc.chains

Integer (> 0); the number of parallel MCMC chains.

mcmc.thin

Integer (> 0); thinning factor. If mcmc.thin = k, every k-th iteration is retained and the others are discarded.

mcmc.burnin

Integer (> 0); number of initial iterations discarded as burn-in for each chain.

mcmc.samples

Integer (>= 0); number of samples collected per MCMC chain. The total number of iterations is (mcmc.burnin + mcmc.samples * mcmc.thin) * mcmc.chains.

precValue

Numeric (> 0); hyperparameter for the precision priors on model coefficients. The default is 1.5. It is used directly only when precPriorType = "constant" (see below); in other cases it serves as an initial value.

precPriorType

Character; one of "constant", "uniform", "componentwise" (default), or "orderwise". These control how precision parameters for the model coefficients are treated:

  1. "constant": A single precision parameter is fixed at precValue. The fit may be sensitive to the chosen value.

  2. "uniform": A single precision parameter is treated as a random variable with initial value precValue; its posterior is inferred via MCMC. The results are less sensitive to the initial choice of precValue.

  3. "componentwise": Separate precision parameters are used for different components (e.g., season vs. trend), each initialized by precValue and inferred via MCMC.

  4. "orderwise": Separate precision parameters are used for different components and for different orders within each component, all initialized by precValue and inferred via MCMC.

hasOutlier

Logical; if TRUE, fit a model with an outlier component representing potential spikes or dips at isolated data points:

  • Y = trend + outlier + error if season = "none",

  • Y = trend + season + outlier + error otherwise.

ocp.minmax

Numeric vector of length 2 (integers >= 0); the minimum and maximum numbers of outlier-type changepoints (ocp) allowed in the series. Outliers refer to spikes or dips at isolated times that cannot be captured by the trend or seasonal components.

print.param

Logical; if TRUE, the full list of internal BEAST parameters (metadata, prior, mcmc, extra) is printed before MCMC begins. These internal objects correspond one-to-one with the arguments of beast. For example, prior$trendMinSepDist corresponds to tseg.min. See beast123 or inspect the source of beast for details.

print.progress

Logical; if TRUE, a progress bar is printed.

print.warning

Logical; if TRUE, warning messages are printed.

quiet

Logical; if TRUE, suppress most console output (overrides print.param and print.progress).

dump.ci

Logical; if TRUE, credible intervals for the trend and seasonal components (e.g., out$trend$CI, out$season$CI) are computed and returned. Computing credible intervals is relatively time-consuming (due to sorting and summarizing many MCMC samples). If only symmetric uncertainty summaries are needed, the posterior standard deviations out$trend$SD and out$season$SD may suffice and dump.ci can be set to FALSE.

dump.mcmc

Logical; if TRUE, individual MCMC samples are dumped to the output.

gui

Logical; if TRUE, BEAST runs with a GUI window showing an animation of the MCMC sampling in model space (numbers and locations of trend/seasonal changepoints). This feature is experimental and currently available only on 64-bit Windows systems (not 32-bit Windows, macOS, or Linux).

...

Additional arguments reserved for future extensions and backward compatibility. Certain low-level settings are only available via beast123() (through metadata, prior, mcmc, and extra), not via beast().

Value

The result is an object of class "beast", i.e., a list with structure identical to the outputs of beast.irreg and beast123. The exact array dimensions depend on the input y. Below we assume a single time series input of length N.

time

Numeric vector of length N; the time associated with each observation. By default, this is simply 1:N if start, deltat, and any time attributes are missing.

data

A vector, matrix, or 3D array; a copy of the input y if the underlying extra$dumpInputData = TRUE. If extra$dumpInputData = FALSE, this is NULL. For irregular inputs (as in beast.irreg), this field stores the aggregated regular series at time intervals spaced by deltat (or metadata$deltaTime in the beast123 interface).

marg_lik

Numeric; the average marginal likelihood of the sampled models. Larger values indicate better fit for a given time series (e.g., -1 is better than -10; 10 is better than -1 or -10).

R2

Numeric; coefficient of determination (R-squared) of the fitted model.

RMSE

Numeric; root mean squared error of the fitted model.

sig2

Numeric; estimated variance of the residual error.

trend

trend is a list of outputs related to the estimated trend component:

  • ncp: Mean number of trend changepoints. Since individual sampled models can have different numbers of changepoints, several alternative summaries (ncp_mode, ncp_median, ncp_pct90) are also provided. For example, if mcmc$samples = 10 and the numbers of changepoints across models are c(0, 2, 4, 1, 1, 2, 7, 6, 6, 1), then the mean ncp is 3.1, the median 2.5, the mode 1, and the 90th percentile (ncp_pct90) 6.5.

  • ncp_mode: Mode of the number of trend changepoints.

  • ncp_median: Median of the number of trend changepoints.

  • ncp_pct90: 90th percentile of the number of trend changepoints.

  • ncpPr: Numeric vector of length tcp.minmax[2] + 1; probability distribution over the number of trend changepoints in 0:tcp.minmax[2]. For example, ncpPr[1] is the probability of having no trend changepoint; ncpPr[i] is the probability of having i - 1 changepoints.

  • cpOccPr: Numeric vector of length N; changepoint occurrence probability at each time index for the trend component. Plotting cpOccPr yields a continuous "probability-of-being-a-changepoint" curve. A higher peak at a single time step does not necessarily imply a higher overall probability of a changepoint in a neighborhood compared to a broader peak with lower maximum but higher summed probability. For example, a window of cpOccPr values c(0, 0, 0.5, 0, 0) (peak 0.5, sum 0.5) likely indicates a smaller chance of a changepoint than another window c(0.1, 0.2, 0.21, 0.2, 0.1) (peak 0.21, sum 0.71).

  • order: Numeric vector of length N; average polynomial order of the trend over sampled models at each time step. As a posterior average, it need not be an integer.

  • cp: Numeric vector of length up to tcp.minmax[2]; estimated trend changepoint locations. These are obtained by smoothing cpOccPr with a window of size tseg.min and selecting up to tcp.minmax[2] peaks from the smoothed curve. Some entries may be NaN if fewer changepoints are identified. Not all reported changepoints should be treated as "true"; some may be false positives. Check the associated posterior probabilities (cpPr) for confidence.

  • cpPr: Numeric vector of length tcp.minmax[2]; posterior probability associated with each changepoint in cp. Trailing entries are NaN if fewer changepoints are identified.

  • cpCI: Numeric matrix of dimension tcp.minmax[2] x 2; credible intervals for changepoint locations in cp.

  • cpAbruptChange: Numeric vector of length tcp.minmax[2]; magnitude of jumps in the trend at the detected changepoints.

  • Y: Numeric vector of length N; estimated trend component (Bayesian model average over sampled trends).

  • SD: Numeric vector of length N; posterior standard deviation of the trend.

  • CI: Numeric matrix of dimension N x 2; credible interval for the trend, with lower and upper envelopes.

  • slp: Numeric vector of length N; time-varying slope of the trend component.

  • slpSD: Numeric vector of length N; posterior standard deviation of the slope.

  • slpSgnPosPr: Numeric vector of length N; posterior probability that the slope is positive at each time point (i.e., increasing trend). For example, slpSgnPosPr = 0.80 at a given time indicates that 80% of sampled trends have a positive slope there.

  • slpSgnZeroPr: Numeric vector of length N; posterior probability that the slope is effectively zero at each time point. The probability of negative slope can be obtained as 1 - slpSgnZeroPr - slpSgnPosPr.

  • pos_ncp, neg_ncp, pos_ncpPr, neg_ncpPr, pos_cpOccPr, neg_cpOccPr, pos_cp, neg_cp, pos_cpPr, neg_cpPr, pos_cpAbruptChange, neg_cpAbruptChange, pos_cpCI, neg_cpCI: As above, but restricted to trend changepoints with positive jumps (pos_*) or negative jumps (neg_*) in the trend. For example, pos_ncp is the mean number of trend changepoints where the trend level jumps up.

  • inc_ncp, dec_ncp, inc_ncpPr, dec_ncpPr, inc_cpOccPr, dec_cpOccPr, inc_cp, dec_cp, inc_cpPr, dec_cpPr, inc_cpAbruptChange, dec_cpAbruptChange, inc_cpCI, dec_cpCI: As above, but restricted to changepoints where the trend slope increases (inc_*) or decreases (dec_*). For example, if the trend slope changes from 0.4 to 2.5, that changepoint contributes to inc_ncp.

season

season is a list analogous to trend, but for the seasonal/periodic component (when present):

  • ncp: Mean number of seasonal changepoints.

  • ncp_mode, ncp_median, ncp_pct90: As for trend, but for seasonal changepoints.

  • ncpPr: Numeric vector of length scp.minmax[2] + 1; probability distribution over the number of seasonal changepoints in 0:scp.minmax[2]. For example, ncpPr[1] is the probability of having no seasonal changepoint.

  • cpOccPr: Numeric vector of length N; seasonal changepoint occurrence probability over time. The same caveat applies as for trend$cpOccPr: the height of a single peak does not fully determine the overall probability of a changepoint in its neighborhood.

  • order: Numeric vector of length N; average harmonic order needed to approximate the seasonal component. As a posterior average over piecewise harmonic/SVD-based curves, this need not be an integer.

  • cp, cpPr, cpCI, cpAbruptChange: Analogous to the trend fields, but for the seasonal component, with length determined by scp.minmax[2].

  • Y: Numeric vector of length N; estimated seasonal component (Bayesian model average).

  • SD: Numeric vector of length N; posterior standard deviation of the seasonal component.

  • CI: Numeric matrix of dimension N x 2; credible interval for the seasonal component.

  • amp: Numeric vector of length N; time-varying amplitude of the seasonal component.

  • ampSD: Numeric vector of length N; posterior standard deviation of the amplitude.

  • pos_ncp, neg_ncp, pos_ncpPr, neg_ncpPr, pos_cpOccPr, neg_cpOccPr, pos_cp, neg_cp, pos_cpPr, neg_cpPr, pos_cpAbruptChange, neg_cpAbruptChange, pos_cpCI, neg_cpCI: Seasonal analogues of the trend outputs with the same names, restricted to seasonal changepoints with positive (pos_*) or negative (neg_*) jumps in the seasonal curve. For example, pos_ncp refers to the average number of seasonal changepoints at which the seasonal component jumps up.

outlier

outlier is a list analogous to trend or season, but for the outlier component ( present only if setting hasOutlier=TRUE)

Note

The three functions beast(), beast.irreg(), and beast123() implement the same BEAST algorithm but with different APIs. There is a one-to-one correspondence between (1) the arguments of beast() / beast.irreg() and (2) the metadata, prior, mcmc, and extra lists used by beast123(). Examples include:

start <-> metadata$startTime
deltat <-> metadata$deltaTime
deseasonalize <-> metadata$deseasonalize
hasOutlier <-> metadata$hasOutlier
scp.minmax[1:2] <-> prior$seasonMinKnotNum, prior$seasonMaxKnotNum
sorder.minmax[1:2] <-> prior$seasonMinOrder, prior$seasonMaxOrder
sseg.min <-> prior$seasonMinSepDist
tcp.minmax[1:2] <-> prior$trendMinKnotNum, prior$trendMaxKnotNum
torder.minmax[1:2] <-> prior$trendMinOrder, prior$trendMaxOrder
tseg.leftmargin <-> prior$trendLeftMargin
tseg.rightmargin <-> prior$trendRightMargin
s.complexfct <-> prior$seasonComplexityFactor
t.complexfct <-> prior$trendComplexityFactor
mcmc.seed <-> mcmc$seed
dump.ci <-> extra$computeCredible.

For large data sets, irregular time series, or stacked images, beast123() is generally the recommended interface.

References

  1. Zhao, K., Wulder, M.A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X. and Brown, M., 2019. Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, p.111181 (the beast algorithm paper).

  2. Zhao, K., Valle, D., Popescu, S., Zhang, X. and Mallick, B., 2013. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection. Remote Sensing of Environment, 132, pp.102-119 (the Bayesian MCMC scheme used in beast).

  3. Hu, T., Toman, E.M., Chen, G., Shao, G., Zhou, Y., Li, Y., Zhao, K. and Feng, Y., 2021. Mapping fine-scale human disturbances in a working landscape with Landsat time series on Google Earth Engine. ISPRS Journal of Photogrammetry and Remote Sensing, 176, pp.250-261(a beast application paper).

See Also

beast.irreg, beast123, minesweeper, tetris, geeLandsat

Examples

library(Rbeast)



#------------------------------------Example 1----------------------------------------#
# 'googletrend_beach' is the monthly Google Trends popularity of searching for 'beach'
# in the US from 2004 to 2022. Sudden changes coincide with known extreme weather
# events (e.g., 2006 North American Blizzard, 2011 record US summer heat, record warm
# January in 2016) or the COVID-19 outbreak.

out <- beast(googletrend_beach)

plot(out)
plot(out, vars = c("t", "slpsgn"))  # plot trend and slope-sign probabilities only.
                                    # In the slpsgn panel, the upper red band is the
                                    # probability that the trend slope is positive,
                                    # the middle green band the probability that the
                                    # slope is effectively zero, and the lower blue
                                    # band the probability that it is negative.
                                    # See ?plot.beast for details.

#------------------------------------Example 2----------------------------------------#
# Yellowstone is a half-monthly satellite NDVI time series of length 774 starting
# from July 1-15, 1981 (start ~ c(1981, 7, 7)) at a forest site in Yellowstone.
# There are 24 data points per year. The 1988 Yellowstone Fire caused a sudden drop
# in greenness. Note that the beast function  hanldes only evenly-spaced regular time
# series. Irregular data need to be first  aggegrated at a regular time interval of 
# your choice--the aggregation functionality is implemented in beast.irreg() and beast123().

data(Yellowstone)
plot(1981.5 + (0:773) / 24, Yellowstone, type = "l") # A sudden drop in greenness in 1988 
                                                     # due to the 1988 Yellowstone Fire

# Yellowstone is a plain numeric vector (no time attributes). Below, no extra
# arguments are supplied, so default values (start = 1, deltat = 1) are used and
# time is 1:774. The period is missing and is guessed via autocorrelation.
# Autocorrelation-based period estimates can be unreliable; examples below show
# how to specify time/period explicitly.

o <- beast(Yellowstone)   # By default, time = 1:length(Yellowstone), season = "harmonic"
plot(o)

# o <- beast(Yellowstone, quiet = TRUE)                        # suppress warnings
# o <- beast(Yellowstone, quiet = TRUE, print.progress = FALSE) # suppress all output

#------------------------------------Example 3----------------------------------------#
# Time information (start, deltat, period) specified explicitly of Yellowstone.

# (1) Arbitrary unit: 1981.5137 can be interpreted flexibly in any units not neccessarily
# referting to times
o <- beast(Yellowstone, start = 1981.5137, deltat = 1/24, period = 1.0)

# Strings can be used to explicitly specify time unitsas dates or years:
# o <- beast(Yellowstone, start = 1981.5137, deltat = "1/24 year", period = 1.0)
# o <- beast(Yellowstone, start = 1981.5137, deltat = "0.5 mon",   period = 1.0)
# o <- beast(Yellowstone, start = 1981.5137, deltat = 1/24, period = "1 year")
# o <- beast(Yellowstone, start = 1981.5137, deltat = 1/24, period = "365 days")

# (2) start as year-month-day (unit is year: deltat = 1/24 year = 0.5 month)
# o <- beast(Yellowstone, start = c(1981, 7, 7), deltat = 1/24, period = 1.0)

# (3) start as Date (unit is year: deltat = 1/24 year = 0.5 month)
# o <- beast(Yellowstone, start = as.Date("1981-07-07"), deltat = 1/24, period = 1.0)

print(o)     # 'o' is a list with many fields
str(o)       # See a list of fields in the BEAST output o

plot(o)                            # plot many default variables
plot(o, vars = c("y","s","t"))     # plot the data, seasonal, and trend components only
plot(o, vars = c("s","scp","samp","t","tcp","tslp")) # selected variables (see ?plot.beast)
plot(o, vars = c("s","t"), col = c("red","blue"))    # Specify colors of selected subplots

plot(o$time, o$season$Y, type = "l")    # fitted seasonal component
plot(o$time, o$season$cpOccPr)          # seasonal changepoint probabilities
plot(o$time, o$trend$Y, type = "l")     # fitted trend component
plot(o$time, o$trend$cpOccPr)           # trend changepoint probabilities
plot(o$time, o$season$order)            # average harmonic order over time

plot(o, interactive = TRUE)            # interactively choose variables to plot




#------------------------------------Example 4----------------------------------------#
# Specify other arguments explicitly.  Default values are used for missing parameters.
# Note that beast(), beast.irreg(), and beast123() call the same internal C/C++ library,
# so in beast(), the input parameters will be converted to metadata, prior, mcmc, and 
# extra parameters as explained for the beast123() function. Or type 'View(beast)' to 
# check the parameter assignment in the code.
 
 
 # In R's terminology, the  number of datapoints per period is also called 'freq'. In this
 # version, the 'freq' argument is obsolete and replaced by 'period'.
 
 # period=deltat*number_of_datapoints_per_period = 1.0*24=24 because deltat is set to 1.0 by
 # default and this signal has 24 samples per period. 
 o = beast(Yellowstone, period=24.0, mcmc.samples=5000, tseg.min=20)
 
 # period=deltat*number_of_datapoints_per_period = 1/24*24=1.0.
 # o = beast(Yellowstone, deltat=1/24 period=1.0, mcmc.samples=5000, tseg.min=20)
  
 o = beast( 
     Yellowstone,            # Yellowstone: a pure numeric vector wo time info
     start   = 1981.51, 
     deltat  = 1/24,         
     period  = 1.0,           # Period=delta*number_of_datapoints_per_period
     season  = 'harmonic',    # Periodic compnt exisits,fitted as a harmonic curve 
     scp.minmax     = c(0,3), # Min and max numbers of seasonal changpts allowed
     sorder.minmax  = c(1,5), # Min and max harmonic orders allowed
     sseg.min       = 24,     # The min length of segments btw neighboring chnpts
	                          # '24' means 24 datapoints; the unit is datapoint.
     sseg.leftmargin= 40,     # No seasonal chgpts allowed in the starting 40 datapoints
     tcp.minmax     = c(0,10),# Min and max numbers of changpts allowed in the trend
     torder.minmax  = c(0,1), # Min and maxx polynomial orders to fit trend
     tseg.min       = 24,     # The min length of segments btw neighboring trend chnpts
     tseg.leftmargin= 10,     # No trend chgpts allowed in the starting 10 datapoints
     s.complexfct   = 0.26,   # Manually tune it to fit a more complex seasonal curve
	                          #  than the non-informative prior on number of changepts
     t.complexfct   = -5.2,   # Manually tune it to fit a more complex trend curve
	                          #  than the non-informative prior on number of changepts 
     deseasonalize  = TRUE,   # Remove the global seasonality before fitting the beast model
     detrend        = TRUE,   # Remove the global trend before fitting the beast model
     mcmc.seed      = 0,      # A seed for mcmc's random nummber generator; use a
                              # non-zero integer to reproduce results across runs
     mcmc.burnin    = 500,    # Number of initial iterations discarded
     mcmc.chains    = 2,      # Number of chains
     mcmc.thin      = 3,      # Include samples every 3 iterations
     mcmc.samples   = 6000,   # Number of samples taken per chain
                              # total iteration: (500+3*6000)*2	
     print.param     = FALSE  # Do not print the parameters							  
     )
 plot(o)
 plot(o,vars=c('t','slpsgn') )         # plot only trend and slope sign 
 plot(o,vars=c('t','slpsgn'), relative.heights =c(.8,.2) ) # run "?plot.beast" for more info
 plot(o, interactive=TRUE)
 

 
#------------------------------------Example 5----------------------------------------#
# Run an interactive GUI to visualize how BEAST is samplinig from the possible model 
# spaces in terms of the numbers and timings of seasonal and trend changepoints.
# The GUI inferface allows changing the option parameters interactively. This GUI is 
# only available on Win x64 machines, not Mac or Linux.

## Not run: 
 beast(Yellowstone, period=24, gui=TRUE) 

## End(Not run)

#------------------------------------Example 6----------------------------------------#
# Apply beast to trend-only data. 'Nile' is the ANNUAL river flow of the river
# Nile at Aswan since 1871. It is a 'ts' object; its time attributes (start=1871, 
# end=1970,frequency=1) are used to replace the user-supplied start,deltat, and freq, 
# if any. 


 data(Nile)  
 plot(Nile)     
 attributes(Nile) # a ts object with time attributes (i.e., tsp=(start,end,freq)
 
 o = beast(Nile)  # start=1871, delta=1, and freq=1 taken from Nile itself
 plot(o)
 
 o = beast(Nile,             # the same as above. The user-supplied values (i.e., 2023,
           start=2023,       # 9999) are ignored bcz Nile carries its own time attributes.
           period=9999,      # Its frequency tag is 1 (i.e., trend-only), so season='none'
           season='harmonic' # is used instead of the supplied 'harmonic'
		   )
 
 
#------------------------------------Example 7----------------------------------------#
# NileVec is  a pure data vector. The first run below is WRONG bcz NileVec was assumed
# to have a perodic component by default and beast gets a best estimate of freq=6 while 
# the true value is freq=1. To fit a trend-only model, season='none' has to be explicitly
# specified, as in the 2nd & 3rd funs.

 NileVec = as.vector(Nile) # NileVec is not a ts obj but a pure numeric data vector
 o       = beast(NileVec)  # WRONG WAY to call: No time attributes available to interpret 
                           # NileVec. By default, beast assumes season='harmonic', start=1,
                           # & deltat=1. 'freq' is missing and guessed to be 6 (WRONG).    
						   
 plot(o)                   # WRONG Results: The result has a suprious seasonal component 
							  
 o=beast(NileVec,season='none') # The correct way to call: Use season='none' for trend-only 
                                # analysis; the default time is the integer indices
                                # "1:length(NileVec)'. 
 print(o$time)							 
								
 o=beast(NileVec,               # Recommended way to call: The true time attributes are 
         start  = 1871,         # given explicitly through start and deltat (or freq if 
         deltat = 1,            # there is a  cyclic/seasonal cmponent). 
         season = 'none')  
 print(o$time)			 
 plot(o)



#------------------------------------Example 8----------------------------------------#
# beast can handle missing data. co2 is a monthly time series (i.e.,freq=12) starting
# from Jan 1959. We generate some missing values at random indices
 
## Not run: 

 data(co2)  
 attributes(co2)                          # A ts object with time attributes (i.e., tsp)
 badIdx      = sample( 1:length(co2), 50) # Get a set of random indices
 co2[badIdx] = NA                         # Insert some data gaps   

 out=beast(co2) # co2 is a ts object and its 'tsp' time attributes are used to get the
                # true time info. No need to specify 'start','deltat', & freq explicity.
				
 out=beast(co2,                  # The supplied time/period values will be ignored bcz
           start  = c(1959,1,15),# co2 is a ts object; the correct period = 1 will be 
           deltat = 1/12,        # used.
           period = 365)  
 print(out)
 plot(out)

## End(Not run) 



#------------------------------------Example 9----------------------------------------#
# Apply beast to time seris-like sequence data: the unit of sequences is not 
# necessarily time.
 

  data(CNAchrom11) # DNA copy number alterations in Chromesome 11 for cell line GM05296
                   # The data is orderd by genomic position (not time), and the values
                   # are the log2-based intensity ratio of copy numbers between the sample
                   # the reference. A value of zero means no gain or loss in copy number.
  o = beast(CNAchrom11,season='none') # season is a misnomer here bcz the data has nothing
                                      # to do with time. Regardless, we fit only a trend.
  plot(o)									  
 
 


#------------------------------------Example 10---------------------------------------#
# Apply beast to time seris-like data: the unit of sequences is not necessarily time.
 

  # Age of Death of Successive Kings of England
  # If the data link is deprecated, install the time series data library instead,
  # which is available at https://pkg.yangzhuoranyang.com/tsdl/
  # install.packages("devtools")
  # devtools::install_github("FinYang/tsdl")
  # kings = tsdl::tsdl[[293]]
  
  kings = scan("http://robjhyndman.com/tsdldata/misc/kings.dat",skip=3)
  out   = beast(kings,season='none')
  plot(out) 
  
 

 
#------------------------------------Example 11---------------------------------------#
# Another example from the tsdl data library
 


  # Number of monthly births in New York from Jan 1946 to Dec 1959
  # If the data link becomes invalid, install the time series data package instead
  # install.packages("devtools")
  # devtools::install_github("FinYang/tsdl")
  # kings = tsdl::tsdl[[534]]
  
  births = scan("http://robjhyndman.com/tsdldata/data/nybirths.dat") 
  out    = beast(births,start=c(1946,1,15), deltat=1/12 )  
  plot(out) # the result is wrong bcz the guessed freq via auto-correlation by beast 
            # is 2 rather than 12, so we recommend always specifying 'freq' explicitly
            # for those time series with a periodic component, as shown below.
  out    = beast(births,start=c(1946,1,15), deltat=1/12, freq  =12 )  
  out    = beast(births,start=c(1946,1,15), deltat=1/12, period=1.0 )  
  plot(out)  

  # Fit the seasonal component using a singular-value-decompostion-based basis functions
  # rathat than the default harmonic sin/cos basis functions.
  out    = beast(births,start=c(1946,1,15), deltat=1/12, period=1.0, season="svd" )  
  plot(out)    
  


#------------------------------------Example 12---------------------------------------#
#    Daily confirmed COVID-19 new cases and deaths across the globe
 
 ## Not run: 
 data(covid19)
 plot(covid19$date, covid19$newcases, type='l')
 
 newcases = sqrt( covid19$newcases )  # Apply a square root-transformation
 
 # This ts varies periodically every 7 days. 7 days can't be precisely represented 
 # in  the unit of year bcz some years has 365 days and others has 366. BEAST can hanlde 
 # this in two ways.


 #(1) Use the date number as the time unit--the num of days lapsed since 1970-01-01. 
  
  datenum  = as.numeric(covid19$date) 
  o        = beast(newcases, start=min(datenum), deltat=1, period=7) 
  o$time   = as.Date(o$time, origin='1970-01-01') # Convert from integers to Date.
  plot(o)
 
 #(2) Use strings to explicitly specify deltat and period with a unit. 
 
  startdate = covid19$date[1]
  o         = beast(newcases, start=startdate, deltat='1day', period='7days') 
  plot(o)
 
 
## End(Not run)
 
#------------------------------------Example 13---------------------------------------#
# The old API interface of beast is still made available but NOT recommended. It is 
# kept mainly to ensure the working of the sample code on Page 475 in the text
# Ecological Metods by Drs. Southwood and Henderson.

## Not run: 

  # The interface as shown here will be deprecated and NOT recommended.
  beast(Yellowstone, 24)  #24 is the freq: number of datapoints per period
  
   
  # Specify the model or MCMC parameters through opt as in Rbeast v0.2
  opt=list()             #Create an empty list to append individual model parameters
  opt$period=24          #Period of the cyclic component (i.e.,freq in the new version)
  opt$minSeasonOrder=2   #Min harmonic order allowed in fitting season component
  opt$maxSeasonOrder=8   #Max harmonic order allowed in fititing season component
  opt$minTrendOrder=0    #Min polynomial order allowed to fit trend (0 for constant)
  opt$maxTrendOrder=1    #Max polynomial order allowed to fit trend (1 for linear term)
  opt$minSepDist_Season=20#Min separation time btw neighboring season changepoints 
  opt$minSepDist_Trend=20 #Min separation time btw neighboring trend  changepoints
  opt$maxKnotNum_Season=4 #Max number of season changepoints allowed 
  opt$maxKnotNum_Trend=10 #Max number of trend changepoints allowed  
  opt$omittedValue=NA    #A customized value to indicate bad/missing values in the time 
                          #series, in additon to those NA or NaN values.
  					
  # The following parameters used to configure the reverisible-jump MCMC (RJMCC) sampler
  opt$chainNumber=2       #Number of parallel MCMC chains 
  opt$sample=1000         #Number of samples to be collected per chain
  opt$thinningFactor=3    #A factor to thin chains  
  opt$burnin=500          #Number of burn-in samples discarded at the start 
  opt$maxMoveStepSize=30  #For the move proposal, the max window allowed in jumping from 
                           #the current changepoint
  opt$resamplingSeasonOrderProb=0.2 #The probability of selecting a re-sampling proposal 
                                    #(e.g., resample seasonal harmonic order)
  opt$resamplingTrendOrderProb=0.2  #The probability of selecting a re-sampling proposal 
                                    #(e.g., resample trend polynomial order)
								   
  opt$seed=65654    #A seed for the random generator: If seed=0,random numbers differ
                    #for different BEAST runs. Setting seed to a chosen non-zero integer 
                    #will allow reproducing the same result for different BEAST runs.
 
  beast(Yellowstone, opt)  
 
## End(Not run)
 
#------------------------------------Example 14---------------------------------------#
# Fit a model with an outlier component: Y = trend + outlier + error. 
# Outliers here refer to spikes or dips at isolated points that can't be capatured by the 
# trend
## Not run:  
 NileVec        = as.vector(Nile)
 NileVec[50]    = NileVec[50] + 1500                 # Add an large artificial spike at t=50
 o = beast(NileVec, season='none', hasOutlier=TRUE)
 plot(o)
 
 o = beast(Nile, season='none', hasOutlier=TRUE)     # Fit a model with outliers
 plot(o) 
 
## End(Not run)


Rbeast documentation built on Dec. 19, 2025, 9:06 a.m.