beast123: Bayesian time series decomposition into changepoints, trend, and seasonal/periodic components

View source: R/beast123.R


Bayesian time series decomposition into changepoints, trend, and seasonal/periodic components

Description

BEAST is a Bayesian model averaging algorithm for decomposing time series or other 1D sequential data into individual components, including abrupt changes, trends, and periodic/seasonal variations. It is useful for changepoint detection (e.g., breakpoints or structural breaks), nonlinear trend analysis, time series decomposition, and time series segmentation. beast123 is a low-level interface to BEAST.

Usage

beast123(
  Y,
  metadata = list(),
  prior    = list(),
  mcmc     = list(),
  extra    = list(),
  season   = c("harmonic","svd","dummy","none"),
  method   = c("bayes","bic","aic","aicc","hic",
               "bic0.25","bic0.5","bic1.5","bic2"),
  ...
)

Arguments

Y

Y is a numeric 1D vector, 2D matrix, or 3D array. Missing values are allowed and can be indicated by NA, NaN, or a customized value specified via the 2nd argument metadata (e.g., metadata$missingValue=-9999).

  • If Y is a vector of size N x 1 or 1 x N, it is treated as a single time series of length N.

  • If Y is a 2D matrix or 3D array of dimension N1 x N2 or N1 x N2 x N3 (e.g., stacked images of geospatial data), it contains multiple time series of equal length. The dimension corresponding to time must be specified in metadata$whichDimIsTime. For example, metadata$whichDimIsTime = 1 for a 190 x 35 matrix indicates 35 time series of length 190; metadata$whichDimIsTime = 2 for a 100 x 200 x 300 array indicates 100 * 300 = 30000 time series of length 200 each.

Y can be either regular (evenly spaced in time) or irregular/unordered in time.

  • Regular data: Individual times are determined from the time of the first data point (startTime) and the time interval between consecutive points (deltaTime), specified via metadata$startTime and metadata$deltaTime. If not provided, both default to 1.0.

  • Irregular or unordered data: The times must be explicitly supplied via metadata$time. The BEAST model is currently formulated for regular data only, so beast123 internally aggregates/re-bins irregular data into a regular series. For this aggregation, metadata$deltaTime should also be provided to specify the desired bin size (time interval) of the regularized time series.
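
As a minimal sketch of the two cases (the vector y and the sampling times below are made up purely for illustration; only the metadata fields differ between the regular and the irregular call):

  y   <- rnorm(100)                         # a toy series
  o1  <- beast123(y, season = "none")       # regular: startTime and deltaTime default to 1

  tim <- sort(runif(100, 2000, 2010))       # irregular sampling times in fractional years
  o2  <- beast123(y, metadata = list(time = tim, deltaTime = 1/12), season = "none")
                                            # irregular: supply the times and the bin size deltaTime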

Y may contain both a periodic component and a trend, or only a trend. Use the argument season to specify the case:

  • season = "none": Y is treated as trend-only; no periodic components are assumed.

  • season = "harmonic": Y has a periodic/seasonal component. The term "season" is used broadly here to refer to any periodic variation. The period is not a model parameter estimated by BEAST; instead it must be supplied by the user via metadata$period, the unit of which should be consistent with that of metadata$deltaTime (see metadata$period below). If season = "harmonic", The seasonal component is modeled as a harmonic curve (a combination of sines and cosines).

  • season = "dummy": As for "harmonic", except that the periodic/seasonal component is modeled as a non-parametric curve.

  • season = "svd" (experimental): As for "harmonic", except that the periodic/seasonal component is modeled as a linear combination of basis functions derived from a singular value decomposition (SVD). The SVD-based basis functions can be more parsimonious than harmonic bases in parameterizing seasonal variations and may facilitate the detection of more subtle seasonal changepoints.

metadata

metadata is optional. If present, it can be:

  • a scalar specifying the period of Y;

  • a vector of numbers, character strings, or Dates specifying the times of Y; or

  • more commonly, a list object specifying various parameters describing the first argument Y.

If missing, default values are used. However, when Y is a 2D matrix or 3D array, metadata should be supplied explicitly to avoid misinterpreting which dimension corresponds to time. metadata is not part of the Bayesian BEAST model itself; it provides additional information needed to correctly interpret Y.

When metadata is a list, the following fields are supported (not all are needed in every case; the relevant subset depends on the input type and regularity of the series):

  • metadata$whichDimIsTime: Integer (<= 3). Specifies which dimension of Y corresponds to time for 2D or 3D inputs. Ignored if Y is a vector.

  • metadata$isRegularOrdered: Logical. Obsolete and no longer used in this version. Regularity is now inferred from metadata$time; if metadata$time is missing, Y is assumed to be regular.

  • metadata$time: A vector of the same length as the time dimension of Y, or a more complex object providing time information. It can be a vector of numbers, Dates, or date strings; it can also be a list of year, month, and day vectors. Possible formats include:

    1. A vector of numeric values (e.g., c(1984.23, 1984.27, 1984.36, ...)). The time unit is arbitrary but must be consistent with the units used in metadata$startTime, metadata$deltaTime, and metadata$period.

    2. A vector of Dates (e.g., as.Date(c("1984-03-27","1984-04-10","1984-05-12", ...))).

    3. A vector of character strings. Examples are:

      • c("1984-03-27","1984-04-10","1984-05-12")

      • c("1984/03/27","1984,04,10","1984 05 12") (delimiters can differ as long as the Y-M-D order is consistent)

      • c("LT4-1984-03-27","LT4-1984-04-10","LT4-1984+05,12")

      • c("LT4-1984087ndvi","LT4-1984101ndvi","LT4-1984133ndvi")

      • c("1984,,abc 3/ 27","1984,,ddxfdd 4/ 10","ggd1984,, 5/ ttt 12")

      BEAST uses several heuristics to parse date strings without an explicit format specifier, but may fail when strings are ambiguous (e.g., "LC8-2020-09-20-1984", where both 2020 and 1984 look like possible years). To ensure correct parsing, use a list with an explicit date format specifier (see next item).

    4. A list object time = list(datestr = ..., strfmt = "...") consisting of:

      • time$datestr: a character vector of date strings, and

      • time$strfmt: a format string that specifies how to parse datestr.

      Three types of formats are supported:

      • (a) Fixed pattern of year, month, and day positions. For example, to extract 2001/12/02, 2002/11/08, etc. from time$datestr = c("P23R34-2001.1202333xd","O93X94-2002.1108133fd","TP3R34-2009.0122333td"), use time$strfmt = "P23R34-yyyy.mmdd333xd", where yyyy, mm, and dd denote the year, month, and day positions. All other positions are treated as wildcards and can be filled with any characters other than yyyy, mm, and dd.

      • (b) Fixed pattern of year and day-of-year (DOY) positions. For example, to extract year and DOY from "P23R342001888045", use time$strfmt = "123123yyyy888doy", where yyyy and doy are specifiers and other characters are wildcards. doy must be three digits.

      • (c) Fixed pattern of separators between year, month, and day. For example, to get 2002/12/02 from "2002,12/02", " 2002 , 12/2", or "2002,12 /02", use time$strfmt = "Y,M/D", where whitespace is ignored. To get 2002/12/02 from "2--12, 2012", use time$strfmt = "D--M,Y".

    5. A list object of separate vectors for dates. Use time$year, time$month, and time$day to specify dates, or alternatively time$year and time$doy (day of year, 1-365/366). Each vector must have the same length as the time dimension of Y.

  • metadata$startTime: Numeric (defaults to 1.0 if missing). The time of the first data point. It can be given as a scalar (e.g., 2021.23) or as a vector of length 3 in the form of c(year, month, day) (e.g., c(2021, 1, 24)). metadata$startTime is needed for regular data; for irregular data it is optional and, if missing, may be derived from metadata$time.

  • metadata$deltaTime: Numeric or character string. The time interval between consecutive data points. For regular data it is optional (default 1.0); for irregular data it should be specified, as it determines the bin size when aggregating irregular times to regular intervals. If metadata$time is numeric, the unit of deltaTime is arbitrary but must be consistent with the numeric scale of metadata$time. If metadata$time is given as Dates or date strings, deltaTime is interpreted as a fraction of a year by default. Alternatively, use a string instead of a number to specify the interval with a unit, such as deltaTime="7 days", "7d", "1/2 months", "1mn", "1 year", or "1y".

  • metadata$period: Numeric or character string. The period of the periodic/seasonal component in Y. This is needed only when a seasonal component is present (i.e., season = "harmonic" or season = "dummy") and is ignored for trend-only data (season = "none"). The period must be in the same time unit as deltaTime and should satisfy period = deltaTime * freq, where freq is the number of data points per period. (The older argument freq is now obsolete and replaced by period.) The period is not a BEAST model parameter and must be provided by the user; if it is missing, however, BEAST attempts to infer a plausible period via autocorrelation before fitting. If period <= 0, no seasonal component is assumed (equivalent to season = "none"). As with deltaTime, character strings such as period="1 year", "12 months", "365d", or "366 days" can be used. (A combined metadata example is sketched after this list.)

  • metadata$missingValue: Numeric; a customized value indicating bad or missing values in addition to NA or NaN.

  • metadata$maxMissingRateAllowed: Numeric in [0, 1]; the maximum proportion of missing values allowed. Time series with a missing rate higher than this threshold are skipped and not fitted by BEAST.

  • metadata$hasOutlier: Logical; if TRUE, fit a model with an outlier component representing potential spikes or dips at isolated data points: Y = trend + outlier + error if season = "none", and Y = trend + season + outlier + error otherwise.

  • metadata$deseasonalize: Logical; if TRUE, remove a global seasonal component before fitting BEAST (see examples).

  • metadata$detrend: Logical; if TRUE, remove a global trend before fitting BEAST.
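
Combining several of the fields above, a metadata list for an irregular series with a known one-year cycle might look like the following sketch (all values are illustrative, not recommendations; time must have the same length as the time dimension of Y in a real call):

  metadata <- list(
    time                   = as.Date(c("2001-01-07", "2001-02-18", "2001-03-21")), # irregular dates
    deltaTime              = "1 month",   # bin size used to aggregate to a regular series
    period                 = "1 year",    # seasonal cycle; ignored when season = "none"
    missingValue           = -9999,       # flags bad values in addition to NA/NaN
    maxMissingRateAllowed  = 0.75,        # skip series with more than 75% missing data
    hasOutlier             = FALSE        # no spike/dip (outlier) component
  )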

prior

prior (optional) is a list of hyperparameters defining the priors in the Bayesian BEAST model. Because these are part of the model specification, the BEAST results may be sensitive to their values. If prior is missing, default values are used and printed to the console at the start of the run. The following fields are supported:

  • prior$seasonMinOrder: Integer (>= 1).

  • prior$seasonMaxOrder: Integer (>= 1). The minimum and maximum harmonic orders considered for the seasonal component. These are used only when the seasonal component is modeled with harmonics (i.e., season = "harmonic" or "svd") and are ignored when season = "none" or "dummy". If seasonMinOrder == seasonMaxOrder, the harmonic order is fixed and BEAST does not infer the posterior distribution of harmonic orders.

  • prior$seasonMinKnotNum: Integer (>= 0).

  • prior$seasonMaxKnotNum: Integer (>= 0). The minimum and maximum numbers of seasonal changepoints allowed in segmenting the seasonal component. Used only when the data have a seasonal component (i.e., season = "harmonic", "svd", or "dummy") and ignored for trend-only data. If seasonMinKnotNum == seasonMaxKnotNum, the number of seasonal changepoints is fixed but their possible locations and occurrence probabilities over time are still estimated. If seasonMinKnotNum == seasonMaxKnotNum == 0, no seasonal changepoints are allowed and a single global seasonal model is fitted.

  • prior$seasonMinSepDist: Integer (> 0). The minimum separation, in number of time steps, between neighboring seasonal changepoints. That is, when fitting a piecewise seasonal model, no two changepoints may be closer than seasonMinSepDist time intervals. The corresponding time window in original time units is seasonMinSepDist * metadata$deltaTime.

  • prior$seasonLeftMargin: Integer (>= 0). The number of leftmost observations excluded from seasonal changepoint detection. No seasonal changepoints are allowed in the initial window of length seasonLeftMargin. It is specified in units of time steps and thus corresponds to a time window of seasonLeftMargin * metadata$deltaTime. If missing, it defaults to seasonMinSepDist.

  • prior$seasonRightMargin: Integer (>= 0). The number of rightmost observations excluded from seasonal changepoint detection. Similarly, no seasonal changepoints are allowed in the final window of length seasonRightMargin. Specified in time steps and corresponds to seasonRightMargin * metadata$deltaTime in the time unit. If missing, it defaults to seasonMinSepDist.

  • prior$seasonComplexityFactor: Numeric (default 0.0). A hyperprior parameter (newly added in Version 1.02) controlling the complexity of the seasonal curve (i.e., the number of seasonal changepoints). A prior of the form p(k) \propto \exp[\lambda (k+1)] is placed on the number of seasonal changepoints k, where \lambda is seasonComplexityFactor. Setting \lambda = 0 yields a non-informative prior p(k) \propto 1, under which all model dimensions are equally likely a priori. Users may tune seasonComplexityFactor in the range [-20, 20] or even wider: negative values (e.g., \lambda = -15.9) favor fewer changepoints (simpler seasonal curves), whereas positive values (e.g., \lambda = 5.76) favor more changepoints (more complex curves).

  • prior$trendMinOrder: Integer (>= 0).

  • prior$trendMaxOrder: Integer (>= 0). The minimum and maximum polynomial orders considered for fitting the trend component. Order 0 corresponds to a constant (flat) trend; order 1 to a linear trend. If trendMinOrder == trendMaxOrder, the polynomial order is fixed and BEAST does not infer the posterior distribution of polynomial orders.

  • prior$trendMinKnotNum: Integer (>= 0).

  • prior$trendMaxKnotNum: Integer (>= 0). The minimum and maximum numbers of trend changepoints allowed. If trendMinKnotNum == trendMaxKnotNum, the number of trend changepoints is fixed but their locations and occurrence probabilities over time are still estimated. If trendMinKnotNum == trendMaxKnotNum == 0, no trend changepoints are allowed and a single global polynomial trend is fitted.

  • prior$trendMinSepDist: Integer (> 0). The minimum separation, in number of time steps, between neighboring trend changepoints. That is, when fitting a piecewise trend model, no two changepoints may be closer than trendMinSepDist time intervals. The corresponding time window in original time units is trendMinSepDist * metadata$deltaTime.

  • prior$trendLeftMargin: Integer (>= 0). The number of leftmost observations excluded from trend changepoint detection. No trend changepoints are allowed in the initial window of length trendLeftMargin. Specified in time steps; the corresponding time window is trendLeftMargin * metadata$deltaTime. If missing, defaults to trendMinSepDist.

  • prior$trendRightMargin: Integer (>= 0). The number of rightmost observations excluded from trend changepoint detection. No trend changepoints are allowed in the final window of length trendRightMargin. Specified in time steps; the corresponding time window is trendRightMargin * metadata$deltaTime. If missing, defaults to trendMinSepDist.

  • prior$trendComplexityFactor: Numeric (default 0.0). Analogous to seasonComplexityFactor, but for the trend component. A prior of the form p(k) \propto \exp[\lambda (k+1)] is placed on the number of trend changepoints k, where \lambda is trendComplexityFactor. Setting \lambda = 0 yields a non-informative prior p(k) \propto 1, under which all model dimensions are equally likely a priori. Users may tune trendComplexityFactor in the range [-20, 20] or even wider: negative values (e.g., \lambda = -15.9) favor fewer changepoints (simpler trend curves), whereas positive values (e.g., \lambda = 5.76) favor more changepoints (more complex curves).

  • prior$precValue: Numeric (> 0); default 10. It is the fixed precision value when prior$precPriorType = "constant" and otherwise serves only as the initial value of the precision parameter(s), so the results are sensitive to it only in the "constant" case.

  • prior$precPriorType: Character string, one of "constant", "uniform" (default), "componentwise", or "orderwise". These options control how precision parameters (for model coefficients) are treated:

    1. "constant": A single precision parameter is fixed at prior$precValue. The results may be sensitive to the choice of prior$precValue.

    2. "uniform": A single precision parameter is treated as a random variable, with initial value prior$precValue. Its posterior is inferred via MCMC, making the results less sensitive to the initial choice of prior$precValue.

    3. "componentwise": Separate precision parameters are used for different components (e.g., season vs. trend), starting from prior$precValue. Each precision parameter is inferred via MCMC.

    4. "orderwise": Separate precision parameters are used for different components and different orders within components, starting from prior$precValue. All the precision parameters are inferred via MCMC.

  • prior$outlierMinKnotNum: Integer (>= 0).

  • prior$outlierMaxKnotNum: Integer (>= 0). These are used only if metadata$hasOutlier = TRUE to specify the minimum and maximum numbers of outlier-type changepoints (spikes or dips) allowed in the series.
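
The prior fields above can be combined into a single list, as in the following sketch (values are illustrative and mirror those used in Example 3 below; the defaults are printed at the start of a run and can be copied as a template):

  prior <- list(
    seasonMinOrder   = 1,   seasonMaxOrder   = 5,   # harmonic orders explored for the seasonal curve
    seasonMinKnotNum = 0,   seasonMaxKnotNum = 3,   # allowed numbers of seasonal changepoints
    seasonMinSepDist = 10,                          # >= 10 time steps between seasonal changepoints
    trendMinOrder    = 0,   trendMaxOrder    = 1,   # constant-to-linear trend segments
    trendMinKnotNum  = 0,   trendMaxKnotNum  = 15,  # allowed numbers of trend changepoints
    trendMinSepDist  = 5,
    trendComplexityFactor = 0,                      # 0 = non-informative prior on changepoint number
    precPriorType    = "uniform",                   # precision treated as a random variable
    precValue        = 10                           # initial (or fixed, if "constant") precision
  )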

mcmc

mcmc (optional) is a list object of parameters configuring the reversible-jump MCMC procedure. These settings are not part of the Bayesian model itself but control the sampling process and chain length. In general, longer chains yield more stable estimates. Supported fields are:

  • mcmc$seed: Integer (>= 0); the seed for the random number generator. If mcmc$seed = 0, an arbitrary seed is used and results can vary slightly across runs. Using the same non-zero seed makes results reproducible on the same machine, but differences across platforms/CPUs may still occur from the same seed due to differences in random number libraries or CPU instruction sets.

  • mcmc$chainNumber: Integer (> 0); the number of parallel MCMC chains.

  • mcmc$samples: Integer (> 0); the number of samples collected per MCMC chain.

  • mcmc$thinningFactor: Integer (> 0); thinning factor. If thinningFactor = k, every k-th sample is retained and the others discarded.

  • mcmc$burnin: Integer (> 0); the number of initial samples discarded as burn-in per chain.

  • mcmc$maxMoveStepSize: Integer (> 0). In the RJMCMC sampler, a "move" proposal shifts changepoint locations. maxMoveStepSize specifies the maximum step size (in time steps) allowed in such moves.

  • mcmc$seasonResamplingOrderProb: Numeric in (0, 1); probability of proposing a resampling of the seasonal harmonic order.

  • mcmc$trendResamplingOrderProb: Numeric in (0, 1); probability of proposing a resampling of the trend polynomial order.

  • mcmc$credIntervalAlphaLevel: Numeric in (0, 1) (default 0.95); the confidence level used to compute credible intervals (e.g., 0.95 for 95% credible intervals).
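
Because the seed controls reproducibility, here is a short sketch of how the mcmc list is typically used (the Nile data and the particular settings are only illustrative):

  mcmc <- list(seed = 42, samples = 3000, chainNumber = 3, burnin = 200, thinningFactor = 3)
  o1   <- beast123(Nile, mcmc = mcmc, season = "none")
  o2   <- beast123(Nile, mcmc = mcmc, season = "none")
  all.equal(o1$trend$Y, o2$trend$Y)   # expected TRUE: the same non-zero seed reproduces the fit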

extra

extra (optional) is a list of flags and options controlling output content, console messages, and parallel computation. Supported fields include:

  • extra$dumpInputData: Logical (default FALSE). If TRUE, copies of the input time series are included in the output. If the input Y is irregular the dumped copy is the aggregated regular series.

  • extra$whichOutputDimIsTime: Integer (<= 3). For 2D or 3D inputs (multiple time series, e.g., stacked images), this specifies which output dimension corresponds to time. Defaults to 3 for 3D inputs. Ignored if Y is a single time series.

  • extra$ncpStatMethod: Character string (deprecated). Formerly used to specify the statistic used to determine the number of changepoints (ncp) when computing the most likely changepoint locations (e.g., out$trend$cp and out$season$cp). Possible values were "mode", "mean", and "median" (default "mode"). Individual MCMC-sampled models have a varying number of changepoints. For example, if mcmc$samples=10 and assuming that the numbers of changepoints for the 10 sampled models are [0, 2, 4, 1, 1, 2, 7, 6, 6, 1], the mean ncp is 3.1 (rounded to 3), the median is 2.5 (2), and the mode is 1. ncpStatMethod is now deprecated: BEAST outputs all possible changepoints together with several summaries of ncp, including ncp, ncp_median, ncp_mode, and ncp_pct90. The choice of ncp for plotting is now controlled by the ncpStat argument in plot.beast.

  • extra$computeCredible: Logical (default TRUE). If TRUE, credible intervals are computed and returned.

  • extra$fastCIComputation: Logical (default TRUE). If TRUE, a faster approximate method is used to compute credible intervals. CI computation can be one of the most time-consuming steps, so fastCIComputation = TRUE is recommended unless more precise CI estimates are desired.

  • extra$computeSeasonOrder: Logical (default TRUE). If TRUE, a posterior estimate of the seasonal harmonic order is returned. Only meaningful when season = "harmonic" or "svd" and prior$seasonMinOrder != prior$seasonMaxOrder.

  • extra$computeTrendOrder: Logical (default TRUE). If TRUE, a posterior estimate of the trend polynomial order is returned. Only meaningful when prior$trendMinOrder != prior$trendMaxOrder.

  • extra$computeSeasonChngpt: Logical (default TRUE). If TRUE, compute the most likely times/positions at which changepoints occur in the seasonal component. This is relevant only when a seasonal component is present (i.e., season = "harmonic", "svd", or "dummy") and prior$seasonMaxKnotNum > 0.

  • extra$computeTrendChngpt: Logical (default TRUE). If TRUE, compute the most likely changepoint locations in the trend component. This is relevant only when prior$trendMaxKnotNum > 0.

  • extra$computeSeasonAmp: Logical (default FALSE). If TRUE, output the time-varying amplitude of the seasonal component.

  • extra$computeTrendSlope: Logical (default FALSE). If TRUE, output the time-varying slope of the trend component.

  • extra$tallyPosNegSeasonJump: Logical (default FALSE). If TRUE, classify seasonal changepoints by the sign of the jump (positive vs. negative) and output separate summaries for each group.

  • extra$tallyPosNegTrendJump: Logical (default FALSE). If TRUE, classify trend changepoints by whether the trend level jumps up or down at the changepoint and output separate summaries.

  • extra$tallyIncDecTrendJump: Logical (default FALSE). If TRUE, classify trend changepoints by changes in slope (increasing vs. decreasing) and output separate summaries for those changepoints with increased slopes and those with decreased slopes.

  • extra$printProgress: Logical (default FALSE). If TRUE, a progress bar is displayed showing the progress of the run. For multiple time series (e.g., stacked images), the progress bar also reports an estimate of remaining time.

  • extra$consoleWidth: Integer (default 0). The width (in characters) of each status line printed when extra$printProgress = TRUE. If 0, the width of the current console is used.

  • extra$printParameter: Logical (default TRUE). If TRUE, the values used in metadata, prior, mcmc, and extra are printed at the start of the run.

  • extra$printWarning: Logical (default TRUE). If TRUE, warning messages are printed.

  • extra$quiet: Logical. If TRUE, suppress most console output.

  • extra$dumpMCMCSamples: Logical. If TRUE, dump individual MCMC samples.

  • extra$numThreadsPerCPU: Integer (default 2); the number of threads scheduled per CPU core.

  • extra$numParThreads: Integer (default 0). Total number of parallel threads used when fitting many time series. If 0, the actual number of threads is set to extra$numThreadsPerCPU * (number of CPU cores); that is, each CPU core is assigned numThreadsPerCPU threads. On 64-bit Windows, BEAST is group-aware and distributes threads across NUMA nodes. Currently, up to 256 CPU cores are supported.
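
As an illustration of the threading rule above, a small sketch (the arithmetic simply restates the default rule; detectCores() comes from the built-in parallel package):

  # Default rule when extra$numParThreads = 0:
  numThreadsPerCPU <- 2
  numCores         <- parallel::detectCores()        # e.g., 8 on an 8-core machine
  totalThreads     <- numThreadsPerCPU * numCores     # threads actually launched

  extra <- list(numThreadsPerCPU = 2, numParThreads = 0, printProgress = TRUE)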

season

Character (default "harmonic"). Specifies whether Y has a periodic component and how it is modeled:

  • "none": Y is trend-only; no periodic components are modeled. Seasonal prior parameters (e.g., seasonMaxKnotNum and seasonMinSepDist) are ignored.

  • "harmonic": Y has a periodic/seasonal component modeled as a harmonic series (e.g., combination of sines and cosines). The term season is used broadly to refer to any periodic variation. The period is not estimated but must be specified by the user (via metadata$period); see metadata$period.

  • "dummy": As "harmonic", except that the seasonal component is modeled as a non-parametric curve. The harmonic-order prior settings (e.g., prior$seasonMinOrder and prior$seasonMaxOrder) are irrelevant and ignored if season="dummy".

  • "svd": (Experimental) As "harmonic", except that the seasonal component is expressed as a linear combination of basis functions derived from a Single-value decomposition. These basis functions can be more parsimonious than pure harmonic bases, and may improve detection of subtle seasonal changepoints.

method

Character (default "bayes"). Specifies how the model posterior probability is formulated or approximated:

  • "bayes": Full Bayesian formulation as described in Zhao et al. (2019).

  • "bic": Approximate posterior probabilities using the Bayesian Information Criterion (BIC), \mathrm{BIC} = n \log(\mathrm{SSE}) + k \log(n), where k is the number of parameters and n the number of observations.

  • "aic": Approximate posterior probabilities using Akaike's Information Criterion (AIC), \mathrm{AIC} = n \log(\mathrm{SSE}) + 2k.

  • "aicc": Approximate posterior probabilities using the corrected AIC (AICc), \mathrm{AICc} = \mathrm{AIC} + \frac{2k^2 + 2k}{n - k - 1}.

  • "hic": Approximate posterior probabilities using the Hannan-Quinn Information Criterion (HIC), \mathrm{HIC} = n \log(\mathrm{SSE}) + 2k \log(\log(n)).

  • "bic0.25": BIC-type approximation adopted from Kim et al. (2016) \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.jspi.2015.09.008")} with reduced complexity penalty, \mathrm{BIC}_{0.25} = n \log(\mathrm{SSE}) + 0.25 k \log(n).

  • "bic0.5": As above but with penalty factor 0.5.

  • "bic1.5": As above but with penalty factor 1.5.

  • "bic2": As above but with penalty factor 2.0.

...

Additional parameters reserved for future extensions; currently unused.

Value

The output is a list object of class "beast" containing the following fields. Exact dimensions depend on the input Y and extra$whichOutputDimIsTime. The descriptions below assume a single time series input of length N; for 2D/3D inputs, interpret the results according to the 2D/3D dimensions.

time

A numeric vector of length N: the time associated with each sampled observation. By default, this is simply 1:N if metadata$time, metadata$startTime, or metadata$deltaTime are missing.

data

A vector, matrix, or 3D array; a copy of the input Y if extra$dumpInputData = TRUE. If extra$dumpInputData = FALSE, this is NULL. For irregular inputs Y, this field stores the aggregated regular series at time intervals spaced by metadata$deltaTime.

marg_lik

Numeric; the average marginal likelihood of the sampled models. Larger values indicate better fit for a given time series.

R2

Numeric; coefficient of determination (R-squared) of the fitted model.

RMSE

Numeric; root mean squared error of the fitted model.

sig2

Numeric; estimated variance of the residual error.

trend

trend is a list of outputs related to the estimated trend component:

  • ncp: Numeric scalar; mean number of trend changepoints. Because individual models sampled by BEAST can have varying numbers of changepoints, several alternative summaries (ncp_mode, ncp_median, ncp_pct90) are also provided. For example, if mcmc$samples = 10 and the numbers of changepoints across models are c(0, 2, 4, 1, 1, 2, 7, 6, 6, 1), then the mean number of changepoints (ncp) is 3.1, the median 2.5, the mode 1, and the 90th percentile (ncp_pct90) 6.5.

  • ncp_mode: Mode of the number of trend changepoints. See the above for explanations.

  • ncp_median: Median of the number of trend changepoints. See the above for explanations.

  • ncp_pct90: 90th percentile of the number of trend changepoints. See the above for explanations.

  • ncpPr: Numeric vector of length prior$trendMaxKnotNum + 1. Probability distribution over the number of trend changepoints in the range 0:prior$trendMaxKnotNum. For example, ncpPr[1] is the probability of having no trend changepoint; ncpPr[i] is the probability of having i - 1 changepoints.

  • cpOccPr: Numeric vector of length N. The changepoint occurrence probability at each time index for the trend component. Plotting cpOccPr yields a continuous probability-of-being-a-changepoint curve. A peak with a high value at a single time step does not necessarily imply a higher overall probability of a changepoint in a neighborhood, compared to a broader peak with a lower maximum but higher total probability. Specifically, a higher peak indicates a higher chance of being a changepoint only at that particular SINGLE point in time but does not necessarily mean a higher chance of observing a changepoint AROUND that time. For example, a window of cpOccPr values c(0,0,0.5,0,0) (i.e., the peak probability is 0.5 and the summed probability over the window is also 0.5) is less likely to contain a changepoint than another window c(0.1,0.2,0.21,0.2,0.1), where the peak of 0.21 is lower but the summed probability of 0.81 is larger.

  • order: Numeric vector of length N. The average polynomial order of the trend across sampled models at each time step. Because this is a posterior average, it need not be an integer.

  • cp: Numeric vector (up to length prior$trendMaxKnotNum). Estimated trend changepoint locations. These are obtained by smoothing cpOccPr with a window of size prior$trendMinSepDist and then selecting up to prior$trendMaxKnotNum peaks in the smoothed curve. Some entries may be NaN if the number of detected changepoints is fewer than trendMaxKnotNum. Not all listed changepoints should be treated as "true" changepoints; some may be false positives. Check the associated posterior probabilities (given in the next field, trend$cpPr) for the confidence levels.

  • cpPr: Numeric vector of length prior$trendMaxKnotNum; the posterior probabilities associated with each changepoint in cp. Trailing entries are NaN if fewer than prior$trendMaxKnotNum changepoints are identified.

  • cpCI: Numeric matrix of dimension prior$trendMaxKnotNum x 2; credible intervals for the changepoint locations in cp.

  • cpAbruptChange: Numeric vector of length prior$trendMaxKnotNum; the estimated magnitude of the jump in the trend at each detected changepoint in cp.

  • Y: Numeric vector of length N; the estimated trend component (Bayesian model average over all sampled trends).

  • SD: Numeric vector of length N; posterior standard deviation of the estimated trend.

  • CI: Numeric matrix of dimension N x 2; credible interval for the trend, with lower and upper envelopes.

  • slp: Numeric vector of length N; time-varying slope of the trend component.

  • slpSD: Numeric vector of length N; posterior standard deviation of the slope.

  • slpSgnPosPr: Numeric vector of length N; posterior probability that the slope of the trend is positive at each time point. For example, slpSgnPosPr = 0.80 at a given time means that 80% of sampled trend models have a positive slope there.

  • slpSgnZeroPr: Numeric vector of length N; posterior probability that the slope of the trend is effectively zero (flat) at each time point. The probability of a negative slope can be derived as 1 - slpSgnZeroPr - slpSgnPosPr.

  • pos_ncp, neg_ncp, pos_ncpPr, neg_ncpPr, pos_cpOccPr, neg_cpOccPr, pos_cp, neg_cp, pos_cpPr, neg_cpPr, pos_cpAbruptChange, neg_cpAbruptChange, pos_cpCI, neg_cpCI: As above, but restricted to trend changepoints with positive jumps (pos_*) or negative jumps (neg_*) in the trend. For example, pos_ncp is the mean number of changepoints at which the trend level jumps up.

  • inc_ncp, dec_ncp, inc_ncpPr, dec_ncpPr, inc_cpOccPr, dec_cpOccPr, inc_cp, dec_cp, inc_cpPr, dec_cpPr, inc_cpAbruptChange, dec_cpAbruptChange, inc_cpCI, dec_cpCI: As above, but restricted to changepoints where the trend slope increases (inc_*) or decreases (dec_*). For example, if the trend slope changes from 0.4 to 2.5 at a changepoint, that changepoint is counted toward inc_ncp.
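
A short sketch of how these trend outputs are typically inspected (using the Nile example from above; the index choices are arbitrary):

  o <- beast123(Nile, season = "none")
  o$trend$ncp                                        # mean number of trend changepoints
  o$trend$ncp_mode                                   # mode of the number of trend changepoints
  head(cbind(cp = o$trend$cp, cpPr = o$trend$cpPr))  # candidate changepoints and their probabilities
  plot(o$time, o$trend$cpOccPr, type = "h")          # changepoint-occurrence probability curve
  sum(o$trend$cpOccPr[10:14])                        # summed probability over a 5-step window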

season

season is a list analogous to trend, but for the seasonal/periodic component (when present):

  • ncp: Mean number of seasonal changepoints. Because individual models sampled by BEAST can have varying numbers of changepoints, several alternative summaries (ncp_mode, ncp_median, ncp_pct90) are also provided. For example, if mcmc$samples = 10 and the numbers of changepoints across models are c(0, 2, 4, 1, 1, 2, 7, 6, 6, 1), then the mean number of changepoints (ncp) is 3.1, the median 2.5, the mode 1, and the 90th percentile (ncp_pct90) 6.5.

  • ncp_mode: Mode of the number of seasonal changepoints. See the above for explanations.

  • ncp_median: Median of the number of seasonal changepoints. See the above for explanations.

  • ncp_pct90: 90th percentile of the number of seasonal changepoints. See the above for explanations.

  • ncpPr: Numeric vector of length prior$seasonMaxKnotNum + 1; probability distribution over the number of seasonal changepoints in 0:prior$seasonMaxKnotNum. For example, ncpPr[1] is the probability of having no seasonal changepoint; ncpPr[i] is the probability of having i - 1 changepoints.

  • cpOccPr: Numeric vector of length N; seasonal changepoint occurrence probability over time. Plotting cpOccPr yields a continuous probability-of-being-a-changepoint curve. A peak with a high value at a single time step does not necessarily imply a higher overall probability of a changepoint in a neighborhood, compared to a broader peak with a lower maximum but higher total probability. Specifically, a higher peak indicates a higher chance of being a changepoint only at that particular SINGLE point in time and does not necessarily mean a higher chance of observing a changepoint AROUND that time. For example, a window of cpOccPr values c(0,0,0.5,0,0) (i.e., the peak probability is 0.5 and the summed probability over the window is also 0.5) is less likely to contain a changepoint than another window c(0.1,0.2,0.21,0.2,0.1), where the peak of 0.21 is lower but the summed probability of 0.81 is larger.

  • order: Numeric vector of length N; average harmonic order needed to approximate the seasonal component. Because this is a posterior average, it need not be an integer.

  • cp: Numeric vector (up to length prior$seasonMaxKnotNum); estimated seasonal changepoint locations. These are obtained by smoothing cpOccPr with a window of size prior$seasonMinSepDist and then selecting up to prior$seasonMaxKnotNum peaks in the smoothed curve. Some entries may be NaN if the number of detected changepoints is fewer than seasonMaxKnotNum. Not all listed changepoints should be treated as "true" changepoints; some may be false positives. Check the associated posterior probabilities (given in the next field, season$cpPr) for the confidence levels.

  • cpPr: Numeric vector of length prior$seasonMaxKnotNum; the posterior probabilities associated with each changepoint in cp. Trailing entries are NaN if fewer than prior$seasonMaxKnotNum changepoints are identified.

  • cpCI: Numeric matrix of dimension prior$seasonMaxKnotNum x 2; credible intervals for seasonal changepoint locations.

  • cpAbruptChange: Numeric vector of length prior$seasonMaxKnotNum; magnitude of jumps in the seasonal curve at each changepoint.

  • Y: Numeric vector of length N; estimated seasonal component (Bayesian model average).

  • SD: Numeric vector of length N; posterior standard deviation of the seasonal component.

  • CI: Numeric matrix of dimension N x 2; credible interval for the seasonal component.

  • amp: Numeric vector of length N; time-varying amplitude of the seasonality.

  • ampSD: Numeric vector of length N; posterior standard deviation of the amplitude.

  • pos_ncp, neg_ncp, pos_ncpPr, neg_ncpPr, pos_cpOccPr, neg_cpOccPr, pos_cp, neg_cp, pos_cpPr, neg_cpPr, pos_cpAbruptChange, neg_cpAbruptChange, pos_cpCI, neg_cpCI: Seasonal analogues of the trend outputs with the same names, but restricted to seasonal changepoints with positive (pos_*) or negative (neg_*) jumps in the seasonal component. For example, pos_ncp refers to the average number of seasonal changepoints that jump up (i.e., positively) in the seasonal curve.
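
Analogously, a sketch of accessing the seasonal outputs (using the Yellowstone series; computeSeasonAmp must be TRUE for the amplitude to be returned):

  o <- beast123(Yellowstone, metadata = list(period = 24),
                extra = list(computeSeasonAmp = TRUE))
  o$season$ncp                              # mean number of seasonal changepoints
  plot(o$time, o$season$Y, type = "l")      # estimated seasonal component
  lines(o$time, o$season$amp, col = "red")  # time-varying seasonal amplitude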

outlier

outlier is a list analogous to trend or season, but for the outlier component (present only if metadata$hasOutlier = TRUE).

References

  1. Zhao, K., Wulder, M.A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X. and Brown, M., 2019. Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, p.111181 (the BEAST algorithm paper).

  2. Zhao, K., Valle, D., Popescu, S., Zhang, X. and Mallick, B., 2013. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection. Remote Sensing of Environment, 132, pp.102-119 (the Bayesian MCMC scheme used in BEAST).

  3. Hu, T., Toman, E.M., Chen, G., Shao, G., Zhou, Y., Li, Y., Zhao, K. and Feng, Y., 2021. Mapping fine-scale human disturbances in a working landscape with Landsat time series on Google Earth Engine. ISPRS Journal of Photogrammetry and Remote Sensing, 176, pp.250-261 (a BEAST application paper).

See Also

beast, beast.irreg, minesweeper, tetris, geeLandsat

Examples


#-------------------------------- NOTE ----------------------------------------------#
# beast123() is an all-inclusive function that unifies the functionality of beast() 
# and beast.irreg(). It can handle single, multiple, or 3D stacked time series, either
# regular or irregular. It allows customization via four list arguments:
#   metadata -- additional information about the input Y
#   prior    -- prior parameters for the BEAST model
#   mcmc     -- MCMC settings
#   extra    -- miscellaneous options controlling output and parallel computation
#
# Although functionally similar to beast() and beast.irreg(), beast123() is designed
# to efficiently handle multiple time series concurrently (e.g., stacked satellite
# images) via internal multithreading. When processing stacks of raster layers, 
# DO NOT iterate pixel-by-pixel using beast() or beast.irreg() wrapped by external
# parallel tools (e.g., doParallel, foreach). Instead, use beast123(), which
# implements parallelism internally.
 
#------------------------------ Example 1: seasonal time series ----------------------#
# Yellowstone is a half-monthly NDVI time series of length 774 for a site in Yellowstone
# starting from July 1-15, 1981 (start ~ c(1981, 7, 7); 24 observations per year).

library(Rbeast)
data(Yellowstone)
plot(Yellowstone)

# Below, the four optional arguments are omitted, so default values are used.
# By default, Y is assumed to be regular with a seasonal component. The actual
# default values are printed at the start of the run and can be used as templates
# for customization.

  o <- beast123(Yellowstone)
  plot(o)


#------------------------------ Example 2: trend-only series -------------------------#
# Nile is an annual river flow time series (no periodic variation). Set season = "none"
# to indicate trend-only analysis. Default values are used for other options.
# Unlike beast(), beast123() does NOT use the time attributes of a 'ts' object.
# For example, Nile is treated as numeric data only with the default times 
# 1:length(Nile); its (start=1871, end=1970, freq=1) attributes are ignored 
# unless specified through 'metadata'.


  o <- beast123(Nile, season = "none")
  o <- beast123(Nile, extra = list(dumpInputData = TRUE, quiet = TRUE),
                season = "none")
  o <- beast123(Nile, metadata = list(startTime = 1871, hasOutlier = TRUE),
                season = "none")
  plot(o)


#------------------------------ Example 3: full API usage ---------------------------#
# Explicitly specify metadata, prior, mcmc, and extra. Among these, 'prior' contains
# the true statistical model parameters; the others configure input, output, and
# computation.

## Not run: 

  # metadata: extra information describing the input time series Y.
  metadata <- list()
  # metadata$isRegularOrdered <- TRUE    # no longer used in this version
  metadata$whichDimIsTime      <- 1      # which dimension is time for 2D/3D inputs
  metadata$startTime           <- c(1981, 7, 7)
  # alternatively: metadata$startTime <- 1981.5137
  #                metadata$startTime <- as.Date("1981-07-07")
  metadata$deltaTime           <- 1/24   # half-monthly: 24 obs/year
  metadata$period              <- 1.0    # 1-year period: 24 * (1/24) = 1
  metadata$missingValue        <- NaN    # NaN/NA are treated as missing by default
  metadata$maxMissingRateAllowed <- 0.75 # skip series with more than 75% missing data
  metadata$deseasonalize       <- FALSE  # do not pre-remove global seasonality
  metadata$detrend             <- FALSE  # do not pre-remove global trend
  
  # prior: only true model parameters specifying the Bayesian formulation
  prior <- list()
  prior$seasonMinOrder    <- 1     # min harmonic order allowed to fit seasonal cmpnt
  prior$seasonMaxOrder    <- 5     # max harmonic order allowed to fit seasonal cmpnt
  prior$seasonMinKnotNum  <- 0     # min number of changepnts in seasonal cmpnt
  prior$seasonMaxKnotNum  <- 3     # max number of changepnts in seasonal cmpnt
  prior$seasonMinSepDist  <- 10    # min inter-chngpts separation for seasonal cmpnt
  prior$trendMinOrder     <- 0     # min polynomial order allowed to fit trend cmpnt 
  prior$trendMaxOrder     <- 1     # max polynomial order allowed to fit trend cmpnt   
  prior$trendMinKnotNum   <- 0     # min number of changepnts in trend cmpnt
  prior$trendMaxKnotNum   <- 15    # max number of changepnts in trend cmpnt 
  prior$trendMinSepDist   <- 5     # min inter-chngpts separation for trend cmpnt 
  prior$precValue         <- 10.0  # Initial value of the precision parameter (matters
                                   # only when precPriorType = 'constant')
  prior$precPriorType     <- "uniform"  # "constant", "uniform", "componentwise", "orderwise"
  
  # mcmc: settings for the MCMC sampler
  mcmc <- list()
  mcmc$seed                      <- 9543434  # arbitrary seed for the random number generator
  mcmc$samples                   <- 3000     # samples collected per chain
  mcmc$thinningFactor            <- 3        # take every 3rd sample and discard others 
  mcmc$burnin                    <- 150      # discard the initial 150 samples per chain
  mcmc$chainNumber               <- 3        # number of chains 
  mcmc$maxMoveStepSize           <- 4        # max random jump step when proposing new chngpts
  mcmc$trendResamplingOrderProb  <- 0.10     # prob of choosing to resample polynomial order
  mcmc$seasonResamplingOrderProb <- 0.10     # prob of choosing to resample harmonic order
  mcmc$credIntervalAlphaLevel    <- 0.95     # level used to compute credible intervals (95% CI)
  
  # extra: output and computation options
  extra <- list()
  extra$dumpInputData         <- FALSE  # If true, a copy of the input time series is returned
  extra$whichOutputDimIsTime  <- 1      # For 2D or 3D inputs, which dim of the output refers
                                        # to time? Ignored if the input is a single time series
  extra$computeCredible       <- FALSE  # If true, compute CI: computing CI is time-intensive. 
  extra$fastCIComputation     <- TRUE   # If true, a faster approximate method is used to get CIs,
                                        # though it is still time-intensive. This is partly why
                                        # beast() is slower: it always computes CIs.
  extra$computeSeasonOrder    <- FALSE  # If true, dump the estimated harmonic order over time
  extra$computeTrendOrder     <- FALSE  # If true, dump the estimated polynomial order over time
  extra$computeSeasonChngpt   <- TRUE   # If true, get the most likely locations of seasonal chngpts
  extra$computeTrendChngpt    <- TRUE   # If true, get the most likely locations of trend chngpts
  extra$computeSeasonAmp      <- FALSE  # If true, get time-varying amplitude of seasonality
  extra$computeTrendSlope     <- FALSE  # If true, get time-varying slope of trend
  extra$tallyPosNegSeasonJump <- FALSE  # If true, tally changepoints with +/- jumps in season
  extra$tallyPosNegTrendJump  <- FALSE  # If true, tally changepoints with +/- jumps in trend
  extra$tallyIncDecTrendJump  <- FALSE  # If true, tally changepoints with increasing/
                                        # decreasing trend slopes
  extra$printProgress         <- TRUE
  extra$printParameter        <- TRUE
  extra$quiet                 <- FALSE  # If TRUE, suppress most console output
  extra$consoleWidth          <- 0      # If 0, the console width is from the current console
  extra$numThreadsPerCPU      <- 2      # 'numThreadsPerCPU' and 'numParThreads' are used to 
  extra$numParThreads         <- 0      #   configure multithreading runs; they're used only 
                                        #   if Y has multiple time series (e.g.,stacked images)
  
  o <- beast123(Yellowstone, metadata, prior, mcmc, extra, season = "harmonic")
  plot(o)

## End(Not run)

#------------------------------ Example 4: irregular time series ---------------------#
# ohio is a data frame of Landsat NDVI observations at unevenly spaced times.


## Not run: 

  data(ohio)
  str(ohio)
  
  ###################################################################################
  # Individual times are provided as fractional years via ohio$time
  
  metadata <- list()
  metadata$time      <- ohio$time  # Must supply individual times for irregular inputs
  metadata$deltaTime <- 1/12       # Must supply the desired time interval for aggregation
  metadata$period    <- 1.0
  o <- beast123(ohio$ndvi, metadata)   # defaults used for missing parameters
  
  ###################################################################################
  # Another accepted time format: Ohio dates as Date objects
  class(ohio$rdate)
  
  metadata <- list()
  metadata$time      <- ohio$rdate # Must supply individual times for irregular inputs
  metadata$deltaTime <- 1/12       # Must supply the desired time interval for aggregation
  o <- beast123(ohio$ndvi, metadata)  # Default values used for those missing parameters
  
  ###################################################################################
  # Another accepted time format: separate Y, M, D columns
  ohio$Y      # Year                       
  ohio$M      # Month
  ohio$D      # Day
  
  metadata <- list()
  metadata$time$year   <- ohio$Y   # Must supply individual times for irregular inputs
  metadata$time$month  <- ohio$M
  metadata$time$day    <- ohio$D
  metadata$deltaTime   <- 1/12     # Must supply the desired time interval for aggregation
  o <- beast123(ohio$ndvi, metadata) # Default values used for those missing parameters
  
  ###################################################################################
  # Another accepted time format: year and DOY
  ohio$Y        # Year
  ohio$doy      # Day of year
  
  metadata <- list()
  metadata$time$year  <- ohio$Y   # Must supply individual times for irregular inputs 
  metadata$time$doy   <- ohio$doy
  metadata$deltaTime  <- 1/12     # Must supply the desired time interval for aggregation
  o <- beast123(ohio$ndvi, metadata) # Default values used for those missing parameters
  
  
  ###################################################################################
  # Another accepted time format: date strings with custom strfmt
  ohio$datestr1
  
  metadata <- list()
  metadata$time$datestr  <- ohio$datestr1
  metadata$time$strfmt   <- "????yyyy?mm?dd"
  metadata$deltaTime     <- 1/12
  o <- beast123(ohio$ndvi, metadata)
  
 ###################################################################################
 # Another accepted time format: date strings with custom strfmt
  ohio$datestr2
  
  metadata <- list()
  metadata$deltaTime     <- 1/12
  metadata$time$datestr  <- ohio$datestr2
  metadata$time$strfmt   <- "????yyyydoy????"
  o <- beast123(ohio$ndvi, metadata)
  
 ###################################################################################
 # Another accepted time format: date strings with custom strfmt
  ohio$datestr3
  
  metadata <- list()
  metadata$deltaTime     <- 1/12
  metadata$time$datestr  <- ohio$datestr3
  metadata$time$strfmt   <- "Y,,M/D"
  o <- beast123(ohio$ndvi, metadata)

## End(Not run)


#------------------ Example 5: multiple time series (matrix input) -------------------#
# simdata is a 2D matrix of dimension 300 x 3, representing 3 time series of length 300.

## Not run: 
  data(simdata)  # dim of simdata: 300 x 3 (time x num_of_time_series) 
  dim(simdata)   # the first dimension refers to time (i.e., 300)
  
  metadata <- list()
  metadata$whichDimIsTime <- 1  # first dimension is time
  metadata$period         <- 24 # assume startTime = 1, deltaTime = 1 by default
  
  extra <- list()
  extra$whichOutputDimIsTime <- 2  # Which dim of the output arrays refers to  time?
  
  o <- beast123(simdata, metadata, extra = extra)
  
  # Lists of input arguments can also be provided inline:
  o <- beast123(simdata,
                metadata = list(whichDimIsTime = 1, period = 24),
                extra    = list(whichOutputDimIsTime = 2))
  
  # Field names can be shortened, provided no ambiguity arises:
  o <- beast123(simdata,
                metadata = list(whichDim = 1, per = 24),
                extra    = list(whichOut = 2))
  
  #------------------ Example 5b: transposed simdata ---------------------------------#
  simdata1 <- t(simdata)   # dim of simdata1: 3 x 300 (num of time series x time)
  metadata <- list()
  metadata$whichDimIsTime <- 2  # time is second dimension
  metadata$period         <- 24 # assume startTime = 1, deltaTime = 1 by default
  o <- beast123(simdata1, metadata)
  
  o <- beast123(simdata1, metadata = list(whichDim = 2, per = 24))

## End(Not run)

#------------------ Example 6: 3D stacked image time series -------------------------#
# imagestack$ndvi is a 3D array of size 12 x 9 x 1066 (row x col x time). Each pixel
# corresponds to a time series of length 1066.

## Not run: 
  data(imagestack)
  dim(imagestack$ndvi)   # 12 x 9 x 1066
  imagestack$datestr     # A character vector of 1066 date strings
  
  metadata <- list()
  metadata$whichDimIsTime <- 3  # Which dim of the input refers to time for 3D inputs?
                                # 1066 is the time series length, so the dim is '3' here.
                                # In this example, this arg is not strictly needed because
                                # time$datestr can also be matched against the dimensions
                                # of imagestack$ndvi to pick up the right time dimension.
  metadata$time$datestr   <- imagestack$datestr
  metadata$time$strfmt    <- "LT05_018032_20080311.yyyy-mm-dd"
  metadata$deltaTime      <- 1/12   # aggregate to monthly (i.e., 1/12 yr)
  metadata$period         <- 1.0    # 1-year period
  
  extra <- list()
  extra$dumpInputData    <- TRUE   # Dump a copy of aggregated input ts 
  extra$numThreadsPerCPU <- 2      # Each cpu core will be assigned 2 threads
  extra$numParThreads    <- 0      # If 0, total_num_threads=numThreadsPerCPU*num_of_cpu_core
                                   # if >0, used to specify the total number of threads
  
  # Default values for missing parameters 
  o <- beast123(imagestack$ndvi, metadata = metadata, extra = extra)
  
  print(o, c(5, 3))     # result for pixel at row 5, col 3
  plot(o,  c(5, 3))     # plot result for pixel at row 5, col 3
  image(o$trend$ncp)    # map number of trend changepoints over space

## End(Not run)

#------------------ Example 7: GeoTIFF stacks via raster package --------------------#
# Handle 3D stacked images from a series of NDVI GeoTIFF files (e.g., ndvi.zip with
# 437 NDVI TIFF files of dimension 12 x 9). Example code is available at:
#   https://github.com/zhaokg/Rbeast/blob/master/R/beast123_raster_example.txt
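
# Below is a hedged sketch of one possible workflow (not the code at the link above). It assumes
# a local folder of GeoTIFFs whose file names embed the acquisition dates; the folder path,
# file-name pattern, and date format are hypothetical and must be adapted to your data.

## Not run: 

  library(raster)                                  # or use the newer 'terra' package similarly
  files  <- sort(list.files("ndvi_dir", pattern = "\\.tif$", full.names = TRUE))
  s      <- stack(files)                           # stack the GeoTIFF layers
  ndvi3d <- as.array(s)                            # 3D array: row x col x time

  metadata <- list()
  metadata$whichDimIsTime <- 3
  metadata$time$datestr   <- basename(files)       # e.g., "LT05_018032_20080311.tif" (hypothetical)
  metadata$time$strfmt    <- "LT05_018032_yyyymmdd.tif"
  metadata$deltaTime      <- 1/12                  # aggregate to a monthly series
  metadata$period         <- 1.0                   # 1-year seasonal cycle

  o <- beast123(ndvi3d, metadata)

## End(Not run)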

