# beast123: Bayesian time series decomposition for changepoint, trend,... In Rbeast: Bayesian Change-Point Detection and Time Series Decomposition

 beast123 R Documentation

## Bayesian time series decomposition for changepoint, trend, and periodicity or seasonality

### Description

A Bayesian model averaging algorithm called BEAST to decompose time series or 1D sequential data into individual components, such as abrupt changes, trends, and periodic/seasonal variations. BEAST is useful for changepoint detection (e.g., breakpoints or structural breaks), nonlinear trend analysis, time series decomposition, and time series segmentation.

### Usage

``````    beast123( Y,
prior   =list(),
mcmc    =list(),
extra   =list(),
season  =c('harmonic','dummy','none'),
...)
``````

### Arguments

 `Y` a 1D vector, 2D matrix, or 3D array of numeric data. Missing values are allowed and can be indicated by `NA`, `NaN`, or a value customized in the 2nd argument `metadata` (e.g., `metadata\$missingValue=-9999`). If `Y` is a vector of size `Nx1` or `1xN`, it is treated as a single time series of length `N`. If `Y` is a 2D matrix or 3D array of dimension `N1xN2` or `N1xN2xN3` (e.g., stacked images of geospatial data), it includes multiple time series of equal length: Which dimension is time has to be specified in the 2nd argument using `metadata\$whichDimIsTime`. For example, `metadata\$whichDimIsTime` = 1 for a 190x35 2D input indicates 35 time series of length 190 each; `metadata\$whichDimIsTime` = 2 for a 100x200x300 3D input indicates 30000=100*300 time series of length 200 each. `Y` can be either regular (i.e., evenly-spaced in time) or irregular/unordered in time. If regular, individual times are determined from the time of the 1st data point `startTime` and the time span between consecutive points `deltaTime`, which are specified in the 2nd arg through `metadata\$startTime and metadata\$deltaTime`; if not given, `startTime` and `deltaTime` take a default 1.0. If irregular or regular but unordered, the times have to be explicitly given through `metadata\$time`. The BEAST model is currently formulated for regular data only, so internally, the `beast123` function will aggregate/re-bin irregular data into regular ones; for the aggregation, the `metadata\$deltaTime` parameter should also be also provided to specify the desired bin size or time interval. `Y` can have a periodic component or have a trend component only. Use the argument `season` to specify the cases. If `season='none'`, `Y` is treated as trend-only; no periodic components are present in the time series. If `season='harmonic'`, `Y` has a periodic/seasonal component. The term 'season' is a misnomer being used here to broad refer to any periodic variations present in `Y`. The periodicity is not a statistical parameter estimated by BEAST but a known constant given by the user through `metadata\$freq`. The periodic component is modeled as a harmonic curve–a combination of sins and cosines. If `season='dummy'`, the same as 'harmonic' except that the periodic/seasonal component is modeled as a non-parametric curve. `metadata` (optional). If present, `metadata` may (1) a scalar value to specify the period of the input Y, (2) a vector of numbers, strings, or R Dates to specify the times of Y, or (3) more often, a LIST object specifying various parameters to describe the 1st argument `Y`. If missing, default values will be used. But `metadata` should be explicitly provided if the input `Y` is a 2D matrix or 3D array to avoid mis-interpreting the input Y. `metadata` is not part of BEAST's Bayesian formulation but just some additional info to interpret `Y`. If `metadata` is provided as a LIST, below are possible fields; not all of them are always needed, depending on the types of inputs (e.g., 1D, 2D or 3D; regular or irregular). `metadata\$whichDimIsTime`: integer (<=3). Needed to specify which dimension of `Y` is time for a matrix or 3D array input. Ignored if the input `Y` is a vector. `metadata\$isRegularOrdered`: logical. Obsolete and no longer used in this version. Now, `metadata\$time` is analyzed to determine whether the input is irregular or not; if `metadata\$time` is missing, `Y` is assumsed to be regular. `metadata\$time`: a vector of the same length as `Y`'s time dimension to provide the times for datapoints. It can be a vector of numbers, Dates, or date strings; it can also be a list of vectors of year, months, and days. Possible formats include: a vector of numerical values [e.g., c(1984.23, 1984.27, 1984.36, ...)]. The unit of the times is irrelevant to BEAST as long as it is consistent with the unit used for specifying `startTime`, `deltaTime`, and `period`. a vector of R Dates [e.g., `as.Date`( c("1984-03-27", "1984-04-10", "1984-05-12",... )]. a vector of char strings. Examples are: c("1984-03-27", "1984-04-10", "1984-05-12") c("1984/03/27", "1984,04,10", "1984 05 12") (i.e., the delimiters differ as long as the YMD order is consistent) c("LT4-1984-03-27", "LT4-1984-04-10", "LT4-1984+05,12") c("LT4-1984087ndvi", "LT4-1984101ndvi", "LT4-1984133ndvi") c("1984,,abc 3/ 27", "1984,,ddxfdd 4/ 10" "ggd1984,, 5/ ttt 12") BEAST uses several heuristics to automatically parse the date strings without a format specifier but may fail due to ambiguity (e.g., in "LC8-2020-09-20-1984", no way to tell if 2020 or 1984 is the year). To ensure correctness, use a list object as explained below to provide a date format specifier. a list object `time=list(datestr=..., strfmat='...')` consisting of a vector of date strings (`time\$datestr`) and a format specifier (`time\$strFmt`). The string `time\$strFmt` specifies how to parse `dateStr`. Three formats are currently supported: (a). All the date strings have a fixed pattern in terms of the relative positions of Year, Month, and Day. For example, to extract 2001/12/02 etc from `time\$dateStr` = c('P23R34-2001.1202333xd', 'O93X94-2002.1108133fd', 'TP3R34-2009.0122333td') use `time\$strFmt`=`'P23R34-yyyy.mmdd333xd'` where `yyyy`, `mm`, and `dd` are the specifiers and other positions are wildcards and can be filled with any other letters different from `yyyy, mm and dd`. (b). All the date strings have a fixed pattern in terms of the relative positions of year and doy. For example, to extract 2001/045(day of year) from 'P23R342001888045', use strFmt='123123yyyy888doy' where `yyyy` and `doy` are the specifiers and other positions are wildcards and can be filled with any other letters different from yyyy, and doy. 'doy' must be three digit in length. (c). All the date strings have a fixed pattern in terms of the separation characters between year, month, and day. For example, to extract 2002/12/02 from '2002,12/02', ' 2002 , 12/2', '2002,12 /02 ', use strFmt='Y,M/D' where the whitespaces are ignored. To get 2002/12/02 from '2–12, 2012 ', use strmFmt='D–M,Y'. a list object of vectors to specify individual dates of the time series. Use `time\$year`,`time\$month`,and `time\$day` to give the dates; or alternatively use `time\$year` and `time\$doy` where each value of the `doy` vector is a number within 1 and 365/366. Each vector must have the same length as the time dimension of `Y`. `metadata\$startTime`: numeric (default to 1.0 if missing). It gives the time of the 1st data point. It can be specified as a scalar (e.g., 2021.23) or a vector of three values in the order of year, month, and day (e.g., `metadata\$startTime` = `c(2021,1,24)`). `metadata\$startTime` is needed for regular input data but optional for irregular data: If missing, startTime will be computed from `metadata\$time` for irregular `Y`. `metadata\$deltaTime`: numeric or string. It specifies the time interval between consecutive data points. It is optional for regular data (default to 1.0 if not supplied), but should be specified for irregular data because deltaTime is needed to aggregate/resample the irregular time series into regular ones. The unit of `deltaTime` needs to be consistent with `metadata\$time`. If `metadata\$time` takes a numeric vector, the unit of `deltaTime` is arbitrary and irrelevant to BEAST. If `time` takes a vector of Dates or date strings, the unit for `deltaTime` is assumed to Fractional YEAR. If needed, use a string instead of a number to specify whether the unit of deltaTime is day, month, or year. Examples include '7 days', '7d', '1/2 months', '1mn', '1.0 year', and '1y'. `metadata\$period`: numeric or string. Specify the period for the periodic/seasonal component in `Y`. Needed only for data with a periodic/cyclic component (i.e., season=`'harmonic'` or `'dummy'`) and not used for trend-only data (i.e., `season='none'`). The period of the cyclic component should have a unit consisent with the unit of `deltaTime`. It holds that `period=deltaTime*freq` where `freq` is the number of data samples per period. (Note that the `freq` argument in earlier versions becomes obsolete and now is replaced by `period`.) `period` or the number of data points per period is not a BEAST model parameter and it has to be specified by the user. But if `period` is missing, BEAST first attempts to guess its value via auto-correlation before fitting the model. If `period` <= 0, no seasonal/cyclic component is assumed (i.e, `season='none'`) and the trend-only model is used. If needed, use a string to specify whether the unit of period is day, month, or year. Examples are '1.0 year', '12 months', '365d', '366 days'. `metadata\$missingValue`: numeric; a customized value to indicate bad/missing values in the time series, in addition to those NA or NaN values. `metadata\$maxMissingRate` a fractional number within [0, 1] as the maximum percentage of missing values, above which the time series will be skipped and won't be fitted by BEAST. `prior` (optional). a list object consisting of the hyperprior parameters in the Bayesian formulation of the BEAST model. Because they are part of the model, the fitting result may be sensitive to the choices of these hyperparameters. If `prior` is missing, a set of default values will be used and the exact values used will be printed to the console at the start of the BEAST run. Below are possible parameters: `prior\$seasonMinOrder`: integer (>=1) `prior\$seasonMaxOrder`: integer (>=1); the min and max harmonic orders considered to fit the seasonal component. `seasonMinOrder` and `seasonMaxOrder` are only used if the time series has a seasonal component (i.e., `season='harmonic'`) and ignored for trend-only data or when `season='dummy'`. If `seasonMinOrder`=`seasonMaxOrder`, BEAST assumes a constant harmonic order used and won't infer the posterior probability of harmonic orders. `prior\$seasonMinKnotNum`: integer (>=0) `prior\$seasonMaxKnotNum`: integer (>=0); the min and max number of seasonal changepoints allowed in segmenting and fitting the seasonal component. `seasonMinKnotNum` and `seasonMaxKnotNum` are only used if the time series has a seasonal component (i.e., `season='harmonic'` or `season='dummy'`) and ignored for trend-only data. If `seasonMinOrder`=`seasonMaxOrder`, BEAST assumes a constant number of changepoints and won't infer the posterior probability of the number of changepoints, but it will still estimate the occurrence probability of the changepoints over time (i.e., the most likely times at which these changepoints occur). If `seasonMinOrder`=`seasonMaxOrder`=0, no changepoints are allowed in the seasonal component; then a global harmonic model is used to fit the seasonal component. `prior\$seasonMinSepDist`: integer (>0). the min separation time between two neighboring season changepoints. That is, when fitting a piecewise harmonic seasonal model, no two changepoints are allowed to occur within a time window of seasonMinSepDist. `seasonMinSepDist` must be an unitless integer–the number of time intervals/data points so that the time window in the original unit is seasonMinSepDist*metadata\$deltaTime. `prior\$trendMinOrder`: integer (>=0) `prior\$trendMaxOrder`: integer (>=0); the min and max orders of the polynomials considered to fit the trend component. The zero-th order corresponds to a constant term/ a flat line and the 1st order is a line. If `trendMinOrder`=`trendMaxOrder`, BEAST assumes a constant polynomial order used and won't infer the posterior probability of polynomial orders. `prior\$trendMinKnotNum`: `prior\$trendMaxKnotNum`: integer (>=0); the min and max number of trend changepoints allowed in segmenting and fitting the trend component. If `trendMinOrder`=`trendMaxOrder`, BEAST assumes a constant number of changepoints in the fitted trend and won't infer the posterior probability of the number of trend changepoints, but it will still estimate the occurrence probability of the changepoints over time (i.e., the most likely times at which these changepoints occur). If `trendMinOrder`=`trendMaxOrder`=0, no changepoints are allowed in the trend component; then a global polynomial model is used to fit the trend. `prior\$trendMinSepDist`: integer (>0). the min separation time between two neighboring trend changepoints. `prior\$precValue`: numeric (>0); the default value is 10. `prior\$precPriorType`: characters. It takes one of 'constant', 'uniform' (the default), 'componentwise', and 'orderwise'. Below are the differences between them. `precPriorType='constant'`: the precision parameter used to parameterize the model coefficients is fixed to a constant specified by `prior\$precValue`. In other words, `prior\$precValue` is a user-defined hyperparameter and the fitting result may be sensitive to the chosen values of `prior\$precValue`. `precPriorType='uniform'`: the precision parameter used to parameterize the model coefficients is a random variable; its initial value is specified by `prior\$precValue`. In other words, `precValue` will be inferred by the MCMC, so the fitting result is insensitive to the choice in `prior\$precValue`. `precPriorType='componentwise'`: multiple precision parameters are used to parameterize the model coefficients for individual components (e.g., one for season and another for trend); their initial values is specified by `prior\$precValue`. In other words, `precValue` will be inferred by the MCMC, so the fitting result is insensitive to the choice in `prior\$precValue`. `precPriorType='orderwise'`: multiple precision parameters are used to parameterize the model coefficients not just for individual components but also for individual orders of each component; their initial values is specified by `prior\$precValue`. In other words, `precValue` will be inferred by the MCMC, so the fitting result is insensitive to the choice in `prior\$precValue`. `mcmc` (optional). a list object consisting of parameters to configure the MCMC inference. These parameter are not part of the Bayesian formulation of the BEAST model but are the settings for the reversible-jump MCMC to generate MCMC chains. Due to the MCMC nature, the longer the simulation chain is, the better the fitting result. Below are possible parameters: `mcmc\$seed`: integer (>=0); the seed for the random number generator. If `mcmc\$seed=0`, an arbitrary seed will be picked up and the fitting result will var across runs. If fixed to the same on-zero integer, the results can be re-produced for different runs. Note that the results may still vary if run on different computers with the same seed because the random generator library depends on CPU's instruction sets. `mcmc\$samples`: integer (>0); the number of samples collected per MCMC chain. `mcmc\$chainNumber`: integer (>0); the number of parallel MCMC chains. `mcmc\$thinningFactor`: integer (>0); a factor to thin chains (e.g., if thinningFactor=5, samples will be taken every 3 iterations). `mcmc\$burnin`: integer (>0); the number of burn-in samples discarded at the start of each chain. `mcmc\$maxMoveStepSize`: integer (>0). The RJMCMC sampler employs a move proposal when traversing the model space or proposing new positions of changepoints. 'maxMoveStepSize' is used in the move proposal to specify the max window allowed in jumping from the current changepoint. `mcmc\$seasonResamplingOrderProb`: a fractional number less than 1.0; the probability of selecting a re-sampling proposal (e.g., resample seasonal harmonic order). `mcmc\$trendResamplingOrderProb`: a fractional number less than 1.0; the probability of selecting a re-sampling proposal (e.g., resample trend polynomial order) `mcmc\$credIntervalAlphaLevel`: a fractional number less than 1.0 (default to 0.95); the level of confidence used to compute credible intervals. `extra` (optional). a list object consisting of flags to control the outputs from the BEAST runs or configure other program setting. Below are possible parameters: `extra\$quiet`: logical (default to FALSE). If TRUE, no warning messages will be printed out. `extra\$dumpInputData`: logical (default to FALSE). If TRUE, the input time series will be copied into the output. When the input `Y` is irregular (i.e., `metadata\$isRegularOrdered=FALSE`), the dumped copies will be the aggregated regular time series. `extra\$whichOutputDimIsTime`: integer (<=3). If the input `Y` is a 2D or 3D array (i.e., multiple time series such as stacked images), the `whichOutputDimIsTime` specifies which dimension is the time in the output variables. `whichOutputDimIsTime` defaults to 3 for 3D inputs and is ignored if the input is a vector (i.e., a single time series). `extra\$ncpStatMethod`: character (deprecated). A string to specify which statistic is used to determine the Number of ChangePoint (ncp) when computing the most likely changepoint locations (e.g., out\$trend\$cp, and out\$season\$cp). Three values are possible: 'mode', 'mean', and 'median'; the default is 'mode'. Individual models sampled by BEAST has a varying dimension (e.g., number of changepoints or knots). For example, if mcmc\$samples=10, the numbers of changepoints for the 10 sampled models are assumed to be c(0, 2, 4, 1, 1, 2, 7, 6, 6, 1). The mean ncp is 3.1 (rounded to 3), the median is 2.5 (2), and the mode is 1. This argument is deprecated; now all the possible changepoints are outputted, together with several versions of ncp, including `ncp`, `ncp_median`, `ncp_mode`, and `ncp_pct90`. A similar parameter `ncpStat` is added to the `plot.beast` function to specify which ncp is used when plotting. `extra\$computeCredible`: logical (default to TRUE). Credible intervals will be computed and outputted only if set to TRUE. `extra\$fastCIComputation`: logical (default to TRUE). If TRUE, a fast method is used to compute credible intervals (CI). Computation of CI is one of the most computational parts and fastCIComputation should be set to TRUE unless more accurate CI estimation is desired. `extra\$computeSeasonOrder`: logical (default to TRUE). If TRUE, a posterior estimate of the seasonal harmonic order will be outputted; this flag is only valid if the time series has a seasonal component (i.e., season='harmonic' and prior\$seasonMinOrder is not equal to prior\$seasonMaxOrder). `extra\$computeTrendOrder`: logical (default to TRUE). If TRUE, a posterior estimate of the tend polynomial order will be outputted; this flag is only valid when prior\$trendMinOrder is not equal to prior\$trendMaxOrder). `extra\$computeTrendOrder`: logical (default to TRUE). If TRUE, a posterior estimate of the tend polynomial order will be outputted; this flag is only valid when prior\$trendMinOrder is not equal to prior\$trendMaxOrder). `extra\$computeSeasonChngpt`: logical (default to TRUE). If TRUE, compute the most likely times/positions where changepoints occur in the seasonal component. This flag is not valid if there is a seasonal component in the time series (i.e., season='harmonic' or season='dummy' and prior\$seasonMaxKnotNum is non-zero). `extra\$computeTrendChngpt`: logical (default to TRUE). If TRUE, compute the most likely times/positions where changepoints occur in the trend component. `extra\$computeSeasonAmp`: logical (default to FALSE). If TRUE, compute and output the time-varying amplitude of the seasonality. `extra\$computeTrendSlope`: logical (default to FALSE). If TRUE, compute and output the time-varying slope of the estimated trend. `extra\$tallyPosNegSeasonJump`: logical (default to FALSE). If TRUE, compute and differentiate seasonal changepoints in terms of the direction of the jumps in the estimated seasonal signal. Those changepoints with a positive jump will be outputted separately from those with a negative jump. A series of output variables (some for positive-jump changepoints, and others for negative-jump changepoints will be dumped). `extra\$tallyPosNegTrendJump`: logical (default to FALSE). If TRUE, compute and differentiate trend changepoints in terms of the direction of the jumps in the estimated trend. Those changepoints with a positive jump will be outputted separately from those with a negative jump. A series of output variables (some for positive-jump changepoints, and others for negative-jump changepoints will be dumped). `extra\$tallyIncDecTrendJump`: logical (default to FALSE). If TRUE, compute and differentiate trend changepoints in terms of the direction of the jumps in the estimated slope of the trend signal. Those changepoints with a increase in the slope will be outputted separately from those with a decrease in the slope. A series of output variables (some for increase-jump changepoints, and others for decrease-jump changepoints will be dumped). `extra\$printProgressBar`: logical (default to FALSE). If TRUE, a progress bar will be displayed to show the status of the running. When running on multiple time series (e.g. stacked image time series), the progress bar will also report an estimate of the remaining time for completion. `extra\$consoleWidth`: integer (default to 0); the length of chars in each status line when setting printProgressBar=TRUE. If 0, the current width of the console will be used. `extra\$printOptions`: logical (default to FALSE). If TRUE, the values used in the arguments `metadata`, `prior`, `mcmc`, and `extra` will be printed to the console at the start of the run. `extra\$numThreadsPerCPU`: integer (default to 2); the number of threads to be scheduled for each CPU core. `extra\$numParThreads`: integer (default to 0). When handling many time series, BEAST can use multiple concurrent threads. `extra\$numParThreads` specifies how many concurrent threads will be used in total. If numParThreads=0, the actual number of threads will be `numThreadsPerCPU * cpuCoreNumber`; that is, each CPU core will generate a number 'numThreadsPerCPU' of threads. On Windows 64, ,BEAST is group-aware and will affine or distribute the threads to all the NUMA node. But currently, up to 256 CPU cores are supported. `season` characters (default to 'harmonic'); specify if `y` has a periodic component or not. Three strings are possible. `'none'`: `y` is trend-only; no periodic components are present in the time series. The args for the seasonal component (i.e.,`sorder.minmax`, `scp.minmax` and `sseg.max`) will be irrelevant and ignored. `'harmonic'`: `y` has a periodic/seasonal component. The term `season` is a misnomer, being used here to broadly refer to any periodic variations present in `y`. The periodicity is NOT a model parameter estimated by BEAST but a known constant given by the user through `freq`. By default, the periodic component is modeled as a harmonic curve–a combination of sins and cosines. If `'dummy'`, the same as `'harmonic'` except that the periodic/seasonal component is modeled as a non-parametric curve. The harmonic order arg `sorder.minmax` is irrelevant and is ignored. `...` additional parameters, not used currently but reserved for future extension

### Value

The output is an object of class "beast". It is a list, consisting of the following variables. Exact sizes of the variables depend on the types of the input `Y` as well as the specified output time dimension `extra\$whichOutputDimIsTime`. In the explanations below, we assume the input `Y` is a single time series of length `N`; the dimensions for 2D or 2D inputs may be interpreted accordingly:

 `time` a vector of size `1xN`: the times at the `N` sampled locations. By default, it is simply set to `1:N` `data` a vector, matrix, or 3D array; this is a copy of the input `Y` if `extra\$dumpInputData` = `TRUE`. If extra\$dumpInputData=FALSE, it is set to NULL. If the original input `Y` is irregular, the copy here is the regular version aggregated from the original at the time interval specified by metadata\$deltaTime. `marg_lik` numeric; the average of the model marginal likelihood; the larger marg_lik, the better the fitting for a given time series. `R2` numeric; the R-square of the model fitting. `RMSE` numeric; the RMSE of the model fitting. `sig2` numeric; the estimated variance of the model error. `trend` a list object numeric consisting of various outputs related to the estimated trend component: `ncp`: [Number of ChangePoints]. a numeric scalar; the mean number of trend changepoints. Individual models sampled by BEAST has a varying dimension (e.g., number of changepoints or knots), so several alternative statistics (e.g., `ncp_mode`, `ncp_median`, and `ncp_pct90`) are also given to summarize the number of changepoints. For example, if `mcmc\$samples=10`, the numbers of changepoints for the 10 sampled models are assumed to be c(0, 2, 4, 1, 1, 2, 7, 6, 6, 1). The mean ncp is 3.1 (rounded to 3), the median is 2.5 (2), the mode is 1, and the 90th percentile (ncp_pct90) is 6.5. `ncp_mode`: [Number of ChangePoints]. a numeric scalar; the mode for number of changepoints. See the above for explanations. `ncp_median`: [Number of ChangePoints]. a numeric scalar; the median for number of changepoints. See the above for explanations. `ncp_pct90`: [Number of ChangePoints]. a numeric scalar; the 90th percentile for number of changepoints. See the above for explanations. `ncpPr`: [Probability of the Number of ChangePoints]. A vector of length `(prior\$trendMaxKnotNum+1)`. It gives a probability distribution of having a certain number of trend changepoints over the range of [0,prior\$trendMaxKnotNum]; for example, `ncpPr` is the probability of having no trend changepoint; `ncpPr[i]` is the probability of having (i-1) changepoints: Note that it is `ncpPr[i]` not `ncpPr[i-1]` because ncpPr is used for having zero changepoint. `cpOccPr`: [ChangePoint OCCurence PRobability]. a vector of length N; it gives a probability distribution of having a changepoint in the trend at each point of time. Plotting `cpOccPr` will depict a continious curve of probability-of-being-changepoint. Of particular note, in the curve, a higher peak indicates a higher chance of being a changepoint only at that particular SINGLE point in time and does not necessarily mean a higher chance of observing a changepoint AROUND that time. For example, a window of cpOccPr values `c(0,0,0.5,0,0)` (i.e., the peak prob is 0.5 and the summed prob is 0.5) is less likely to be a changepoint compared to another window `c(0.1,0.2,0.21,0.2,0.1)` (i.e., the peak prob is 0.21 but the summed prob is 0.71). `order`: a vector of length N; the average polynomial order needed to approximate the fitted trend. As an average over many sampled individual piece-wise polynomial trends, `order` is not necessarily an integer. `cp`: [Changepoints] a vector of length `tcp.max=tcp.minmax`; the most possible changepoint locations in the trend component. The locations are obtained by first applying a sum-filtering to the `cpOccPr` curve with a filter window size of tseg.min and then picking up to a total `prior\$MaxKnotNum`/`tcp.max` of the highest peaks in the filtered curve. NaNs are possible if no enough changepoints are identified. `cp` records all the possible changepoints identified and many of them are bound to be false positives. Do not blindly treat all of them as actual changepoints. `cpPr`: [Changepoints PRobability] a vector of length `metadata\$trendMaxKnotNum`; the probabilities associated with the changepoints `cp`. Filled with NaNs for the remaining elements if `ncp
 `season` a list object numeric consisting of various outputs related to the estimated seasonal/periodic component: `ncp`: [Number of ChangePoints]. a numeric scalar; the mean number of seasonal changepoints. `ncpPr`: [Probability of the Number of ChangePoints]. A vector of length `(prior\$seasonMaxKnotNum+1)`. It gives a probability distribution of having a certain number of seasonal changepoints over the range of [0,prior\$seasonMaxKnotNum]; for example, `ncpPr` is the probability of having no seasonal changepoint; `ncpPr[i]` is the probability of having (i-1) changepoints: Note that the index is i rather than (i-1) because ncpPr is used for having zero changepoint. `cpOccPr`: [ChangePoint OCCurence PRobability]. a vector of length N; it gives a probability distribution of having a changepoint in the seasonal component at each point of time. Plotting `cpOccPr` will depict a continious curve of probability-of-being-changepoint over the time. Of particular note, in the curve, a higher value at a peak indicates a higher chance of being a changepoint only at that particular SINGLE point in time, and does not necessarily mean a higher chance of observing a changepoint AROUND that time. For example, a window of cpOccPr values `c(0,0,0.5,0,0)` (i.e., the peak prob is 0.5 and the summed prob is 0.5) is less likely to be a changepoint compared to another window values `c(0.1,0.2,0.3,0.2,0.1)` (i.e., the peak prob is 0.3 but the summed prob is 0.8). `order`: a vector of length N; the average harmonic order needed to approximate the seasonal component. As an average over many sampled individual piece-wise harmonic curves, `order` is not necessarily an integer. `cp`: [Changepoints] a vector of length `metadata\$seasonMaxKnotNum`; the most possible changepoint locations in the seasonal component. The locations are obtained by first applying a sum-filtering to the `cpOccPr` curve with a filter window size of prior\$trendMinSeptDist and then picking up to a total `ncp` of the highest peaks in the filtered curve. If `ncp

### References

1. Zhao, K., Wulder, M.A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X. and Brown, M., 2019. Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, p.111181 (the beast algorithm paper).

2. Zhao, K., Valle, D., Popescu, S., Zhang, X. and Mallick, B., 2013. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection. Remote Sensing of Environment, 132, pp.102-119 (the Bayesian MCMC scheme used in beast).

3. Hu, T., Toman, E.M., Chen, G., Shao, G., Zhou, Y., Li, Y., Zhao, K. and Feng, Y., 2021. Mapping fine-scale human disturbances in a working landscape with Landsat time series on Google Earth Engine. ISPRS Journal of Photogrammetry and Remote Sensing, 176, pp.250-261(a beast application paper).

`beast`, `beast.irreg`, `minesweeper`, `tetris`, `geeLandsat`

### Examples

``````
#--------------------------------NOTE-------------------------------------------------#
# beast123() is an all-inclusive function that duplicates the functionalities of beast
# and beast.irreg. It can handle a single, multiple, or 3D of stacked time series, being
# either regular or irregular. It allows for customization through four LIST arguments:
#   prior    -- prior parameters for the beast model
#   mcmc     -- MCMC simulation setting
#   extra    -- misc parameters turning on/off outputs and setting up parallel computations
#
# Despite being essentially the same as beast and beast.irreg, beast123 is provided mainly
# to support concurrent handling of multiple time series (e.g., stacked satellite images)
# via parallel computing: When processing stacked raster layers, DO NOT iterate pixel by pixel
# using beast() or beast.irreg() via an external parallel caller (e.g., doParallel or foreach).

#------------------------------Example 1: one time series with seasonalty-------------#
# Yellowstone is a half-monthly time series of  774 NDVI measurmments at a Yellowstone
# site starting from July 1-15,1981(i.e., start=c(1981,7,7). It has 24 data points per
# year (freq=24).

library(Rbeast)
data(Yellowstone)
plot(Yellowstone)

# Below, the four option args are missing, so defalut values will be used, with some
# warning messages given to altert this. By default, the input Y is assumed to be regular
# with a seasonal component. The default arg values used will be printed out and they can
# serve as a template to customize the parameters.

o = beast123(Yellowstone)
plot(o)

#------------------------------Example 2: a trend-only time series-------------------#
# Nile is an annual river flow time series (i.e., no periodic variation). So, season
# is set to 'none' to indicate trend-only analysis. Default values are used for other
# missing options. Unlike the beast() function, beast123 does NOT use the time attributes
# of a 'ts' object. For example, Nile is treated as a pure data number; its (start=1871,
# end=1970, freq=1) attributes are ignored. The default times 1:length(Nile) are used
# instead. The true time info need to be specified by the 'metadata' parameter, as shown
# in the next example.

o = beast123(Nile,season='none')
plot(o)

#------------------------------Example 3: call via the full API interface-----------#
# Specify metadata, prior, mcmc, and extra explicitly. Only 'prior' is the true statistical
# model parameters of BEAST; the other three are just options to configure the input/ouput
# or the computation process.

## Not run:

# metadata is NOT part of BEAST itself, but some extra info to describe the input
# time series Y. Below, the input Y is the 'Yellowstone' ts.

#metadata\$isRegularOrdered = TRUE        # This arg not used any longer in this version
metadata\$whichDimIsTime   = 1            # Which dim of the input refer to time for
# 2D/3D inputs? Ignored for a single time
# series input.
metadata\$startTime        = c(1981,7,7)  # Or startTime=1981.5137
#    startTime=as.Date('1981-7-7')
metadata\$deltaTime        = 1/24         # Half-monthly regular ts: 0.5/12=1/24
metadata\$period           = 1.0          # The period is 1 year:
# freq x deltaTime = period
# 24    x  1/24    = 1.0
metadata\$omissionValue    = NaN          # By default, NaNs are ignored
metadata\$maxMissingRateAllowed = 0.7500  # If missingness is higher than .75, the ts
#  is  skipped and not fitted
metadata\$deseasonalize    = FALSE        # Do not remove the global seasonal pattern
#  before fitting the beast model
metadata\$detrend          = FALSE        # Do not remove the global trend before
# the fitting

# prior is the ONLY true parameters of the beast model,used to specify the priors
# in the Bayesian formulation
prior = list()
prior\$seasonMinOrder   = 1         #min harmonic order allowed to fit seasonal cmpnt
prior\$seasonMaxOrder   = 5         #max harmonic order allowed to fit seasonal cmpnt
prior\$seasonMinKnotNum = 0         #min number of changepnts in seasonal cmpnt
prior\$seasonMaxKnotNum = 3         #max number of changepnts in seasonal cmpnt
prior\$seasonMinSepDist = 10        #min inter-chngpts separation for seasonal cmpnt
prior\$trendMinOrder	= 0         #min polynomial order allowed to fit trend cmpnt
prior\$trendMaxOrder	= 1         #max polynomial order allowed to fit trend cmpnt
prior\$trendMinKnotNum  = 0         #min number of changepnts in trend cmpnt
prior\$trendMaxKnotNum  = 15        #max number of changepnts in trend cmpnt
prior\$trendMinSepDist  = 5         #min inter-chngpts separation for trend cmpnt
prior\$precValue        = 10.0      #Initial value of the precision parameter (no
# need to change it unless for precPrioType='const')
prior\$precPriorType    = 'uniform' # Possible values: const, uniform, and componentwise

# mcmc is NOT part of the beast model itself, but some parameters to configure the
# MCMC inference.
mcmc = list()
mcmc\$seed                      = 9543434# an arbitray seed for random number generator
mcmc\$samples                   = 3000   # samples collected per chain
mcmc\$thinningFactor            = 3      # take every 3rd sample and discard others
mcmc\$burnin                    = 150    # discard the initial 150 samples per chain
mcmc\$chainNumber               = 3      # number of chains
mcmc\$maxMoveStepSize           = 4      # max random jump step when proposing new chngpts
mcmc\$trendResamplingOrderProb  = 0.100  # prob of choosing to resample polynomial order
mcmc\$seasonResamplingOrderProb = 0.100  # prob of choosing to resample harmonic order
mcmc\$credIntervalAlphaLevel    = 0.950  # the significance level for credible interval

# extra is NOT part of the beast model itself, but some parameters to configure the
# output and computation process
extra = list()
extra\$dumpInputData        = FALSE #If true, a copy of input time series is outputted
extra\$whichOutputDimIsTime = 1     #For 2D or 3D inputs, which dim of the output refers to
# time? Ignored if the input is a single time series
extra\$computeCredible      = FALSE #If true, compute CI: computing CI is time-intensive.
extra\$fastCIComputation    = TRUE  #If true, a faster way is used to get CI, but it is
# still time-intensive. That is why the function beast()
# is slow because it always compute CI.
extra\$computeSeasonOrder   = FALSE #If true, dump the estimated harmonic order over time
extra\$computeTrendOrder    = FALSE #If true, dump the estimated polynomial order over time
extra\$computeSeasonChngpt  = TRUE  #If true, get the most likely locations of s chgnpts
extra\$computeTrendChngpt   = TRUE  #If true, get the most likely locations of t chgnpts
extra\$computeSeasonAmp     = FALSE #If true, get time-varying amplitude of seasonality
extra\$computeTrendSlope    = FALSE #If true, get time-varying slope of trend
extra\$tallyPosNegSeasonJump= FALSE #If true, get those changpts with +/- jumps in season
extra\$tallyPosNegTrendJump = FALSE #If true, get those changpts with +/- jumps in trend
extra\$tallyIncDecTrendJump = FALSE #If true, get those changpts with increasing/
# decreasing trend slopes
extra\$printProgressBar     = TRUE
extra\$printOptions         = TRUE
extra\$quiet                = FALSE # print warning messages, if any
extra\$consoleWidth         = 0     # If 0, the console width is from the current console
# Y has multiple time series (e.g.,stacked images)

plot(o)

## End(Not run)

#------------------------------Example 4: Handle irregular time series-----------------#
#  Handle irregular time series: ohio is a data frame of a Landsat NDVI series observed
#  at unevely-spaced times

## Not run:

data(ohio)
str(ohio)

metadata\$time             = ohio\$time # Must supply individual times for irregular inputs
metadata\$deltaTime        = 1/12      # Must supply the desired time interval for aggregation

o=beast123(ohio\$ndvi, metadata)       # Default values used for those missing parameters

#####################################################################################
class(ohio\$rdate)                      # Another accepted time format for beast123

metadata\$deltaTime        = 1/12       # Must supply the desired time interval for aggregation
metadata\$time             = ohio\$rdate # Must supply individual times for irregular inputs

o=beast123(ohio\$ndvi, metadata)        # Default values used for those missing parameters

#####################################################################################
ohio\$Y                              # Another accepted time format for beast123
ohio\$M
ohio\$M

metadata\$deltaTime        = 1/12    # Must supply the desired time interval for aggregation
o=beast123(ohio\$ndvi, metadata)     # Default values used for those missing parameters

#####################################################################################
ohio\$Y                              # Another accepted time format for beast123
ohio\$doy

metadata\$deltaTime        = 1/12    # Must supply the desired time interval for aggregation
o=beast123(ohio\$ndvi, metadata)     # Default values used for those missing parameters

#####################################################################################
ohio\$time                           # Another accepted time format for beast123

metadata\$deltaTime        = 1/12      # Must supply the desired time interval for aggregation
metadata\$time             = ohio\$time # Fractional year

o=beast123(ohio\$ndvi, metadata)       # Default values used for those missing parameters

#####################################################################################
ohio\$datestr1                       # Another accepted time format for beast123

metadata\$deltaTime        = 1/12         # Must supply the time interval for aggregation

o=beast123(ohio\$ndvi, metadata)          # Default values used for those missing parameters

#####################################################################################
ohio\$datestr2                            # Another accepted time format for beast123
metadata\$deltaTime        = 1/12         # Must supply a desired time interval for aggregation

o=beast123(ohio\$ndvi, metadata)          # Default values used for those missing parameters

#####################################################################################
ohio\$datestr3                            # Another accepted time format for beast123
metadata\$deltaTime        = 1/12         # Must supply the desired time interval for aggregation

o=beast123(ohio\$ndvi, metadata)          # Default values used for those missing parameters

## End(Not run)

#------------------Example 4: Handle multiple time series (i.e., matrix input)-----------#
# Handle multiple time series: 'simdata' is a 2D matrix of dim 300x3; it consits of 3
# time series of length 300 each. For this toy example, I decide to be lazy and use the same
# time series for the three columns.
## Not run:
data(simdata)                         # dim of simdata: 300 x 3 (time x num_of_time_series)
dim(simdata)                          # the first dimenion refer to time (i.e, 300)

metadata\$whichDimIsTime   = 1         # Which dim of the input refer to time for 2D inputs?
# 300 is the ts length, so dim is set to '1' here.
metadata\$period           = 24        # By default, we assume startTime=1 and deltaTime=1

extra=list()
extra\$whichOutputDimIsTime = 2            # Which dim of the output arrays refers to  time?
o=beast123(simdata, metadata,extra=extra) # Default values used for those missing parameters

# The lists of arg parameters can also be directly provided inline within the command

# The field names of the lists can be shortened as long as no ambiguitity is caused.

#------------------Example 4: Another run by transposing simdata--------------------------#

simdata1=t(simdata)                   # dim of simdata1: 3 x 300 (num of ts  x time )

metadata\$whichDimIsTime   = 2         # Which dim of the input refer to time for 2D inputs?
# 300 is the ts length, so dim is set to '2' here.
metadata\$period           = 24        # By default, we assume startTime=1 and deltaTime=1
o=beast123(simdata1, metadata)        # Default values used for those missing parameters

## End(Not run)

#------------------Example 5: Handle stacked time series images (e.g., 3d input)--------#
# Handle 3D stacked images of irregular and unordered time-series: imagestack is a 3D
# array of size 12x9x1066, each pixel being a time series of length 1066
## Not run:
data(imagestack)
dim(imagestack\$ndvi)               # Dim: 12 x 9 X 1066 (row x col x time)
imagestack\$datestr                 # A character vector of 1066 date strings

metadata\$whichDimIsTime   = 3      # Which dim of the input refer to time for 3D inputs?
# 1066 is the ts length, so dim is set to '3' here.
# In this example, this arg is not needed because
# the time\$datestr can also help to match and pick up
# the right time dimesion of imagestack\$ndvi.
metadata\$deltaTime        =  1/12  # Aggregate the irregular ts at a monthly interval:1/12 Yr
metadata\$period           =  1.0   # The period is 1 year: deltaTime*freq=1/12*12=1.0

extra = list()
extra\$dumpInputData        = TRUE  # Get a copy of aggregated input ts
extra\$numThreadsPerCPU     = 2     # Each cpu core will be assigned 2 threads
# if >0, used to specify the total number of threads

# Default values for missing parameters

print(o,c(5,3))    # print the result for the pixel at Row 5 and Col 3
plot(o,c(5,3))     # plot the result for the pixel at Row 5 and Col 3
image(o\$trend\$ncp) # number of trend changepoints over space

## End(Not run)

#---------Example 6: Handle stacked GeoTiff image files imported with the raster package------#
# Handle 3D stacked images of irregular time-series : 'ndvi.zip' is  a zip file of
# 437 NDIV tiff image files, each having a dim of 12 x 9.
# Code availlable at https://github.com/zhaokg/Rbeast/blob/master/R/beast123_raster_example.txt

``````

Rbeast documentation built on May 31, 2023, 9:23 p.m.