beast.irreg: Bayesian time series decomposition for changepoint, trend,...

View source: R/beast.irreg.R

beast.irreg   R Documentation

Bayesian time series decomposition for changepoint, trend, and periodicity or seasonality

Description

A Bayesian model averaging algorithm called BEAST to decompose time series or 1D sequential data into individual components, such as abrupt changes, trends, and periodic/seasonal variations. BEAST is useful for changepoint detection (e.g., breakpoints or structural breaks), nonlinear trend analysis, time series decomposition, and time series segmentation.

Usage


  beast.irreg(
       y, 
       time,        
       deltat         = NULL,
       season         = c("harmonic", "svd", "dummy", "none"),  
       period         = NULL,  		   
       scp.minmax     = c(0,10),   sorder.minmax   = c(0,5),  	  
       tcp.minmax     = c(0,10),   torder.minmax   = c(0,1), 	   
       sseg.min       = NULL,      sseg.leftmargin = NULL,  sseg.rightmargin = NULL,
       tseg.min       = NULL,      tseg.leftmargin = NULL,  tseg.rightmargin = NULL, 
       method         = c('bayes', 'bic', 'aic', 'aicc', 'hic',
	                      'bic0.25', 'bic0.5', 'bic1.5', 'bic2' ),
       detrend        = FALSE, 
       deseasonalize  = FALSE,
       mcmc.seed      = 0,      
       mcmc.burnin    = 200, 
       mcmc.chains    = 3,
       mcmc.thin      = 5,
       mcmc.samples   = 8000,
       precValue      = 1.5,
       precPriorType  = c('componentwise', 'uniform', 'constant', 'orderwise'),
       hasOutlier	  = FALSE,
       ocp.minmax     = c(0,10),			 	   
       print.param    = TRUE,
       print.progress = TRUE,
       print.warning  = TRUE,
       quiet          = FALSE,
       dump.ci        = FALSE,
       dump.mcmc      = FALSE,
       gui            = FALSE,
       ... )

Arguments

y

a vector for an irregular or unordered time series. Missing values such as NA and NaN are allowed.

  • If y is regular and evenly spaced in time, use the beast function instead.

  • If y is a matrix or 3D array (e.g., stacked images) consisting of multiple regular or irregular time series, use beast123 instead.

If y is a list of multiple time series, the multivariate version of the BEAST algorithm is invoked to decompose the multiple time series and detect common changepoints altogether. This feature is experimental and under further development. Check ohio for a working example.

time

a vector of the same length as y's time dimension to provide the times for datapoints. It can be a vector of numbers, Dates, or date strings; it can also be a list of vectors of year, months, and days. Possible formats include:

  1. a vector of numerical values [e.g., c(1984.23, 1984.27, 1984.36, ...)]. The unit of the times is irrelevant to BEAST as long as it is consistent with the unit used for specifying deltat and period.

  2. a vector of R Dates [e.g., as.Date( c("1984-03-27", "1984-04-10", "1984-05-12", ...) )].

  3. a vector of char strings. Examples are:

    • c("1984-03-27", "1984-04-10", "1984-05-12")

    • c("1984/03/27", "1984,04,10", "1984 05 12") (i.e., the delimiters differ as long as the YMD order is consistent)

    • c("LT4-1984-03-27", "LT4-1984-04-10", "LT4-1984+05,12")

    • c("LT4-1984087ndvi", "LT4-1984101ndvi", "LT4-1984133ndvi")

    • c("1984,,abc 3/ 27", "1984,,ddxfdd 4/ 10", "ggd1984,, 5/ ttt 12")

    BEAST uses several heuristics to automatically parse the date strings without a format specifier but may fail due to ambiguity (e.g., in "LC8-2020-09-20-1984", no way to tell if 2020 or 1984 is the year). To ensure correctness, use a list object as explained below to provide a date format specifier.

  4. a list object time=list(dateStr=..., strFmt='...') consisting of a vector of date strings (time$dateStr) and a format specifier (time$strFmt). The string time$strFmt specifies how to parse time$dateStr. Three formats are currently supported:

    • (a). All the date strings have a fixed pattern in terms of the relative positions of year, month, and day. For example, to extract 2001/12/02 etc. from time$dateStr = c('P23R34-2001.1202333xd', 'O93X94-2002.1108133fd', 'TP3R34-2009.0122333td'), use time$strFmt='P23R34-yyyy.mmdd333xd', where yyyy, mm, and dd are the specifiers; all other positions are wildcards and can be filled with any characters other than yyyy, mm, and dd.

    • (b). All the date strings have a fixed pattern in terms of the relative positions of year and doy. For example, to extract 2001/045 (day of year) from 'P23R342001888045', use time$strFmt='123123yyyy888doy', where yyyy and doy are the specifiers; all other positions are wildcards and can be filled with any characters other than yyyy and doy. 'doy' must be three digits long.

    • (c). All the date strings have a fixed pattern in terms of the separation characters between year, month, and day. For example, to extract 2002/12/02 from '2002,12/02', ' 2002 , 12/2', or '2002,12 /02 ', use time$strFmt='Y,M/D', where whitespace is ignored. To get 2012/12/02 from '2-12, 2012 ', use time$strFmt='D-M,Y'.

  5. a list object of vectors to specify individual dates of the time series. Use time$year, time$month, and time$day to give the dates; alternatively, use time$year and time$doy, where each value of the doy vector is a number between 1 and 365 (or 366 in leap years). Each vector must have the same length as the time dimension of y.
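The list-based forms of time can be put side by side as in the sketch below. It only builds the inputs in base R and does not call beast.irreg. The date strings and format specifier are the ones from item 4(a); the positional parsing at the end is a hypothetical illustration of what a fixed-position specifier means, not the parser used by BEAST.

```r
# The three dates encoded in the example strings of item 4(a).
dates <- as.Date(c("2001-12-02", "2002-11-08", "2009-01-22"))

# Form 4: date strings plus a fixed-position format specifier.
time_fmt <- list(
  dateStr = c("P23R34-2001.1202333xd", "O93X94-2002.1108133fd", "TP3R34-2009.0122333td"),
  strFmt  = "P23R34-yyyy.mmdd333xd"
)

# Form 5: separate vectors for year/month/day, or for year/day-of-year.
time_ymd <- list(year  = as.integer(format(dates, "%Y")),
                 month = as.integer(format(dates, "%m")),
                 day   = as.integer(format(dates, "%d")))
time_doy <- list(year = as.integer(format(dates, "%Y")),
                 doy  = as.integer(format(dates, "%j")))

# Illustration only: read year, month, and day at the positions marked in strFmt.
fmt <- time_fmt$strFmt
y0  <- as.integer(regexpr("yyyy", fmt, fixed = TRUE))
m0  <- as.integer(regexpr("mm",   fmt, fixed = TRUE))
d0  <- as.integer(regexpr("dd",   fmt, fixed = TRUE))
parsed <- as.Date(paste(substr(time_fmt$dateStr, y0, y0 + 3),
                        substr(time_fmt$dateStr, m0, m0 + 1),
                        substr(time_fmt$dateStr, d0, d0 + 1), sep = "-"))
print(parsed)
```

Any of these objects has one entry per data point, which is the requirement stated above for every form of time.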

deltat

a number or a string to specify a time interval for aggregating the irregular y into a regular time series. The BEAST model is currently formulated for regular data only, for fast computation, so internally the beast.irreg function aggregates/re-bins irregular data into regular ones. For the aggregation, deltat is needed to specify the desired bin size or time interval; if missing, a best guess will be used. The unit of deltat needs to be consistent with time. If time is a numeric vector, the unit of deltat is arbitrary and irrelevant to BEAST. If time is a vector of Dates or date strings, the unit of deltat is assumed to be fractional years. If needed, use a string instead of a number to specify whether the unit of deltat is day, month, or year. Examples include '7 days', '7d', '1/2 months', '1mn', '1.0 year', and '1y'.
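The re-binning idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the aggregation code inside beast.irreg: the data are made up, bins are aligned to multiples of deltat, values falling in the same bin are averaged, and empty bins are left as NA.

```r
# Hypothetical irregular observations: times in fractional years, arbitrary values.
t_irreg <- c(2000.02, 2000.05, 2000.11, 2000.30, 2000.32)
y_irreg <- c(1.0, 3.0, 2.0, 4.0, 6.0)
deltat  <- 1/12                      # bin width: one month when time is in years

# Assign each observation to a bin of width deltat (assumed alignment).
start  <- floor(min(t_irreg) / deltat) * deltat
bin_id <- floor((t_irreg - start) / deltat) + 1
n_bins <- max(bin_id)

# Average the values within each bin; bins without data stay NA.
y_reg   <- rep(NA_real_, n_bins)
bin_avg <- tapply(y_irreg, bin_id, mean)
y_reg[as.integer(names(bin_avg))] <- bin_avg

# Times of the regular series, taken here as the bin centres.
t_reg <- start + (seq_len(n_bins) - 0.5) * deltat
print(data.frame(t_reg, y_reg))
```

The regular series has one value per interval of length deltat, so a smaller deltat produces more bins and more missing values for the same irregular input.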

season

characters (default to 'harmonic'); specify if y has a periodic component or not. Four strings are possible.

  • 'none': y is trend-only; no periodic components are present in the time series. The args for the seasonal component (i.e., sorder.minmax, scp.minmax, and sseg.min) will be irrelevant and ignored.

  • 'harmonic': y has a periodic/seasonal component. The term season is a misnomer, used here to refer broadly to any periodic variation present in y. The periodicity is NOT a model parameter estimated by BEAST but a known constant given by the user through period. By default, the periodic component is modeled as a harmonic curve, i.e., a combination of sines and cosines.

  • 'dummy': the same as 'harmonic' except that the periodic/seasonal component is modeled as a non-parametric curve. The harmonic order arg sorder.minmax is irrelevant and is ignored.

  • 'svd': (experimental feature) the same as 'harmonic' except that the periodic/seasonal component is modeled as a linear combination of basis functions derived from a singular value decomposition. The SVD-based basis functions are more parsimonious than the harmonic sin/cos bases in parameterizing the seasonal variations; therefore, more subtle changepoints are likely to be detected.

period

numeric or string. Specify the period for the periodic/seasonal component in y. Needed only for data with a periodic/cyclic component (i.e., season='harmonic' or 'dummy') and not used for trend-only data (i.e., season='none'). The period of the cyclic component should have a unit consistent with the unit of deltat. It holds that period=deltat*freq, where freq is the number of data samples per period. (Note that the freq argument of earlier versions is obsolete and has been replaced by period. freq is still supported, but period takes precedence if both are provided.) Neither period nor the number of data points per period is a BEAST model parameter; it has to be specified by the user. If period is missing, BEAST first attempts to guess its value via auto-correlation before fitting the model. If period <= 0, season='none' is assumed, and the trend-only model is fitted without a seasonal/cyclic component. If needed, use a string to specify whether the unit of period is day, month, or year. Examples are '1.0 year', '12 months', '365d', and '366 days'.
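The relation period=deltat*freq can be checked with simple arithmetic. The two settings below are illustrative values, not defaults of beast.irreg.

```r
# Monthly bins with an annual cycle, time measured in years.
deltat_yr <- 1/12
freq_yr   <- 12                      # data points per cycle
period_yr <- deltat_yr * freq_yr     # one year

# Weekly bins with a 364-day cycle, time measured in days.
deltat_d <- 7
period_d <- 364
freq_d   <- period_d / deltat_d      # 52 data points per cycle

print(c(period_yr = period_yr, freq_d = freq_d))
```

Whatever unit is chosen, deltat and period must be expressed in the same unit for the relation to hold.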

scp.minmax

a vector of 2 integers (>=0); the min and max number of seasonal changepoints (scp) allowed in segmenting the seasonal component. scp.minmax is used only if y has a seasonal component (i.e., season='harmonic' or 'dummy' ) and ignored for trend-only data. If the min and max changepoint numbers are equal, BEAST assumes a constant number of scp and won't infer the posterior probability of the number of changepoints, but it still estimates the occurrence probability of the changepoints over time (i.e., the most likely times at which these changepoints occur). If both the min and max numbers are set to 0, no changepoints are allowed; then a global harmonic model is used to fit the seasonal component, but still, the most likely harmonic order will be inferred if sorder.minmax[1] is not equal to sorder.minmax[2].

sorder.minmax

a vector of 2 integers (>=1); the min and max harmonic orders considered to fit the seasonal component. sorder.minmax is used only if the time series has a seasonal component (i.e., season='harmonic') and is ignored for trend-only data or when season='dummy'. If the min and max orders are equal (sorder.minmax[1]=sorder.minmax[2]), BEAST assumes a constant harmonic order and won't infer the posterior probability of harmonic orders.

tcp.minmax

a vector of 2 integers (>=0); the min and max number of trend changepoints (tcp) allowed in segmenting the trend component. If the min and max changepoint numbers are equal, BEAST assumes a constant number of changepoints and won't infer the posterior probability of the number of changepoints for the trend, but it still estimates the occurrence probability of the changepoints over time (i.e., the most likely times at which these changepoints occur in the trend). If both the min and max numbers are set to 0, no changepoints are allowed; then a global polynomial trend is used to fit the trend component, but still, the most likely polynomial order will be inferred if torder.minmax[1] is not equal to torder.minmax[2].

torder.minmax

a vector of 2 integers (>=0); the min and max orders of the polynomials considered to fit the trend component. The 0-th order corresponds to a constant term/a flat line and the 1st order is a line. If torder.minmax[1]=torder.minmax[2], BEAST assumes a constant polynomial order used and won't infer the posterior probability of polynomial orders.

sseg.min

an integer (>0); the min segment length allowed between two neighboring seasonal changepoints. That is, when fitting a piecewise harmonic seasonal model, two changepoints are not allowed to occur within a time window of length sseg.min. sseg.min must be a unitless integer (the number of time intervals/data points), so the time window in the original unit is sseg.min*deltat. sseg.min defaults to NULL, in which case a default value is chosen in reference to freq.

sseg.leftmargin

an integer (>=0); the number of leftmost data points excluded for seasonal changepoint detection. That is, when fitting a piecewise harmonic seasonal model, no changepoints are allowed in the starting window/segment of length sseg.leftmargin. sseg.leftmargin must be a unitless integer (the number of time intervals/data points), so the time window in the original unit is sseg.leftmargin*deltat. If missing, sseg.leftmargin defaults to sseg.min.

sseg.rightmargin

an integer (>=0); the number of rightmost data points excluded for seasonal changepoint detection. That is, when fitting a piecewise harmonic seasonal model, no changepoints are allowed in the ending window/segment of length sseg.rightmargin. sseg.rightmargin must be a unitless integer (the number of time intervals/data points), so the time window in the original unit is sseg.rightmargin*deltat. If missing, sseg.rightmargin defaults to sseg.min.

tseg.min

an integer (>0); the min segment length allowed between two neighboring trend changepoints. That is, when fitting a piecewise polynomial trend model, two changepoints are not allowed to occur within a time window of length tseg.min. tseg.min must be a unitless integer (the number of time intervals/data points), so the time window in the original unit is tseg.min*deltat. tseg.min defaults to NULL, in which case a default value is chosen in reference to freq if the time series has a cyclic component.

tseg.leftmargin

an integer (>=0); the number of leftmost data points excluded for trend changepoint detection. That is, when fitting a piecewise polynomial trend model, no changepoints are allowed in the starting window/segment of length tseg.leftmargin. tseg.leftmargin must be a unitless integer (the number of time intervals/data points), so the time window in the original unit is tseg.leftmargin*deltat. If missing, tseg.leftmargin defaults to tseg.min.

tseg.rightmargin

an integer (>=0); the number of rightmost data points excluded for trend changepoint detection. That is, when fitting a piecewise polynomial trend model, no changepoints are allowed in the ending window/segment of length tseg.rightmargin. tseg.rightmargin must be a unitless integer (the number of time intervals/data points), so the time window in the original unit is tseg.rightmargin*deltat. If missing, tseg.rightmargin defaults to tseg.min.

method

a string (default to 'bayes'); specify the method for formulating model posterior probability.

  • 'bayes': the full Bayesian formulation as described in Zhao et al. (2019).

  • 'bic': approximation of posterior probability using the Bayesian information criterion bic=n*ln(SSE)+ k*ln(n) where k and n are the numbers of parameters and datapoints.

  • 'aic': approximation of posterior probability using the Akaike information criterion aic=n*ln(SSE)+ 2k.

  • 'aicc': approximation of posterior probability using the corrected Akaike information criterion aicc = aic + (2k^2+2k)/(n-k-1).

  • 'hic': approximation of posterior probability using the Hannan-Quinn information criterion hic = n*ln(SSE) + 2k*ln(ln(n)).

  • 'bic0.25': approximation using the Bayesian information criterion adopted from Kim et al. (2016) <doi:10.1016/j.jspi.2015.09.008>; bic0.25 = n*ln(SSE) + 0.25k*ln(n), with a smaller complexity penalty than the standard BIC.

  • 'bic0.5': the same as above except that the penalty factor is 0.50.

  • 'bic1.5': the same as above except that the penalty factor is 1.5.

  • 'bic2': the same as above except that the penalty factor is 2.0.
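The criteria listed above can be evaluated directly from their formulas. The helper below is a hypothetical base-R sketch (the name info_criteria and the sample numbers are assumptions); it only reproduces the stated expressions and says nothing about how BEAST converts them into posterior probabilities.

```r
# Evaluate the information criteria listed above for one candidate model.
#   sse: sum of squared errors, n: number of data points, k: number of parameters
info_criteria <- function(sse, n, k) {
  fit     <- n * log(sse)                       # shared goodness-of-fit term
  bic_pen <- function(f) fit + f * k * log(n)   # BIC with penalty factor f
  aic     <- fit + 2 * k
  c(bic     = bic_pen(1),
    aic     = aic,
    aicc    = aic + (2 * k^2 + 2 * k) / (n - k - 1),
    hic     = fit + 2 * k * log(log(n)),
    bic0.25 = bic_pen(0.25),
    bic0.5  = bic_pen(0.5),
    bic1.5  = bic_pen(1.5),
    bic2    = bic_pen(2))
}

# Made-up numbers for illustration.
ic <- info_criteria(sse = 50, n = 100, k = 4)
print(round(ic, 2))
```

The variants differ only in the penalty term, so for the same fit the bic0.25 value is the smallest of the BIC family and bic2 the largest.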

detrend

logical; If TRUE, a global trend is first fitted and removed from the time series before running BEAST; after BEAST finishes, the global trend is added back to the BEAST result.

deseasonalize

logical; If TRUE, a global seasonal model is first fitted and removed from the time series before running BEAST; after BEAST finishes, the global seasonal curve is added back to the BEAST result. deseasonalize is ignored if season='none' (i.e., trend-only data).

mcmc.seed

integer (>=0); the seed for the random number generator used for Markov chain Monte Carlo (MCMC). If mcmc.seed=0, an arbitrary seed is picked and the fitting results vary across runs. If fixed to the same non-zero integer, the results can be reproduced across runs. However, results from the same seed may still vary across computers because the random number generator library depends on the CPU's instruction sets.

mcmc.chains

integer (>0); the number of MCMC chains.

mcmc.thin

integer (>0); a factor to thin chains (e.g., if mcmc.thin=5, samples will be taken every 5 iterations).

mcmc.burnin

integer (>0); the number of burn-in samples discarded at the start of each chain

mcmc.samples

integer (>=0); the number of samples collected per MCMC chain. The total number of iterations is (burnin+samples*thin)*chains.
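As a quick check of the formula above, the default settings shown in Usage give the following iteration counts.

```r
# Default MCMC settings from the Usage section.
mcmc.burnin  <- 200
mcmc.chains  <- 3
mcmc.thin    <- 5
mcmc.samples <- 8000

# Iterations per chain: burn-in plus thin iterations for every kept sample.
iter_per_chain <- mcmc.burnin + mcmc.samples * mcmc.thin
total_iter     <- iter_per_chain * mcmc.chains
kept_samples   <- mcmc.samples * mcmc.chains

print(c(iter_per_chain = iter_per_chain, total_iter = total_iter, kept_samples = kept_samples))
```

Run time grows roughly in proportion to total_iter, so lowering mcmc.thin or mcmc.samples is the most direct way to shorten a run.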

precValue

numeric (>0); the hyperparameter of the precision prior; the default value is 1.5. precValue is useful only when precPriorType='constant', as further explained below

precPriorType

characters. It takes one of 'constant', 'uniform', 'componentwise' (the default), and 'orderwise'. Below are the differences between them.

  1. 'constant': the precision parameter used to parameterize the model coefficients is fixed to a constant specified by precValue. In other words, precValue is a user-defined hyperparameter and the fitting result may be sensitive to the chosen values of precValue.

  2. 'uniform': the precision parameter used to parameterize the model coefficients is a random variable; its initial value is specified by precValue. In other words, precValue will be inferred by the MCMC, so the fitting result will be insensitive to the choice in precValue.

  3. 'componentwise': multiple precision parameters are used to parameterize the model coefficients for individual components (e.g., one for season and another for trend); their initial values are specified by precValue. In other words, precValue will be inferred by the MCMC, so the fitting result will be insensitive to the choice of precValue.

  4. 'orderwise': multiple precision parameters are used to parameterize the model coefficients not just for individual components but also for individual orders of each component; their initial values are specified by precValue. In other words, precValue will be inferred by the MCMC, so the fitting result will be insensitive to the choice of precValue.

hasOutlier

boolean; if TRUE, fit a model with an outlier component that refers to potential spikes or dips at isolated data points: Y = trend + outlier + error if season='none', and Y = trend + season + outlier + error if season != 'none'.

ocp.minmax

a vector of 2 integers (>=0); the min and max numbers of outlier-type changepoints (ocp) allowed in the time series. Ocp refers to spikes or dips at isolated times that can't be modeled as trends or seasonal terms.

print.param

boolean. If TRUE, the full list of input parameters to BEAST will be printed out prior to the MCMC inference; the naming for this list (e.g., metadata, prior, and mcmc) differs slightly from the input to beast, but there is a one-to-one correspondence (e.g., prior$trendMinSepDist=tseg.min). Internally, beast converts the input parameters to the forms of metadata, prior, and mcmc. Type 'View(beast)' to see the details or check the beast123 function.

print.progress

boolean; if TRUE, print a progress bar.

print.warning

boolean; if TRUE, print warning messages.

quiet

boolean. If TRUE, print nothing.

dump.ci

boolean; If TRUE, credible intervals (i.e., out$season$CI or out$trend$CI) will be computed for the estimated seasonal and trend components. Computing CI is time-consuming due to sorting, so set dump.ci to FALSE if a symmetric credible interval (i.e., out$trend$SD and out$season$SD) suffices.

dump.mcmc

boolean; If TRUE, dump individual samples of the MCMC chains.

gui

boolean. If TRUE, BEAST will be run with a GUI window to show an animation of the MCMC sampling in the model space step by step; as an experimental feature, "gui=TRUE" works only for Windows x64 systems not Windows 32 or Linux/Mac.

...

additional parameters. There are many more settings for the implementation but not made available in the beast() interface; please use the function beast123() instead

Value

The output is an object of class "beast". It is a list, consisting of the following variables. In the explanations below, we assume the input y is a single time series of length N:

time

a vector of size 1xN: the times at the N sampled locations. By default, it is simply set to 1:N if the input arguments deltat, start, or time are missing.

data

a vector, matrix, or 3D array; this is a copy of the input data if extra$dumpInputData = TRUE. If extra$dumpInputData=FALSE, it is set to NULL. If the original input data is irregular, the copy here is the regular version aggregated from the original at the time interval specified by metadata$deltaTime.

marg_lik

numeric; the average of the model marginal likelihood; the larger marg_lik, the better the fitting for a given time series.

R2

numeric; the R-square of the model fitting.

RMSE

numeric; the RMSE of the model fitting.

sig2

numeric; the estimated variance of the model error.

trend

a list object consisting of various outputs related to the estimated trend component:

  • ncp: [Number of ChangePoints]. a numeric scalar; the mean number of trend changepoints. Individual models sampled by BEAST have varying dimensions (e.g., number of changepoints or knots), so several alternative statistics (e.g., ncp_mode, ncp_median, and ncp_pct90) are also given to summarize the number of changepoints. For example, if mcmc$samples=10 and the numbers of changepoints for the 10 sampled models are c(0, 2, 4, 1, 1, 2, 7, 6, 6, 1), then the mean ncp is 3.0, the median is 2, the mode is 1, and the 90th percentile (ncp_pct90) is 6.5.

  • ncp_mode: [Number of ChangePoints]. a numeric scalar; the mode for number of changepoints. See the above for explanations.

  • ncp_median: [Number of ChangePoints]. a numeric scalar; the median for number of changepoints. See the above for explanations.

  • ncp_pct90: [Number of ChangePoints]. a numeric scalar; the 90th percentile for number of changepoints. See the above for explanations.

  • ncpPr: [Probability of the Number of ChangePoints]. A vector of length (tcp.minmax[2]+1)=tcp.max+1. It gives a probability distribution of having a certain number of trend changepoints over the range of [0,tcp.max]; for example, ncpPr[1] is the probability of having no trend changepoint; ncpPr[i] is the probability of having (i-1) changepoints: Note that it is ncpPr[i] not ncpPr[i-1] because ncpPr[1] is used for having zero changepoint.

  • cpOccPr: [ChangePoint OCCurrence PRobability]. a vector of length N; it gives a probability distribution of having a changepoint in the trend at each point of time. Plotting cpOccPr will depict a continuous curve of probability-of-being-changepoint. Of particular note, a higher peak in the curve indicates a higher chance of being a changepoint only at that particular SINGLE point in time and does not necessarily mean a higher chance of observing a changepoint AROUND that time. For example, a window of cpOccPr values c(0,0,0.5,0,0) (i.e., the peak prob is 0.5 and the summed prob is 0.5) is less likely to contain a changepoint than another window c(0.1,0.2,0.21,0.2,0.1) (i.e., the peak prob is 0.21 but the summed prob is 0.81).

  • order: a vector of length N; the average polynomial order needed to approximate the fitted trend. As an average over many sampled individual piece-wise polynomial trends, order is not necessarily an integer.

  • cp: [Changepoints] a vector of length tcp.max=tcp.minmax[2]; the most probable changepoint locations in the trend component. The locations are obtained by first applying a sum-filtering to the cpOccPr curve with a filter window size of tseg.min and then picking up to a total of prior$MaxKnotNum/tcp.max of the highest peaks in the filtered curve. NaNs are possible if not enough changepoints are identified. cp records all the possible changepoints identified, and many of them are bound to be false positives. Do not blindly treat all of them as actual changepoints.

  • cpPr: [Changepoints PRobability] a vector of length tcp.max=tcp.minmax[2]; the probabilities associated with the changepoints cp. Filled with NaNs for the remaining elements if ncp<tcp.max.

  • cpCI: [Changepoints Credible Interval] a matrix of dimension tcp.max x 2; the credible intervals for the detected changepoints cp.

  • cpAbruptChange: [Abrupt change at Changepoints] a vector of length tcp.max; the jumps in the fitted trend curves at the detected changepoints cp.

  • Y: a vector of length N; the estimated trend component. It is the Bayesian model average of all the individual sampled trends.

  • SD: [Standard Deviation] a vector of length N; the estimated standard deviation of the estimated trend component.

  • CI: [Credible Interval] a matrix of dimension N x 2; the estimated credible interval of the estimated trend. One vector of the matrix is for the upper envelope and another for the lower envelope.

  • slp: [Slope] a vector of length N; the time-varying slope of the fitted trend component.

  • slpSD: [Standard Deviation of Slope] a vector of length N; the SD of the slope for the trend component.

  • slpSgnPosPr: [PRobability of slope having a positive sign] a vector of length N; the probability of the slope being positive (i.e., increasing trend) for the trend component. For example, if slpSgnPosPr=0.80 at a given point in time, it means that 80% of the individual trend models sampled in the MCMC chain have a positive slope at that point.

  • slpSgnZeroPr: [PRobability of slope being zero] a vector of length N; the probability of the slope being zero (i.e., a flat constant line) for the trend component. For example, if slpSgnZeroPr=0.10 at a given point in time, it means that 10% of the individual trend models sampled in the MCMC chain have a zero slope at that point. The probability of the slope being negative can be obtained from 1-slpSgnZeroPr-slpSgnPosPr.

  • pos_ncp:

  • neg_ncp:

  • pos_ncpPr:

  • neg_ncpPr:

  • pos_cpOccPr:

  • neg_cpOccPr:

  • pos_cp:

  • neg_cp:

  • pos_cpPr:

  • neg_cpPr:

  • pos_cpAbruptChange:

  • neg_cpAbruptChange:

  • pos_cpCI:

  • neg_cpCI: The above variables have the same outputs as those variables without the prefixes 'pos' and 'neg', except that we differentiate the changepoints with a POSitive jump in the trend from those changepoints with a NEGative jump. For example, pos_ncp refers to the average number of trend changepoints that jump up (i.e., positively) in the trend.

  • inc_ncp:

  • dec_ncp:

  • inc_ncpPr:

  • dec_ncpPr:

  • inc_cpOccPr:

  • dec_cpOccPr:

  • inc_cp:

  • dec_cp:

  • inc_cpPr:

  • dec_cpPr:

  • inc_cpAbruptChange:

  • dec_cpAbruptChange:

  • inc_cpCI:

  • dec_cpCI: The above variables have the same outputs as those variables without the prefixes 'inc' and 'dec', except that we differentiate the changepoints at which the trend slope increases from those changepoints at which the trend slope decreases. For example, if the trend slopes before and after a changepoint are 0.4 and 2.5, then the changepoint is counted toward inc_ncp.
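The summary statistics described for ncp and ncpPr can be reproduced in base R from the ten sampled changepoint counts used in the ncp example. This is a sketch, not the code BEAST runs: tcp.max=10 is taken from the default tcp.minmax, and the percentile estimator (quantile type 5) is an assumption chosen because it yields the stated value of 6.5.

```r
# Number of trend changepoints in each of 10 sampled models (from the ncp example).
ncp_samples <- c(0, 2, 4, 1, 1, 2, 7, 6, 6, 1)
tcp.max     <- 10                     # upper limit, as in the default tcp.minmax

# Counts of models having 0, 1, ..., tcp.max changepoints.
counts <- tabulate(ncp_samples + 1, nbins = tcp.max + 1)

ncp        <- mean(ncp_samples)
ncp_median <- median(ncp_samples)
ncp_mode   <- which.max(counts) - 1   # index i corresponds to (i-1) changepoints
ncp_pct90  <- quantile(ncp_samples, 0.9, type = 5, names = FALSE)  # assumed estimator

# ncpPr[i] is the probability of having (i-1) changepoints.
ncpPr <- counts / length(ncp_samples)

print(c(ncp = ncp, ncp_median = ncp_median, ncp_mode = ncp_mode, ncp_pct90 = ncp_pct90))
print(ncpPr)
```

The mean can also be recovered from the distribution as sum((0:tcp.max) * ncpPr), which makes the off-by-one indexing of ncpPr explicit.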

season

a list object consisting of various outputs related to the estimated seasonal/periodic component:

  • ncp: [Number of ChangePoints]. a numeric scalar; the mean number of seasonal changepoints.

  • ncpPr: [Probability of the Number of ChangePoints]. A vector of length (scp.minmax[2]+1)=scp.max+1. It gives a probability distribution of having a certain number of seasonal changepoints over the range of [0,scp.max]; for example, ncpPr[1] is the probability of having no seasonal changepoint; ncpPr[i] is the probability of having (i-1) changepoints: Note that the index is i rather than (i-1) because ncpPr[1] is used for having zero changepoint.

  • cpOccPr: [ChangePoint OCCurrence PRobability]. a vector of length N; it gives a probability distribution of having a changepoint in the seasonal component at each point of time. Plotting cpOccPr will depict a continuous curve of probability-of-being-changepoint over time. Of particular note, a higher value at a peak in the curve indicates a higher chance of being a changepoint only at that particular SINGLE point in time and does not necessarily mean a higher chance of observing a changepoint AROUND that time. For example, a window of cpOccPr values c(0,0,0.5,0,0) (i.e., the peak prob is 0.5 and the summed prob is 0.5) is less likely to contain a changepoint than another window c(0.1,0.2,0.3,0.2,0.1) (i.e., the peak prob is 0.3 but the summed prob is 0.9).

  • order: a vector of length N; the average harmonic order needed to approximate the seasonal component. As an average over many sampled individual piece-wise harmonic curves, order is not necessarily an integer.

  • cp: [Changepoints] a vector of length scp.max=scp.minmax[2]; the most probable changepoint locations in the seasonal component. The locations are obtained by first applying a sum-filtering to the cpOccPr curve with a filter window size of sseg.min and then picking up to a total of ncp of the highest peaks in the filtered curve. If ncp<scp.max, the remainder of the vector is filled with NaNs.

  • cpPr: [Changepoints PRobability] a vector of length scp.max; the probabilities associated with the changepoints cp. Filled with NaNs for the remaining elements if ncp<scp.max.

  • cpCI: [Changepoints Credible Interval] a matrix of dimension scp.max x 2; the credible intervals for the detected changepoints cp.

  • cpAbruptChange: [Abrupt change at Changepoints] a vector of length scp.max; the jumps in the fitted seasonal curves at the detected changepoints cp.

  • Y: a vector of length N; the estimated seasonal component. It is the Bayesian model average of all the individual sampled seasonal curves.

  • SD: [Standard Deviation] a vector of length N; the estimated standard deviation of the estimated seasonal component.

  • CI: [Credible Interval] a matrix of dimension N x 2; the estimated credible interval of the estimated seasonal curve. One vector of the matrix is for the upper envelope and another for the lower envelope.

  • amp: [AMPlitude] a vector of length N; the time-varying amplitude of the estimated seasonality.

  • ampSD: [Standard Deviation of AMPlitude] a vector of length N; the SD of the amplitude of the seasonality.

  • pos_ncp:

  • neg_ncp:

  • pos_ncpPr:

  • neg_ncpPr:

  • pos_cpOccPr:

  • neg_cpOccPr:

  • pos_cp:

  • neg_cp:

  • pos_cpPr:

  • neg_cpPr:

  • pos_cpAbruptChange:

  • neg_cpAbruptChange:

  • pos_cpCI:

  • neg_cpCI: The above variables have the same outputs as those variables without the prefixes 'pos' and 'neg', except that we differentiate the changepoints with a POSitive jump in the seasonal component from those changepoints with a NEGative jump. For example, pos_ncp refers to the average number of seasonal changepoints that jump up (i.e., positively) in the seasonal component.
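The difference between a peak in cpOccPr and the probability summed over a window can be seen with a small base-R sketch. The curve below is made up by embedding the two example windows from the cpOccPr description in zeros, and a centred moving sum of width 5 stands in for the sum-filtering step; the exact filter used by BEAST is not reproduced here.

```r
# A made-up cpOccPr curve containing the two example windows.
cpOccPr <- c(0, 0, 0, 0.5, 0, 0, 0,          # isolated spike: peak 0.5, sum 0.5
             0.1, 0.2, 0.3, 0.2, 0.1, 0, 0)  # broad bump: peak 0.3, sum 0.9

# Centred moving sum with a window of 5 points (ends are NA).
win      <- 5
smoothed <- as.numeric(stats::filter(cpOccPr, rep(1, win), sides = 2))

raw_peak      <- which.max(cpOccPr)    # location of the highest single value
filtered_peak <- which.max(smoothed)   # location of the highest summed probability

print(c(raw_peak = raw_peak, filtered_peak = filtered_peak))
print(round(smoothed, 2))
```

The raw curve peaks at the isolated spike, while the filtered curve peaks at the centre of the broad bump, which is why the changepoint locations in cp are picked from the filtered curve rather than from cpOccPr itself.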

Note

The three functions beast(), beast.irreg(), and beast123() are essentially the same BEAST algorithm but with different APIs. There is a one-to-one correspondence between the parameters for beast() and beast.irreg() and the 'metadata', 'prior','mcmc', and 'extra' objects in the beast123() interface. Examples are:

start <-> metadata$startTime
deltat <-> metadata$deltaTime
deseasonalize <-> metadata$deseasonalize
hasOutlier <-> metadata$hasOutlierCmpnt
sorder.minmax[1] <-> prior$seasonMinOrder
sorder.minmax[2] <-> prior$seasonMaxOrder
sseg.min <-> prior$seasonMinSepDist
torder.minmax[1] <-> prior$trendMinOrder
tseg.leftmargin <-> prior$trendLeftMargin
mcmc.seed <-> mcmc$seed
dump.ci <-> extra$computeCredible

Experts should use the beast123 function.

References

  1. Zhao, K., Wulder, M.A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X. and Brown, M., 2019. Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, p.111181 (the beast algorithm paper).

  2. Zhao, K., Valle, D., Popescu, S., Zhang, X. and Mallick, B., 2013. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection. Remote Sensing of Environment, 132, pp.102-119 (the Bayesian MCMC scheme used in beast).

  3. Hu, T., Toman, E.M., Chen, G., Shao, G., Zhou, Y., Li, Y., Zhao, K. and Feng, Y., 2021. Mapping fine-scale human disturbances in a working landscape with Landsat time series on Google Earth Engine. ISPRS Journal of Photogrammetry and Remote Sensing, 176, pp.250-261 (a beast application paper).

See Also

beast, beast123, minesweeper, tetris, geeLandsat

Examples


library(Rbeast)



######################################################################################
# Note that the BEAST algorithm is currently implemented to handle only regular time 
# series. 'beast.irreg' accepts irregular time series but internally it aggregates them
# into regular ones prior to applying the BEAST model. For the aggregation, both the 
# "time" and "deltat" args are needed to specify individual times of data points and the
# regular time interval desired. If there is a cyclic component, 'period' should also be
# given; if not, a possible value is guessed via auto-correlation.


######################################################################################
# 'ohio' is a data.frame on an irregular Landsat time series of reflectances & ndvi 
# (e.g., surface greenness) at an Ohio site. It has multiple columns of alternative date 
# formats, such as year, month, day, doy (day of year), rdate (R's date class), and
# time (fractional year)

 data(ohio)
 str(ohio)
 plot(ohio$rdate, ohio$ndvi,type='o') # ndvi is irregularly spaced and unordered in time
 
######################################################################################
# Below, 'time' is given as numeric values, which can be of any arbitrary unit. Although
# here 1/12 can be interpreted as 1/12 year or 1 month, BEAST itself doesn't care about
# the time unit. So, the unit of 1/12 is irrelevant for BEAST. Neither 'freq' nor 'period'
# is given, so a guessed value is used.

 o=beast.irreg(ohio$ndvi, time=ohio$time,deltat=1/12) 
 plot(o)
 print(o)

######################################################################################
# Aggregate the time series at a monthly interval (deltat=1/12) and explicitly provide
# the 'freq' or 'period' arg

 o=beast.irreg(ohio$ndvi, time=ohio$time,deltat=1/12, period=1.0) 
#o=beast.irreg(ohio$ndvi, time=ohio$time,deltat=1/12, freq  =12) 



## Not run: 
######################################################################################
# Aggregate the time series at a half-monthly time interval, and the 'freq' becomes 24 
# while the period is still 1. That is, PERIOD (1.0)=deltat(1/24) X  freq (24)

 o=beast.irreg(ohio$ndvi, time=ohio$time,deltat=1/24, freq   = 24) 
#o=beast.irreg(ohio$ndvi, time=ohio$time,deltat=1/24, period = 1) 

######################################################################################
# 'time' is given as R's dates. The unit is YEAR. 1/12 refers to 1/12 year or 1 month

 o=beast.irreg(ohio$ndvi, time=ohio$rdate,deltat=1/12) 

######################################################################################
# 'time' is given as date strings. The unit is YEAR. 1/12 refers to 1/12 year or 1 month


 o=beast.irreg(ohio$ndvi, time=ohio$datestr1,deltat=1/12)  #"LT4-1984-03-27"  (YYYY-MM-DD)
 o=beast.irreg(ohio$ndvi, time=ohio$datestr2,deltat=1/12)  #"LT4-1984087ndvi" (YYYYDOY)
 o=beast.irreg(ohio$ndvi, time=ohio$datestr3,deltat=1/12)  #"1984,, 3/ 27"    (YYYY M D)
 


######################################################################################
# 'time' is given as date strings, with a format specifier

 

 TIME =list()
 TIME$datestr = ohio$datestr1
 TIME$strfmt  = "LT4-YYYY-MM-DD"   # "LT4-1984-03-27"
 o=beast.irreg(ohio$ndvi, time=TIME,deltat=1/12)  
 
 TIME =list()
 TIME$datestr = ohio$datestr2
 TIME$strfmt  = "LT4-YYYYDOYndvi"   # LT4-1984087ndvi
 o=beast.irreg(ohio$ndvi, time=TIME,deltat=1/12)   
 

######################################################################################
# 'time' is given as  a list object


 TIME = list()
 
 TIME$year  = ohio$Y
 TIME$month = ohio$M
 TIME$day   = ohio$D
 o=beast.irreg(ohio$ndvi, time=TIME,deltat=1/12)   
 
 TIME = list() 
 TIME$year  = ohio$Y
 TIME$doy   = ohio$doy 
 o=beast.irreg(ohio$ndvi, time=TIME, deltat=1/12)    
 


## End(Not run)
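
The comments at the top of these examples note that beast.irreg aggregates an irregular series onto a regular grid defined by 'time' and 'deltat', and that period = deltat X freq. The base-R sketch below illustrates both ideas on made-up data. It assumes a simple per-interval mean and a grid starting at the floor of the earliest time; the aggregation actually performed inside Rbeast may differ, and the variable names are illustrative only.

```r
# Made-up irregular, unordered observations (time in fractional years)
time <- c(2000.40, 2000.02, 2000.05, 2000.21, 2000.38, 2000.30)
y    <- c(0.60,    0.20,    0.30,    0.40,    0.70,    NA)

deltat <- 1/12                 # desired regular interval (1/12 year, i.e., monthly)
start  <- floor(min(time))     # assumed origin of the regular grid

# Index of the regular interval that each observation falls into
bin   <- floor((time - start) / deltat) + 1
nbins <- max(bin)

# Average the observations within each interval (assumption: simple mean);
# intervals with no valid data stay NA
y_reg <- rep(NA_real_, nbins)
m     <- tapply(y, bin, mean, na.rm = TRUE)
y_reg[as.integer(names(m))] <- m
y_reg[is.nan(y_reg)] <- NA     # an interval holding only NA values yields NaN
print(y_reg)

# period = deltat X freq: with an annual cycle (period = 1) and monthly
# aggregation, there are 12 regular intervals per period
period <- 1.0
freq   <- round(period / deltat)
print(freq)
```

In this toy series the second interval has no observations and the fourth holds only a missing value, so both remain NA after aggregation, consistent with the statement that missing values are allowed in y.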


Rbeast documentation built on Sept. 12, 2024, 7:36 a.m.