# R/mfpa.R In adespatial: Multivariate Multiscale Spatial Analysis

#'Multi-frequential periodogram analysis
#'
#'This function performs multi-frequential periodogram analysis for univariate
#'temporal or spatial data series collected with equal intervals. Compared with
#'the traditional periodogram used in spectral analysis, this method can detect
#'overlapping signals with fractional frequencies. Fitting a joint
#'polynomial-trigonometric model is achieved by Ordinary Least Squares (OLS)
#'regression. The function also performs autocorrelation analysis of OLS
#'residuals up to a number of lags determined by the user.
#'
#'@aliases mfpa plot.mfpa print.mfpa
#'
#'@param y	Vector of \emph{n} observations (vector of integer or real numbers,
#'  or one-column matrix).
#'@param MaxNFreq	Maximum number of frequencies to be estimated in the stepwise
#'  procedure (e.g. 2).
#'@param MinFreq	Minimum value for frequency estimates (e.g. 3.0).
#'@param MaxFreq	Maximum value for frequency estimates (e.g. 10). Must be larger
#'  than MinFreq and smaller than half of the number of observations in the
#'  series. If unspecified by the user \code{(MaxFreq=NA)}, \code{MaxFreq} is
#'  set to \emph{n}/4 by the function.
#'@param ntrend	Number (0 to 3) of orthogonal polynomial components estimating
#'  the broad-scale trend, to be included in the joint polynomial-trigonometric
#'  model. Use 0 to estimate no trend component, only an intercept.
#'@param nlags	Number of lags to be used for autocorrelation analysis of OLS
#'  residuals. Use 0 to bypass this analysis.
#'@param alpha	Significance threshold for including frequencies.
#'@param x An object of class \code{mfpa}
#'@param xlab,ylab Labels for x and y axes
#'@param \dots Further arguments passed to or from other methods
#'
#'@details
#'
#'The fitting of a joint polynomial-trigonometric model is limited to ordinary
#'least squares (OLS), with autocorrelation analysis of OLS residuals up to a
#'certain lag. Orthogonal polynomials are used to model broad-scale trends,
#'whereas cosines and sines model the periodic structures at intermediate
#'scales. See Dutilleul (2011, section 6.5) and Legendre & Legendre (2012,
#'section 12.4.4) for details. OLS regression could be replaced by an
#'\emph{estimated generalized least squares} (EGLS) procedure, as described in
#'Dutilleul (2011).
#'
#'In spectral analysis in general and in mfpa in particular, the cosines and
#'sines are considered jointly in the search for the dominant frequency
#'components since they are both required to fully account for a frequency
#'component in a linear model. So, when either the cosine or the sine is
#'significant, this is sufficient indication that a significant frequency
#'component has been found. But see the first paragraph of the ‘Recommendations
#'to users’ below.
#'
#'The periodic phenomenon corresponding to each identified frequency is modelled
#'by a cosine and a sine. The first pair (‘cos 1’, ‘sin 1’) corresponds to the
#'first frequency, the second pair to the second frequency, and so on. An
#'intercept is also computed, as well as a polynomial broad-scale trend if
#'argument ntrend > 0. The coefficients shown for each periodic component (‘cos’
#'and ‘sin’) are the OLS regression coefficients. The tests of significance
#'producing the p-values (called ‘prob’ in the output file) are 2-tailed
#'parametric t-tests, as in standard OLS regression.
#'
#'A global R-square statistic for the periodogram is computed as the variance of
#'the fitted values divided by the variance of the data series. An R-squared
#'corresponding to each frequency is also returned.
#'
#'In the Dutilleul periodogram, the time unit is the length of the data series
#'(in time units: seconds, hours, days, etc.). Hence, the \emph{frequency}
#'identified by a Dutilleul periodogram is the number of cycles of the periodic
#'signal (how many full or partial cycles) along the time series. That number is
#'an integer when the series contains an integer number of cycles; it may also
#'be a real number when the number of cycles is fractional. The periodogram can
#'identify several periodic phenomena with different frequencies. The estimated
#'frequencies could be divided by an appropriate constant to produce numbers of
#'cycles per second or day, or per meter or km, depending on the study.
#'
#'To find the \emph{period} (number of days, hours, etc.) of the process
#'generating a periodic signal in the data, divide the length of the series (in
#'days, hours, etc.) by the frequency identified by Dutilleul’s periodogram.
#'Recommendations to users The mfpa code estimates the periodic frequencies to
#'be included in the model through a combination of a stepwise procedure and
#'non-linear optimisation. Following that, the contributions of the ‘cos’ and
#'‘sin’ components of all frequencies in the model are estimated by multiple
#'linear regression in the presence of the intercept and trends (if any).
#'Because the mfpa method estimates fractional frequencies, the cos-sin
#'combinations are not orthogonal among the identified frequencies, and
#'unnecessary frequencies may be selected as ‘significant’.
#'
#'1. It is important that users of this periodogram have hypotheses in mind
#'about the frequencies of the processes that may be operating on the system
#'under study and the number of periodic components they are expecting to find.
#'If one asks for more components than the number of periodic phenomena at work
#'on the system, the ‘real’ frequency usually has a strong or fairly strong
#'R-squared and it is followed by other components with very small R-squared.
#'Selection of frequencies of interest should thus be based more upon
#'examination of the R-squares of the components rather than on the p-values.
#'For short series in particular, the adjusted R-squared is an unbiased estimate
#'of the variance of the data explained by the model. Even series of random
#'numbers can produce ‘significant’ frequencies for periodic components; the
#'associated (adjusted) R-squares will, however, be very small.
#'
#'2. Function mfpa cannot detect frequencies < 1 (smaller than one cycle in the
#'series) or larger than (\emph{n}–1) where \emph{n} is the number of
#'observations in the series, the latter case corresponding to periods smaller
#'than the interval between successive observations. When a periodic component
#'with such a period is present in the data, Dutilleul’s periodogram can detect
#'harmonics of that frequency. Recommendation: when a frequency is detected that
#'does not seem to correspond to a hypothesized process, one could check, using
#'simulated data, if it could be produced by a process operating at a temporal
#'scale (period) smaller than the interval between successive observations.  An
#'example is shown in Example 2.
#'
#'3. When analysing a time series with unknown periodic structure, it is
#'recommended to try first with more than one frequency, say 2 or 3, and also
#'with a trend. Eliminate the non-significant components, step by step, in
#'successive runs, starting with the trend(s), then eliminate the weakly
#'significant periodic components, until there are only highly significant
#'components left in the model.
#'
#'@return A list containing the following elements:
#'
#'  \itemize{ \item \code{frequencies}: Vector of estimated frequencies of the
#'  model periodic components and associated R-squared. The frequencies are
#'  numbers of cycles in the whole (temporal or spatial) series under study.
#'  \item \code{coefficients}: Data frame containing OLS slope estimates,
#'  starting with the intercept, then the orthogonal polynomials modelling trend
#'  in increasing order, followed by the cosine and sine coefficients
#'  (alternating) in the order of the estimated frequencies. Columns: (1)
#'  \code{coefficient}: the OLS intercept or slope estimates; (2) \code{prob}:
#'  the associated probabilities. \item \code{predicted}: A vector (length
#'  \emph{n}) of predicted response values (fitted values), including the trend
#'  if any. The data and predicted values can be plotted together using function
#'  \code{plot.mfpa}; type plot(name.of.output.object). The data values are
#'  represented by red circles and the fitted values by a black line. \item
#'  \code{auto_coeff}: If nlags > 0: data frame containing the following
#'  columns. (1) \code{lag}: lags at which autocorrelation analysis of the OLS
#'  residuals is performed; (2) \code{auto_r}: vector of sample autocorrelation
#'  coefficients calculated from OLS residuals for each lag; (3) \code{prob}:
#'  vector of probabilities associated with the tests of significance of the
#'  sample autocorrelation coefficients. \item \code{y}: the original data
#'  series (one-column matrix). \item \code{X}: the matrix of explanatory
#'  variables; it contains a column of “1” to estimate the intercept, a column
#'  for each of the trend components (if any), and two columns for each
#'  frequency component, each frequency being represented by a cosine and a
#'  sine. \item \code{r.squared.global}: The global R-squared of the model and
#'
#'@references
#'
#'Dutilleul, P. 1990. Apport en analyse spectrale d’un périodogramme modifié et
#'modélisation des séries chronologiques avec répétitions en vue de leur
#'comparaison en fréquence. Doctoral Dissertation, Université Catholique de
#'Louvain, Louvain-la-Neuve, Belgium.
#'
#'Dutilleul, P. 1998. Incorporating scale in ecological experiments: data
#'analysis. Pp. 387-425 in: D. L. Peterson & V. T. Parker [eds.] Ecological
#'scale – Theory and applications. Columbia University Press, New York.
#'
#'Dutilleul, P. 2001. Multi-frequential periodogram analysis and the detection
#'of periodic components in time series. Commun. Stat. - Theory Methods 30,
#'1063–1098.
#'
#'Dutilleul, P. R. L. 2011. Spatio-temporal heterogeneity — Concepts and
#'analyses. Cambridge University Press, Cambridge.
#'
#'Dutilleul, P. and C. Till. 1992. Evidence of periodicities related to climate
#'and planetary behaviors in ring-width chronologies of Atlas cedar (Cedrus
#'atlantica) in Morocco. Can. J. For. Res. 22: 1469-1482.
#'
#'Legendre, P. and P. Dutilleul. 1992. Introduction to the analysis of periodic
#'phenomena. 11-25 in: M. A. Ali [ed.] Rhythms in fishes. NATO ASI Series, Vol.
#'A-236. Plenum, New York.
#'
#'Legendre, P. and L. Legendre. 2012. Numerical Ecology. 3rd English edition.
#'Elsevier, Amsterdam.
#'
#'@author  Guillaume Larocque <[email protected]@gmail.com> and Pierre Legendre.
#'
#'@examples
#'
#' ### Example 1
#'
#' # Simulate data with frequencies 2.3 and 6.1 and a random component, n = 100.
#' # No trend, no autocorrelated residuals.
#'
#' y <- as.matrix(0.4*(sin(2.3*2*pi*(1:100)/100)) +
#' 0.4*(sin(6.1*2*pi*(1:100)/100)) + 0.2*rnorm(100))
#'
#' res <- mfpa(y, MaxNFreq = 2, MinFreq = 2, ntrend = 0, nlags = 0)
#'
#' # Compute the periods associated with the two periodic components. Each
#' # frequency in element $frequencies is a number of cycles in the whole series. #' # The periods are expressed in numbers of time intervals of the data series. In #' # this example, if the data are measured every min, the periods are in min. #' #' periods <- 100/res$frequencies$frequency #' #' # Draw the data series and the fitted (or predicted) values #' #' plot(res) #' #' ### Example 2 #' #' # Generate hourly periodic data with tide signal (tide period T = 12.42 h) #' # during 1 year, hence 24*365 = 8760 hourly data. See #' # https://en.wikipedia.org/wiki/Tide. #' #' # In this simulation, constant (c = 0) puts the maximum value of the cosine at #' # midnight on the first day of the series. #' #' periodic.component <- function(x, T, c) cos((2*pi/T)*(x+c)) #' #' tide.h <- periodic.component(1:8760, 12.42, 0) #' #' # The number of tides in the series is: 8760/12.42 = 705.314 tidal cycles #' # during one year. #' #' # Sample the hourly data series once a day at 12:00 noon every day. The #' # periodic signal to be detected has a period smaller then the interval between #' # consecutive observations and its frequency is larger than (n–1). The sequence #' # of sampling hours for the tide.h data is: #' #' h.noon <- seq(12, 8760, 24) #' tide.data <- tide.h[h.noon] #' length(tide.data) #' #' # The series contains 365 sampling units #' #' # Compute Dutilleul’s multi-frequential periodogram #' #' res.noon <- mfpa(tide.data, MaxNFreq = 1, MinFreq = 2, ntrend = 1, nlags = 2) #' #' # Examine the frequency detected by the periodogram, element #' # res.noon$frequencies. This is a harmonic of the tide signal in the original
#' # data series tide.h.
#'
#' # Compute the period of the signal in the data series sampled hourly:
#'
#' period <- 365/res.noon$frequencies$frequency
#'
#' # Draw the data series and the adjusted values
#'
#' plot(res.noon)
#'
#' # Repeat this analysis after addition of random noise to the tide data
#'
#' tide.noise <- tide.data + rnorm(365, 0, 0.25)
#'
#' res.noise <- mfpa(tide.noise, MaxNFreq = 1, MinFreq = 2, ntrend = 1, nlags = 2)
#'
#' plot(res.noise)
#'
#'@importFrom stats optim pnorm pt var
#'@importFrom utils tail
#'@export mfpa
#'

'mfpa' <-
function(y,
MaxNFreq = 2,
MinFreq = 3,
MaxFreq = NA,
ntrend = 0,
nlags = 0,
alpha = 0.05)
{
#### Internal functions

## Optimization to fine-tune the frequency estimation
mfpa_LSoptim <- function(x, y, mf, n, xt) {
for (i in c(1:mf)) {
xt <- cbind(xt, cos(2 * pi * x[i] * c(1:n) / n), sin(2 * pi * x[i] * c(1:n) /
n))
}
out <- -1 * c(t(y) %*% xt %*% solve(t(xt) %*% xt, t(xt) %*% y))
}

## Brute force estimation of frequencies
mfpa_LS <- function(y, X, freqs, mfreqs, ntrend) {
NF <- length(freqs)
n <- nrow(y)
ncX <- ncol(X)
IM <- matrix(0, NF, 1)
y <- as.matrix(y)
for (f1 in c(1:NF)) {
X[, ncX - 1] <- cos(2 * pi * freqs[f1] * c(1:n) / n)
X[, ncX] <- sin(2 * pi * freqs[f1] * c(1:n) / n)
#lower <- max(freqs[mfreqind-1],0), upper = freqs[mfreqind+1])
IM[f1] <-
tryCatch(
as.matrix(t(y) %*% X) %*% solve(t(X) %*% X, t(X) %*% y),
error = function(e)
e = 0
)
}
m <- max(IM)
mfreqind <- which.max(IM)
mf <- length(mfreqs) + 1
xt <- X[, 1:c(ntrend + 1)]
ctrl <- list(pgtol = 0, trace = 6)
mfreq <-
optim(
c(mfreqs, freqs[mfreqind]),
mfpa_LSoptim,
y = y,
mf = mf,
n = n,
xt = xt,
method = "BFGS"
)
out <- list(m = m, mfreq = mfreq$par) } ## Compute adjusted R-square from R-square adj.R2 <- function(R2, n, m) 1 - (1 - R2) * (n - 1) / (n - m - 1) #### END Internal functions y <- as.matrix(y) n <- nrow(y) if (is.na(MaxFreq)) { # The maximum frequency value (cycles) is equal to one fourth of the # time series length. MaxF <- n / 4 } else{ if (MaxFreq < MinFreq | MaxFreq > (n / 2)) { stop( 'Invalid maximum frequency (MaxFreq). It must be larger than or equal to MinFreq AND smaller than or equal to half the number of observations in the series' ) } else{ MaxF = MaxFreq } } res <- 0.5 freqs <- seq(MinFreq, MaxF, by = res) IM <- matrix(0, 2, 1) TSS <- sum(y * y) alpha <- 0.1 mfreqs <- NULL for (i in 1:MaxNFreq) { if (i == 1) { # Definition of trend components X <- matrix(0, n, 2 + 1 + ntrend) if (ntrend > -1) X[, 1] <- matrix(1, n, 1) if (ntrend > 0) { t1 <- c(1:n) X[, 2] <- t1 - sum(t1) / n } if (ntrend > 1) { t2 <- c(1:n) ^ 2 n2 <- sum(abs(X[, 2]) ^ 2) ^ (1 / 2) X[, 3] <- t2 - sum(t2) / n - sum(X[, 2] * t2) * X[, 2] / n2 ^ 2 } if (ntrend > 2) { t3 <- c(1:n) ^ 3 n3 <- sum(abs(X[, 3]) ^ 2) ^ (1 / 2) X[, 4] = t3 - sum(t3) / n - sum(X[, 2] * t3) * X[, 2] / n2 ^ 2 - sum(X[, 3] * t3) * X[, 3] / n3 ^ 2 } if (ntrend > 3) { stop('The number of orthogonal polynomials must be 3 or less.') } XP <- X yp <- 0 } else{ X <- cbind(X, matrix(0, n, 2)) # Frequencies are added in a stepwise procedure. } LSRes <- mfpa_LS(y, X, freqs, mfreqs, ntrend) IM[2] <- LSRes$m
mfreq <- LSRes$mfreq if (i > 1) { F <- (0.5 * (IM[2] - IM[1])) / ((TSS - IM[2]) / (n - 2 * i - 1 - ntrend)) p <- pf(F, 2, (n - 2 * i - 1 - ntrend), lower.tail = FALSE) if (p > alpha) { break } else{ IM[1] <- IM[2] mfreqs <- mfreq } } else{ IM[1] <- IM[2] mfreqs <- mfreq } X[, ncol(X) - 1] <- cos(2 * pi * round(tail(mfreq, 1) * 10) / 10 * c(1:n) / n) X[, ncol(X)] <- sin(2 * pi * round(tail(mfreq, 1) * 10) / 10 * c(1:n) / n) } X <- X[, !colSums(abs(X)) == 0] A <- solve(t(X) %*% X, t(X)) b <- A %*% y # Regression slopes yp <- X %*% b # Predicted values f.r.squared = matrix(NA, length(mfreqs)) for (i in 1:length(mfreqs)) { ind <- c(ntrend + (2 * i), ntrend + 1 + (2 * i)) Af <- solve(t(X[, -ind]) %*% X[, -ind], t(X[, -ind])) bf <- Af %*% y # Regression slopes ypf <- X[, -ind] %*% bf # Predicted values f.r.squared[i] <- (crossprod(yp) - crossprod(ypf)) / crossprod(y) } yres <- y - yp # Residuals MSE <- sum(yres ^ 2) / (n - qr(X)$rank)
VarB <- diag(A %*% t(A) * MSE)
S <- sqrt(VarB)
tstat <- b / S
Probb <- (1 - pt(abs(tstat), n - ncol(X))) * 2
r.squared.global <- vector(length = 2)
r.squared.global[1] <- Rsq <- var(yp) / var(y)
r.squared.global[2] <- adj.R2(Rsq, n, (ncol(X) - 1))

if (nlags > 0) {
# Definition of lags
lags <- c(1:nlags)
nlags <- length(lags)
# Autocorrelation analysis of OLS residuals
HH <-
dist(cbind(c(1:n), matrix(1, n, 1)), upper = TRUE, diag = TRUE)
yMat <- kronecker(matrix(1, 1, n), yres) - mean(yres)
MSE <- sum(yres * yres) / (n - qr(X)$rank) if (nlags != 0) { r <- matrix(0, nlags - 1, 1) S2 <- matrix(0, nlags - 1, 1) yProd <- yMat * t(yMat) for (i in 1:nlags) { islag <- as.vector(as.matrix(HH)) == lags[i] tMat <- as.vector(yProd) * islag r[i] <- sum(tMat) / (2 * (n - 1) * var(yres)) S2[i] <- (1 + 2 * sum(r[1:(i - 1)] ^ 2)) / n } Z <- r / sqrt(S2) Probr <- 2 * (1 - pnorm(abs(Z), mean = 0, sd = 1)) } else{ Z <- 0 } } else{ lags <- numeric() r <- numeric() Probr <- numeric() } cresults <- data.frame(coefficient = b, prob = round(Probb, 5)) nsc <- (ncol(X) - ntrend - 1) / 2 if (ntrend != 0) { rt <- paste(rep("trend order", ntrend), 1:ntrend) } else{ rt <- { } } row.names(cresults) <- c('intercept', rt, paste(rep(c('cos', 'sin'), nsc), kronecker(1:(nsc), c(1, 1)))) freqresults <- data.frame(frequency = mfreqs, r.squared = f.r.squared) auto_coeff <- data.frame(lag = lags, auto_r = r, prob = Probr) out <- list( frequencies = freqresults, coefficients = cresults, predicted = yp, auto_coeff = auto_coeff, y = y, X = X, r.squared.global = r.squared.global ) class(out) <- c("mfpa", class(res)) return(out) } #' @rdname mfpa #' @export 'plot.mfpa' <- function (x, xlab = "" , ylab = "Values" , ...) { plot( x$y,
type = "p",
col = "red",
ylim = range(x$y) * 1.1, ylab = ylab, xlab = xlab ) lines(x$predicted)
}

#' @rdname mfpa
#' @export
'print.mfpa' <-
function (x, ...) {
cat("Multi-frequential periodogram analysis\n\n")
print(x$frequencies) cat("\n") cat(paste0( "Global R-squared : ", format(x$r.squared.global[1], digits = 5)
), "\n")
cat(paste0(
format(x$r.squared.global[2], digits = 5) ), "\n") cat("\n") print(x$coefficients)
if (length(x$lags) > 0) { cat("\n") print(x$auto_coeff)