Nothing
#'Multi-frequential periodogram analysis
#'
#'This function performs multi-frequential periodogram analysis for univariate
#'temporal or spatial data series collected with equal intervals. Compared with
#'the traditional periodogram used in spectral analysis, this method can detect
#'overlapping signals with fractional frequencies. Fitting a joint
#'polynomial-trigonometric model is achieved by Ordinary Least Squares (OLS)
#'regression. The function also performs autocorrelation analysis of OLS
#'residuals up to a number of lags determined by the user.
#'
#'@aliases mfpa plot.mfpa print.mfpa
#'
#'@param y Vector of \emph{n} observations (vector of integer or real numbers,
#' or one-column matrix).
#'@param MaxNFreq Maximum number of frequencies to be estimated in the stepwise
#' procedure (e.g. 2).
#'@param MinFreq Minimum value for frequency estimates (e.g. 3.0).
#'@param MaxFreq Maximum value for frequency estimates (e.g. 10). Must be larger
#' than MinFreq and smaller than half of the number of observations in the
#' series. If unspecified by the user \code{(MaxFreq=NA)}, \code{MaxFreq} is
#' set to \emph{n}/4 by the function.
#'@param ntrend Number (0 to 3) of orthogonal polynomial components estimating
#' the broad-scale trend, to be included in the joint polynomial-trigonometric
#' model. Use 0 to estimate no trend component, only an intercept.
#'@param nlags Number of lags to be used for autocorrelation analysis of OLS
#' residuals. Use 0 to bypass this analysis.
#'@param alpha Significance threshold for including frequencies.
#'@param x An object of class \code{mfpa}
#'@param xlab,ylab Labels for x and y axes
#'@param \dots Further arguments passed to or from other methods
#'
#'@details
#'
#'The fitting of a joint polynomial-trigonometric model is limited to ordinary
#'least squares (OLS), with autocorrelation analysis of OLS residuals up to a
#'certain lag. Orthogonal polynomials are used to model broad-scale trends,
#'whereas cosines and sines model the periodic structures at intermediate
#'scales. See Dutilleul (2011, section 6.5) and Legendre & Legendre (2012,
#'section 12.4.4) for details. OLS regression could be replaced by an
#'\emph{estimated generalized least squares} (EGLS) procedure, as described in
#'Dutilleul (2011).
#'
#'In spectral analysis in general and in mfpa in particular, the cosines and
#'sines are considered jointly in the search for the dominant frequency
#'components since they are both required to fully account for a frequency
#'component in a linear model. So, when either the cosine or the sine is
#'significant, this is sufficient indication that a significant frequency
#'component has been found. But see the first paragraph of the 'Recommendations
#'to users' below.
#'
#'The periodic phenomenon corresponding to each identified frequency is modelled
#'by a cosine and a sine. The first pair ('cos 1', 'sin 1') corresponds to the
#'first frequency, the second pair to the second frequency, and so on. An
#'intercept is also computed, as well as a polynomial broad-scale trend if
#'argument ntrend > 0. The coefficients shown for each periodic component ('cos'
#'and 'sin') are the OLS regression coefficients. The tests of significance
#'producing the p-values (called 'prob' in the output file) are 2-tailed
#'parametric t-tests, as in standard OLS regression.
#'
#'A global R-square statistic for the periodogram is computed as the variance of
#'the fitted values divided by the variance of the data series. An R-squared
#'corresponding to each frequency is also returned.
#'
#'In the Dutilleul periodogram, the time unit is the length of the data series
#'(in time units: seconds, hours, days, etc.). Hence, the \emph{frequency}
#'identified by a Dutilleul periodogram is the number of cycles of the periodic
#'signal (how many full or partial cycles) along the time series. That number is
#'an integer when the series contains an integer number of cycles; it may also
#'be a real number when the number of cycles is fractional. The periodogram can
#'identify several periodic phenomena with different frequencies. The estimated
#'frequencies could be divided by an appropriate constant to produce numbers of
#'cycles per second or day, or per meter or km, depending on the study.
#'
#'To find the \emph{period} (number of days, hours, etc.) of the process
#'generating a periodic signal in the data, divide the length of the series (in
#'days, hours, etc.) by the frequency identified by Dutilleul's periodogram.
#'
#'Recommendations to users The mfpa code estimates the periodic frequencies to
#'be included in the model through a combination of a stepwise procedure and
#'non-linear optimisation. Following that, the contributions of the 'cos' and
#''sin' components of all frequencies in the model are estimated by multiple
#'linear regression in the presence of the intercept and trends (if any).
#'Because the mfpa method estimates fractional frequencies, the cos-sin
#'combinations are not orthogonal among the identified frequencies, and
#'unnecessary frequencies may be selected as 'significant'.
#'
#'1. It is important that users of this periodogram have hypotheses in mind
#'about the frequencies of the processes that may be operating on the system
#'under study and the number of periodic components they are expecting to find.
#'If one asks for more components than the number of periodic phenomena at work
#'on the system, the 'real' frequency usually has a strong or fairly strong
#'R-squared and it is followed by other components with very small R-squared.
#'Selection of frequencies of interest should thus be based more upon
#'examination of the R-squares of the components rather than on the p-values.
#'For short series in particular, the adjusted R-squared is an unbiased estimate
#'of the variance of the data explained by the model. Even series of random
#'numbers can produce 'significant' frequencies for periodic components; the
#'associated (adjusted) R-squares will, however, be very small.
#'
#'2. Function mfpa cannot detect frequencies < 1 (smaller than one cycle in the
#'series) or larger than (\emph{n}-1) where \emph{n} is the number of
#'observations in the series, the latter case corresponding to periods smaller
#'than the interval between successive observations. When a periodic component
#'with such a period is present in the data, Dutilleul's periodogram can detect
#'harmonics of that frequency. Recommendation: when a frequency is detected that
#'does not seem to correspond to a hypothesized process, one could check, using
#'simulated data, if it could be produced by a process operating at a temporal
#'scale (period) smaller than the interval between successive observations. An
#'example is shown in Example 2.
#'
#'3. When analysing a time series with unknown periodic structure, it is
#'recommended to try first with more than one frequency, say 2 or 3, and also
#'with a trend. Eliminate the non-significant components, step by step, in
#'successive runs, starting with the trend(s), then eliminate the weakly
#'significant periodic components, until there are only highly significant
#'components left in the model.
#'
#'@return A list containing the following elements:
#'
#' \itemize{ \item \code{frequencies}: Vector of estimated frequencies of the
#' model periodic components and associated R-squared. The frequencies are
#' numbers of cycles in the whole (temporal or spatial) series under study.
#' \item \code{coefficients}: Data frame containing OLS slope estimates,
#' starting with the intercept, then the orthogonal polynomials modelling trend
#' in increasing order, followed by the cosine and sine coefficients
#' (alternating) in the order of the estimated frequencies. Columns: (1)
#' \code{coefficient}: the OLS intercept or slope estimates; (2) \code{prob}:
#' the associated probabilities. \item \code{predicted}: A vector (length
#' \emph{n}) of predicted response values (fitted values), including the trend
#' if any. The data and predicted values can be plotted together using function
#' \code{plot.mfpa}; type plot(name.of.output.object). The data values are
#' represented by red circles and the fitted values by a black line. \item
#' \code{auto_coeff}: If nlags > 0: data frame containing the following
#' columns. (1) \code{lag}: lags at which autocorrelation analysis of the OLS
#' residuals is performed; (2) \code{auto_r}: vector of sample autocorrelation
#' coefficients calculated from OLS residuals for each lag; (3) \code{prob}:
#' vector of probabilities associated with the tests of significance of the
#' sample autocorrelation coefficients. \item \code{y}: the original data
#' series (one-column matrix). \item \code{X}: the matrix of explanatory
#' variables; it contains a column of "1" to estimate the intercept, a column
#' for each of the trend components (if any), and two columns for each
#' frequency component, each frequency being represented by a cosine and a
#' sine. \item \code{r.squared.global}: The global R-squared of the model and
#' the adjusted R-squared. }
#'
#'@references
#'
#'Dutilleul, P. 1990. Apport en analyse spectrale d'un périodogramme modifié et
#'modélisation des séries chronologiques avec répétitions en vue de leur
#'comparaison en fréquence. Doctoral Dissertation, Université Catholique de
#'Louvain, Louvain-la-Neuve, Belgium.
#'
#'Dutilleul, P. 1998. Incorporating scale in ecological experiments: data
#'analysis. Pp. 387-425 in: D. L. Peterson & V. T. Parker [eds.] Ecological
#'scale - Theory and applications. Columbia University Press, New York.
#'
#'Dutilleul, P. 2001. Multi-frequential periodogram analysis and the detection
#'of periodic components in time series. Commun. Stat. - Theory Methods 30,
#'1063-1098.
#'
#'Dutilleul, P. R. L. 2011. Spatio-temporal heterogeneity - Concepts and
#'analyses. Cambridge University Press, Cambridge.
#'
#'Dutilleul, P. and C. Till. 1992. Evidence of periodicities related to climate
#'and planetary behaviors in ring-width chronologies of Atlas cedar (Cedrus
#'atlantica) in Morocco. Can. J. For. Res. 22: 1469-1482.
#'
#'Legendre, P. and P. Dutilleul. 1992. Introduction to the analysis of periodic
#'phenomena. 11-25 in: M. A. Ali [ed.] Rhythms in fishes. NATO ASI Series, Vol.
#'A-236. Plenum, New York.
#'
#'Legendre, P. and L. Legendre. 2012. Numerical Ecology. 3rd English edition.
#'Elsevier, Amsterdam.
#'
#'@author Guillaume Larocque <glaroc@@gmail.com> and Pierre Legendre.
#'
#'@examples
#'
#' ### Example 1
#'
#' # Simulate data with frequencies 2.3 and 6.1 and a random component, n = 100.
#' # No trend, no autocorrelated residuals.
#'
#' y <- as.matrix(0.4*(sin(2.3*2*pi*(1:100)/100)) +
#' 0.4*(sin(6.1*2*pi*(1:100)/100)) + 0.2*rnorm(100))
#'
#' res <- mfpa(y, MaxNFreq = 2, MinFreq = 2, ntrend = 0, nlags = 0)
#'
#' # Compute the periods associated with the two periodic components. Each
#' # frequency in element $frequencies is a number of cycles in the whole series.
#' # The periods are expressed in numbers of time intervals of the data series. In
#' # this example, if the data are measured every min, the periods are in min.
#'
#' periods <- 100/res$frequencies$frequency
#'
#' # Draw the data series and the fitted (or predicted) values
#'
#' plot(res)
#'
#' ### Example 2
#'
#' # Generate hourly periodic data with tide signal (tide period T = 12.42 h)
#' # during 1 year, hence 24*365 = 8760 hourly data. See
#' # https://en.wikipedia.org/wiki/Tide.
#'
#' # In this simulation, constant (c = 0) puts the maximum value of the cosine at
#' # midnight on the first day of the series.
#'
#' periodic.component <- function(x, T, c) cos((2*pi/T)*(x+c))
#'
#' tide.h <- periodic.component(1:8760, 12.42, 0)
#'
#' # The number of tides in the series is: 8760/12.42 = 705.314 tidal cycles
#' # during one year.
#'
#' # Sample the hourly data series once a day at 12:00 noon every day. The
#' # periodic signal to be detected has a period smaller then the interval between
#' # consecutive observations and its frequency is larger than (n-1). The sequence
#' # of sampling hours for the tide.h data is:
#'
#' h.noon <- seq(12, 8760, 24)
#' tide.data <- tide.h[h.noon]
#' length(tide.data)
#'
#' # The series contains 365 sampling units
#'
#' # Compute Dutilleul's multi-frequential periodogram
#'
#' res.noon <- mfpa(tide.data, MaxNFreq = 1, MinFreq = 2, ntrend = 1, nlags = 2)
#'
#' # Examine the frequency detected by the periodogram, element
#' # res.noon$frequencies. This is a harmonic of the tide signal in the original
#' # data series tide.h.
#'
#' # Compute the period of the signal in the data series sampled hourly:
#'
#' period <- 365/res.noon$frequencies$frequency
#'
#' # Draw the data series and the adjusted values
#'
#' plot(res.noon)
#'
#' # Repeat this analysis after addition of random noise to the tide data
#'
#' tide.noise <- tide.data + rnorm(365, 0, 0.25)
#'
#' res.noise <- mfpa(tide.noise, MaxNFreq = 1, MinFreq = 2, ntrend = 1, nlags = 2)
#'
#' plot(res.noise)
#'
#'@importFrom stats optim pnorm pt var
#'@importFrom utils tail
#'@export mfpa
#'
'mfpa' <-
function(y,
MaxNFreq = 2,
MinFreq = 3,
MaxFreq = NA,
ntrend = 0,
nlags = 0,
alpha = 0.05)
{
#### Internal functions
## Optimization to fine-tune the frequency estimation
mfpa_LSoptim <- function(x, y, mf, n, xt) {
for (i in c(1:mf)) {
xt <- cbind(xt, cos(2 * pi * x[i] * c(1:n) / n), sin(2 * pi * x[i] * c(1:n) /
n))
}
out <- -1 * c(t(y) %*% xt %*% solve(t(xt) %*% xt, t(xt) %*% y))
}
## Brute force estimation of frequencies
mfpa_LS <- function(y, X, freqs, mfreqs, ntrend) {
NF <- length(freqs)
n <- nrow(y)
ncX <- ncol(X)
IM <- matrix(0, NF, 1)
y <- as.matrix(y)
for (f1 in c(1:NF)) {
X[, ncX - 1] <- cos(2 * pi * freqs[f1] * c(1:n) / n)
X[, ncX] <- sin(2 * pi * freqs[f1] * c(1:n) / n)
#lower <- max(freqs[mfreqind-1],0), upper = freqs[mfreqind+1])
IM[f1] <-
tryCatch(
as.matrix(t(y) %*% X) %*% solve(t(X) %*% X, t(X) %*% y),
error = function(e)
e = 0
)
}
m <- max(IM)
mfreqind <- which.max(IM)
mf <- length(mfreqs) + 1
xt <- X[, 1:c(ntrend + 1)]
ctrl <- list(pgtol = 0, trace = 6)
mfreq <-
optim(
c(mfreqs, freqs[mfreqind]),
mfpa_LSoptim,
y = y,
mf = mf,
n = n,
xt = xt,
method = "BFGS"
)
out <- list(m = m, mfreq = mfreq$par)
}
## Compute adjusted R-square from R-square
adj.R2 <- function(R2, n, m)
1 - (1 - R2) * (n - 1) / (n - m - 1)
#### END Internal functions
y <- as.matrix(y)
n <- nrow(y)
if (is.na(MaxFreq)) {
# The maximum frequency value (cycles) is equal to one fourth of the
# time series length.
MaxF <- n / 4
} else{
if (MaxFreq < MinFreq | MaxFreq > (n / 2)) {
stop(
'Invalid maximum frequency (MaxFreq). It must be larger than or equal to
MinFreq AND smaller than or equal to half the number of observations in the series'
)
} else{
MaxF = MaxFreq
}
}
res <- 0.5
freqs <- seq(MinFreq, MaxF, by = res)
IM <- matrix(0, 2, 1)
TSS <- sum(y * y)
alpha <- 0.1
mfreqs <- NULL
for (i in 1:MaxNFreq) {
if (i == 1) {
# Definition of trend components
X <- matrix(0, n, 2 + 1 + ntrend)
if (ntrend > -1)
X[, 1] <- matrix(1, n, 1)
if (ntrend > 0) {
t1 <- c(1:n)
X[, 2] <- t1 - sum(t1) / n
}
if (ntrend > 1) {
t2 <- c(1:n) ^ 2
n2 <- sum(abs(X[, 2]) ^ 2) ^ (1 / 2)
X[, 3] <- t2 - sum(t2) / n - sum(X[, 2] * t2) * X[, 2] /
n2 ^ 2
}
if (ntrend > 2) {
t3 <- c(1:n) ^ 3
n3 <- sum(abs(X[, 3]) ^ 2) ^ (1 / 2)
X[, 4] = t3 - sum(t3) / n - sum(X[, 2] * t3) * X[, 2] /
n2 ^ 2 - sum(X[, 3] * t3) * X[, 3] / n3 ^ 2
}
if (ntrend > 3) {
stop('The number of orthogonal polynomials must be 3 or less.')
}
XP <- X
yp <- 0
} else{
X <-
cbind(X, matrix(0, n, 2)) # Frequencies are added in a stepwise procedure.
}
LSRes <- mfpa_LS(y, X, freqs, mfreqs, ntrend)
IM[2] <- LSRes$m
mfreq <- LSRes$mfreq
if (i > 1) {
F <- (0.5 * (IM[2] - IM[1])) / ((TSS - IM[2]) / (n - 2 * i - 1 - ntrend))
p <- pf(F, 2, (n - 2 * i - 1 - ntrend), lower.tail = FALSE)
if (p > alpha) {
break
} else{
IM[1] <- IM[2]
mfreqs <- mfreq
}
} else{
IM[1] <- IM[2]
mfreqs <- mfreq
}
X[, ncol(X) - 1] <-
cos(2 * pi * round(tail(mfreq, 1) * 10) / 10 * c(1:n) / n)
X[, ncol(X)] <-
sin(2 * pi * round(tail(mfreq, 1) * 10) / 10 * c(1:n) / n)
}
X <- X[, !colSums(abs(X)) == 0]
A <- solve(t(X) %*% X, t(X))
b <- A %*% y # Regression slopes
yp <- X %*% b # Predicted values
f.r.squared = matrix(NA, length(mfreqs))
for (i in 1:length(mfreqs)) {
ind <- c(ntrend + (2 * i), ntrend + 1 + (2 * i))
Af <- solve(t(X[, -ind]) %*% X[, -ind], t(X[, -ind]))
bf <- Af %*% y # Regression slopes
ypf <- X[, -ind] %*% bf # Predicted values
f.r.squared[i] <-
(crossprod(yp) - crossprod(ypf)) / crossprod(y)
}
yres <- y - yp # Residuals
MSE <- sum(yres ^ 2) / (n - qr(X)$rank)
VarB <- diag(A %*% t(A) * MSE)
S <- sqrt(VarB)
tstat <- b / S
Probb <- (1 - pt(abs(tstat), n - ncol(X))) * 2
r.squared.global <- vector(length = 2)
r.squared.global[1] <- Rsq <- var(yp) / var(y)
r.squared.global[2] <- adj.R2(Rsq, n, (ncol(X) - 1))
if (nlags > 0) {
# Definition of lags
lags <- c(1:nlags)
nlags <- length(lags)
# Autocorrelation analysis of OLS residuals
HH <-
dist(cbind(c(1:n), matrix(1, n, 1)), upper = TRUE, diag = TRUE)
yMat <- kronecker(matrix(1, 1, n), yres) - mean(yres)
MSE <- sum(yres * yres) / (n - qr(X)$rank)
if (nlags != 0) {
r <- matrix(0, nlags - 1, 1)
S2 <- matrix(0, nlags - 1, 1)
yProd <- yMat * t(yMat)
for (i in 1:nlags) {
islag <- as.vector(as.matrix(HH)) == lags[i]
tMat <- as.vector(yProd) * islag
r[i] <- sum(tMat) / (2 * (n - 1) * var(yres))
S2[i] <- (1 + 2 * sum(r[1:(i - 1)] ^ 2)) / n
}
Z <- r / sqrt(S2)
Probr <- 2 * (1 - pnorm(abs(Z), mean = 0, sd = 1))
} else{
Z <- 0
}
} else{
lags <- numeric()
r <- numeric()
Probr <- numeric()
}
cresults <- data.frame(coefficient = b, prob = round(Probb, 5))
nsc <- (ncol(X) - ntrend - 1) / 2
if (ntrend != 0) {
rt <- paste(rep("trend order", ntrend), 1:ntrend)
} else{
rt <- {
}
}
row.names(cresults) <-
c('intercept', rt, paste(rep(c('cos', 'sin'), nsc), kronecker(1:(nsc), c(1, 1))))
freqresults <-
data.frame(frequency = mfreqs, r.squared = f.r.squared)
auto_coeff <- data.frame(lag = lags,
auto_r = r,
prob = Probr)
out <-
list(
frequencies = freqresults,
coefficients = cresults,
predicted = yp,
auto_coeff = auto_coeff,
y = y,
X = X,
r.squared.global = r.squared.global
)
class(out) <- c("mfpa", class(res))
return(out)
}
#' @rdname mfpa
#' @export
'plot.mfpa' <-
function (x,
xlab = "" ,
ylab = "Values" ,
...) {
plot(
x$y,
type = "p",
col = "red",
ylim = range(x$y) * 1.1,
ylab = ylab,
xlab = xlab
)
lines(x$predicted)
}
#' @rdname mfpa
#' @export
'print.mfpa' <-
function (x, ...) {
cat("Multi-frequential periodogram analysis\n\n")
print(x$frequencies)
cat("\n")
cat(paste0(
"Global R-squared : ",
format(x$r.squared.global[1], digits = 5)
), "\n")
cat(paste0(
"Adjusted R-squared : ",
format(x$r.squared.global[2], digits = 5)
), "\n")
cat("\n")
print(x$coefficients)
if (length(x$lags) > 0) {
cat("\n")
print(x$auto_coeff)
}
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.