Nothing
#' Lomb-Scargle Periodigram
#'
#' The Lomb-Scargle periodigram represents a statistical estimator for the
#' amplitude and phase at a given frequency. This function takes also multivariate
#' (n-dimensional) input data.
#'
#' Since the given time series does not need to be evenly sampled, the data
#' mainly consists of data pairs \code{x1, x2, x3, ...} (sampling points) and (one)
#' corresponding value \code{y}, which stores the realisation/measurement data.
#' As can be seen from the data definition above, multivariate (n-dimensional)
#' input data is allowed and properly processed.
#'
#' Two different methods are implemented: the standard Lomb-Scargle method with
#'
#' \eqn{y(t) = a * cos(\omega (t - \tau)) + b * sin(\omega (t - \tau))}
#'
#' as model function and the generalized Lomb-Scargle (after Zechmeister 2009)
#' method with
#'
#' \eqn{y(t) = a * cos(\omega t) + b * sin(\omega t) + c}
#'
#' as model function, which investigates a floating average parameter \eqn{c}
#' as well.
#'
#' Both methods can be supplied by an artifical dense frequency vector \code{f}.
#' In conjunction with the resulting phase information the user might be able to
#' build a "Fourier"-like spectrum to reconstruct or interpolate the timeseries in equally
#' spaced sampling. Remind the band limitation which must be fulfilled for this.
#'
#' \describe{
#' \item{f}{The frequencies should be stored in a 1D vector or -- in case of
#' multivariate analysis -- in a \code{data.frame} structure to preserve variable names}
#' \item{\code{ofac}}{If the user does not provide a corresponding frequency
#' vector, the \code{ofac} parameter causes the function to estimate
#' \deqn{nf = ofac*length(x)/2} equidistant frequencies.}
#' \item{\code{p}-value}{The \code{p}-value (aka false alarm probability FAP)
#' gives the probability, wheter the estimated amplitude is NOT significant.
#' However, if \code{p} tends to zero the
#' amplidutde is significant. The user must decide which maximum value is acceptable,
#' until an amplitude is not valid.}
#' }
#'
#' If missing values \code{NA} or \code{NaN} appear in any column, the corresponding row
#' is excluded from calculation.
#'
#' @section Speed Up:
#'
#' In general the function calculates everything in a vectorized manner, which
#' speeds up the procedure. If the memory requirement is more than \code{maxMem},
#' the calculation is split into chunks which fit in the memory (cache). Depending on the
#' problem size (number of frequencies and data size) a tuning of this value
#' enhances speed.
#'
#' Please consider to replpace the BLAS library by a multithreaded version. For example
#' \url{https://prs.ism.ac.jp/~nakama/SurviveGotoBLAS2/binary/windows/x64/} is hosting
#' some Windows RBlas.dll files. Refer to \url{https://mattstats.wordpress.com/2016/02/07/r-with-gotoblas-on-windows-10/}
#' for further information.
#'
#' The parameter \code{cl} controls a possible cluster, which can be invoked. It takes an
#' integer number of workers (i. e. \code{cl = 4}), a list with node names \code{c("localhost",...)}
#' or an object of class \code{'cluster'} or similar. The first two options cause the
#' function to create the cluster internally. This takes time due to the initialization.
#' The faster way is to provide an already initialized cluster to the function.
#'
#'
#' @param f optional frequency vector / data frame. If not supplied \code{f} is calculated.
#' @param x sampling vector or data frame \code{data.frame(x1, x2, x3, ...)}
#' @param y input data vector or data frame \code{data.frame(x1, x2, ..., val)}
#' @param ofac in case \code{f=NULL} this value controlls the amount of frequency
#' oversampling.
#' @param w weights for data. It must be a 1D vector.
#' @param mode \code{"normal"} calculates the normal Lomb-Scargle periodogram;
#' \code{"generalized"} calculates the generalized Lomb-Scargle periodogram including
#' floating average and weights.
#' @param maxMem sets the amount of memory (in MB) to utilize, as a rough approximate.
#' @param cl if \code{numeric}, it defines the number of workers to use, or provides a cluster
#' definition of class \code{cluster} or \code{SocketCluster} from \code{parallel} package
#'
#' @return The \code{spec.lomb} function returns an object of the class \code{lomb},
#' which is a \code{list} containg the following information:
#' \describe{
#' \item{\code{A}}{A vector with amplitude spectrum}
#' \item{\code{f}}{corresponding frequency vector}
#' \item{\code{phi}}{phase vector}
#' \item{\code{PSD}}{power spectral density normalized to the sample variance}
#' \item{\code{floatAvg}}{floating average value only in case of
#' \code{mode == "generalized"}}
#' \item{\code{w}}{if, \code{mode == "generalized"} contains the weighting vector}
#' \item{\code{x,y}}{original data}
#' \item{p}{p-value False Alarm Probability}
#' }
#'
#' @references
#' A. Mathias, F. Grond, R. Guardans, D. Seese, M. Canela, H. H. Diebner,
#' and G. Baiocchi, "Algorithms for spectral analysis of irregularly sampled
#' time series", Journal of Statistical Software, 11(2), pp. 1--30, 2004.
#' @references
#' J. D. Scargle, "Studies in astronomical time series analysis. II - Statistical
#' aspects of spectral analysis of unevenly spaced data", The Astrophysical Journal,
#' 263, pp. 835--853, 1982.
#' @references
#' M. Zechmeister and M. Kurster, "The generalised Lomb-Scargle periodogram.
#' A new formalism for the floating-mean and Keplerian periodograms",
#' Astronomy & Astrophysics, 496(2), pp. 577--584, 2009.
#'
#' @concept Lomb Scargle
#' @concept Periodogram
#' @example R/examples/lombExample.R
#' @seealso \code{\link{filter.lomb}}
#' @export
#' @import lattice
#' @import RhpcBLASctl
#' @import parallel
#' @import pbapply
spec.lomb <- function(x = NULL, y=stop("Missing y-Value"),f = NULL, ofac = 1
,w = NULL, mode = "normal"
,maxMem = 8,cl = NULL
)
{
f_not_defined <- is.null(f)
fzero <- F
dat <- NULL
# put data into dataframe
if(!is.data.frame(y) & (is.null(x) || is.vector(x)))
{
# space normalized periodogram
if(is.null(x))
x <- 1:length(y)
if(!is.vector(x))
stop("x is not an atomic vector")
if(length(x)!=length(y)) stop("x-y lengths differ")
# coerce data to data frame with last (rightmost column) as values
dat <- data.frame(x = x, val = y)
}
if(!is.data.frame(y) & (is.data.frame(x)))
{
dat <- as.data.frame(cbind(x,y))
}
if(is.data.frame(y))
{
if(!is.null(x))
warning("ignoring x-values, because y is data.frame including depending variables")
dat <- y
}
names(dat)[dim(dat)[2]] <- "val"
### checking for weigtheing vector w
if(is.null(w) | length(w) == 1)
{
w <- rep(1,dim(dat)[1])
}
x_org <- x
y_org <- y
# delete "missing values"
c <- apply(dat,1,function(x) !any(is.na(x) )) # check by row
w <- w[c]
dat <- dat[c , ]
# force sum(w) == 1
w <- w/sum(w)
mean_val <- mean(dat$val)
var_val <- var(dat[,dim(dat)[2]])
nt <- dim(dat)[1]
### calculating frequencies
if(!is.null(f))
{
if( !( (is.vector(f) & is.vector(x) & is.vector(y)) |
(is.data.frame(f) & is.data.frame(y)) )
)
stop("frequency vector does not fit to data dimensions")
if( !is.data.frame(f) & length(f) == dim(dat)[2]-1 )
{
dim(f) <- c(1,length(f))
f <- as.data.frame(f)
names(f) <- paste("f",1:dim(f)[2],sep="")
warning("coerced frequency to 1xN data.frame")
}
if(ifelse(is.vector(f), length(f),dim(f)[1]) == (dim(dat)[2]-1))
{
f <- t(f)
warning("transposed frequency vector")
}
}
else # calculating the frequencies
{
fzero <- T
f <- apply(as.matrix(dat[,-dim(dat)[2]]),2,function(x)
{
# minimal frequency #
f0 <- 1/(2*diff(range(x)))
# Sampling frequency #
fsm <- length(x)^(1/(dim(dat)[2]-1)) / (diff(range(x)))
### calculation of frequencies ###
fmin <- ifelse(dim(dat)[2] == 2, f0, -0.5*fsm + f0)
return( seq(fmin,0.5*fsm - f0,by = f0/ofac) )
}
)
f <- as.data.frame(f)
f <- expand.grid(f)
names(f) <- ifelse(length(names(f)) == 1, "f", paste("f",1:dim(f)[2],sep=""))
}
# Lomb rechnen
env <- environment()
f <- as.matrix(f)
dat <- as.matrix(dat)
res <- NULL
# estimate the memory usage and divide the analysis in appropriate fractions
nFrac <- ceiling(maxMem * 1024^2 / (4 * 8 * dim(dat)[1]))
fInd <- split(1:dim(f)[1], rep(1:ceiling(dim(f)[1]/nFrac)
,each = nFrac
,length.out = dim(f)[1])
)
Y <- NULL
hYY <- NULL
if(mode == "generalized")
{
Y <- sum(w*dat[,dim(dat)[2]])
hYY <- sum(w*dat[,dim(dat)[2]]^2)
}
if(mode == "normal")
{
dat[,dim(dat)[2]] <- dat[,dim(dat)[2]] - mean_val
}
# making up progressbar
pboptions(style = 3, char = "=")
doLomb <- function(ind) # variables are present in the global environment
{
if(mode == "generalized")
{
return( gLmb(f[ind,],dat = dat,w = w,Y = Y,hYY = hYY) )
}
if(mode == "normal")
{
return( lmb(f[ind,],dat = dat,var_val = var_val) )
}
}
# if cluster or number of nodes is detected
# build up cluster
if(!is.null(cl))
{
if(is.numeric(cl) | is.character(cl)) # number of cores is given
{
on.exit(stopCluster(cl)) # stop own cluster if function ends
cl <- makeCluster(cl)
}
if(length(grep("cluster",class(cl))) == 0)
{
stop("Incorrect cluster file")
}
# correct BLAS number of threads
nthr <- get_num_procs() %/% length(cl)
if(nthr == 0L)
nthr <- 1
blas_set_num_threads(nthr)
clusterExport(cl,varlist = c("lmb","gLmb","dat","w","Y","hYY","var_val","mode","f")
,envir = env)
}
res <- pblapply(fInd,doLomb,cl = cl)
res <- do.call(rbind,res)
res <- cbind(f,res)
# column of a data frame is a list, so rbind() works better with a list
if(fzero)
{
res <- rbind(as.list(numeric(dim(res)[2])),res)
}
if(min(abs(rowSums(f))) == 0) # if we find f == 0 exatly
{
# correct f == 0 value
res$A[which(rowSums(f) == 0)] <- mean_val
res$phi[which(rowSums(f) == 0)] <- 0
res$PSD[which(rowSums(f) == 0)] <- 0
res$p[which(rowSums(f) == 0)] <- 0
}
# False alarm Probability
#
# based on the normalized PSD in [0,1] after Zechmeister 2009
#
# Prob(p > p0) = (1 - p0)^((N-3)/2)
#
Prob <- (1 - res$PSD)^((nt - 3) / 2)
# Degrees of freedom
# after Press (1989,1992)
# M <- nt/2
# after Horn and Baliunas (1986)
M <- -6.362 + 1.193*nt + 0.00098*(nt)^2
# minimize calculation errors at small p
res$p <- M*Prob
res$p[is.na(res$p)] <- 1
# 'large' p calculate like...
res$p[res$p > 0.001] <- 1 - ( 1 - Prob[res$p > 0.001])^M
res <- as.list(res)
res$w <- w
res$x <- x_org
res$y <- y_org
class(res) <- "lomb"
return(as.list(res))
}
#' Lomb-Scargle estimation function
#'
#' calculates the standard Lomb-Scargle estimation. The calculation is vectorized
#' to enhance calculation speed.
#'
#'
#' @param f frequency
#' @param dat spatial vector including locations and values
#' @param var_val variance of the data
lmb <- function(f, dat, var_val)
{
### Input definitions ###
#
# f = c(f1, f2, f3, ...)
# dat = data.frame(x1, x2, x3, ..., val)
#
#########################
# val <- dat[, dim(dat)[2]]
### Calculating phase (e.g. omega * time) ###
# if f is a single vector
if(is.null(dim(f)) & dim(dat)[2] > 2)
{
dim(f) <- c(1,length(f))
}
# matrix product takes sum over frequencies, if multivariate
SOT <- 2 * pi * tcrossprod(dat[, -dim(dat)[2]], f) # takes the matrix product
dSOT <- dim(SOT)
tau <- atan2( .colSums( sin(2 * SOT), dSOT[1], dSOT[2] )
, .colSums( cos(2 * SOT), dSOT[1], dSOT[2] )
) / 2
### berechne Argument der Summen ###
arg <- SOT - matrix(tau, nrow = dSOT[1], ncol = dSOT[2],byrow = T)
cs <- cos(arg)
ss <- sin(arg)
rm(SOT,arg)
### calculate individual parts ###
R <- .colSums(dat[,dim(dat)[2]] * cs, dSOT[1], dSOT[2])
I <- .colSums(dat[,dim(dat)[2]] * ss, dSOT[1], dSOT[2])
C <- .colSums(cs ^ 2, dSOT[1], dSOT[2])
S <- .colSums(ss ^ 2, dSOT[1], dSOT[2])
# plausibility test for cs = 1 and ss = 1
# prevent diff by 0 like error
C[which(C < 1e-20)] <- 1
S[which(S < 1e-20)] <- 1
l <- sqrt(R ^ 2 + I ^ 2)
phi <- -tau - atan2((I / l), (R / l))
A <- sqrt((R / C) ^ 2 + (I / S) ^ 2)
# A <- sqrt(2/dim(dat)[1]) * sqrt(R^2 / C + I^2 / S)
# normalized Periodogram (Hocke 1998, Press 1992 and Zechmeister 2009)
# Pn * pn = p(w)
PSD <- dim(dat)[1] / (4 * var_val) * A^2 * 2 / (dim(dat)[1] - 1)
return(data.frame(A = A, phi = phi, PSD = PSD))
}
#' generalized Lomb-Scargle estimation function
#'
#' calculates the generalized Lomb-Scargle estimation after Zechmeister et al. (2009)
#'
#' This method is based on the generalized approach
#'
#' \eqn{y(t) = a*cos(w*t) + b*sin(w*t) + c}
#'
#' which contains the floating average value \eqn{c} of the model function above.
#' The calculation is vectorized to enhance calculation
#' speed.
#'
#'
#' @param f frequency
#' @param dat spatial vector including locations and values
#' @param w vector of weights
#' @param Y weighted sum of values
#' @param hYY weighted sum of squared values
gLmb <- function(f, dat, w, Y, hYY)
{
### Input definitions ###
#
# f = c(f1, f2, f3, ...)
# dat = data.frame(x1, x2, x3, ..., val)
#
#########################
# val <- dat[, dim(dat)[2]]
# if f is a single vector
if(is.null(dim(f)) & dim(dat)[2] > 2)
{
dim(f) <- c(1,length(f))
}
### Calculating phase (e.g. omega * time) ###
# matrix product takes sum over frequencies, if multivariate
SOT <- 2 * pi * tcrossprod(dat[, -dim(dat)[2]],f) # takes the matrix product
dSOT <- dim(SOT)
## put the data into a Matrix
cSOT <- cos(SOT)
sSOT <- sin(SOT)
rm(SOT)
ms1 <- w * cSOT # C
ms2 <- w * sSOT # S
s1 <- .colSums(ms1,dSOT[1],dSOT[2])
s2 <- .colSums(ms2,dSOT[1],dSOT[2])
s3 <- .colSums( dat[, dim(dat)[2]] * ms1 ,dSOT[1],dSOT[2]) # YC
s4 <- .colSums( dat[, dim(dat)[2]] * ms2 ,dSOT[1],dSOT[2]) # YS
s5 <- .colSums( ms1 * cSOT ,dSOT[1],dSOT[2]) # CC
s6 <- .colSums( ms1 * sSOT ,dSOT[1],dSOT[2]) # CS
# s7 <- .colSums( ms2 * sSOT ,dSOT[1],dSOT[2]) # SS, ersetzt durch SS = 1 - CC
rm(cSOT,sSOT,ms1,ms2)
# Calculationg the coefficients
YY <- hYY - Y^2
YC <- s3 - Y*s1
YS <- s4 - Y*s2
CC <- s5 - s1^2
CS <- s6 - s1*s2
# SS <- s7 - s2^2 # 1-s[5] = SS = 1-CC
SS <- 1 - s5 - s2^2 # 1-s[5] = SS = 1-CC
D <- CC*SS - CS^2
a <- ( YC * SS - YS * CS ) / D
b <- ( YS * CC - YC * CS ) / D
c <- Y - a*s1 - b*s2
phi <- -atan2(b, a)
A <- sqrt(a ^ 2 + b ^ 2)
PSD <- 1/(YY*D)*(SS*YC^2 + CC*YS^2 - 2*CS*YC*YS)
return(data.frame(A = A, phi = phi, PSD = PSD,floatAvg = c))
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.