In CliffordLai/harper: Harmonic Regression and Periodicity Test

This vignette introduces the functionality of the harper package for estimating the frequency in the (generalzied) semiparametric harmonic regression model. As an example, we investigate the periodicity of the annual numbers of lynx trappings for 1821–1934 in Canada. This dataset is avaiable in R.

#----------Read Data----------#
y <- as.vector(lynx)
ly <- log(y)
n <- length(y)
t <- 1821:1934

par(mfrow=c(2,1))
plot(t,y,type = "b", xlab = "year",
     main = "lynx")
plot(t,ly,type = "b", xlab = "year",
     main="Log lynx")
par(mfrow=c(1,1))

Semiparametric Harmonic Regression Model

The proposed (generalzied) semiparametric harmonic regression model for a periodic series has the following form, $$E[y_t] = g( A \cos(2\pi\lambda t)+B \sin(2\pi\lambda t) ),\ t=1,...,n,$$ where $y_{t}$'s are indepedent, $g(\cdot)$ is an unknown strictly monotone function in the range $[0,1]$, $A^2 + B^2 =1$, $00$.

In this application, we assume that $y_t$'s follow either the Poisson distributions or Gaussian distributions with a constant variance.

Harmonic Regression

When fitting the harmonic regression model, the function $g(\cdot)$ becomes an identity function. We firstly estimate the frequency by the least-square methods on the log-transformed series. We then fit the series using the generalized linear model by specifying a Gaussian family and a Possion family, respectively.

#----------Fit the harmonic regression----------#
## Estimate the frequency (LS)
library(harper)
lambdaRange <- seq(1/n,0.5-1/n,length.out = 5*n)
system.time(
LSFit <- ptestReg(log(y),t=t,method = "LS",
                  lambdalist=lambdaRange,
                  returnPvalue = FALSE)
)
LSFit$freq

## Modeling given the estimated frequency
X <- data.frame(y=y,
                x1=cos(2*pi*LSFit$freq*t),
                x2=sin(2*pi*LSFit$freq*t))
fitHarG <- glm(y~x1+x2,data=X,family=gaussian())  #the Gaussian case
fitHarP <- glm(y~x1+x2,data=X,family=poisson())   #the Poisson case

## Fitted values at continuous time
newt <- seq(t[1],t[length(t)],0.1)
newX <- data.frame(x1=cos(2*pi*LSFit$freq*newt),
                   x2=sin(2*pi*LSFit$freq*newt))
predHarG <- predict(fitHarG,newX,type="response")
predHarP <- predict(fitHarP,newX,type="response")

The estimated frequency using the LS method is r LSFit$freq.

Semiparametric Harmonic Regression

When fitting the semiparametric harmonic regression model, the function $g(\cdot)$ needs to be estimated. We firstly estimate the frequency by the rank-based methods on the series. The log transformation is not neccessary because ranks are invariant under monotone transformations. We then fit the series the semiparametric harmonic regression with monotone generalzied additive model. As the harmonic regression model, we specify the distributions of $y_t$'s as a Gaussian family and a Possion family, respectively.

#----------Fit the semiparametric harmonic regression----------#
## Estimate the frequency (Rank-based)
phiRange <- seq(0,0.99,by=0.01) 
system.time(paraRankLS <- GetFitRankLS(y,t=t,lambdaRange,phiRange)) 
paraRankLS[2,1]

## Modeling given the estimated frequency
fitSHarG <- semihregScam(y, t, lambda0=paraRankLS[2,1],
                          family = gaussian(), bs = "mpd") #the Gaussian case
fitSHarP <- semihregScam(y, t, lambda0=paraRankLS[2,1],
                          family = poisson(), bs = "mpd") #the Poisson case

## Fitted values at continuous time
newt <- seq(t[1],t[length(t)],0.1)
predSHarG <- predict(fitSHarG,newt)[,2]
predSHarP <- predict(fitSHarP,newt)[,2]

The estimated frequency using the rank-based method is r paraRankLS[2,1], the same as the LS method.

Visualize the Fitted Model

Finnaly, we compare the fitted values from the above four models.

library(tidyverse)
nt <- length(newt)
LynxVis <- data.frame(t=rep(c(t,newt),4),
                      y=c(y,predHarG,y,predHarP,
                          y,predSHarG,y,predSHarP),
                      Method=rep(c("Harmonic(G)","Harmonic(P)",
                                   "Semi(G)","Semi(P)"),each=n+nt),
                      Type=rep(c(rep("observed",n),rep("fitted",nt)),4))
LynxVis$Type <- ordered(LynxVis$Type, levels=c("observed", "fitted"))
LynxVis$Method <- ordered(LynxVis$Method, levels=c("Harmonic(G)", "Semi(G)",
                                                   "Harmonic(P)","Semi(P)"))

ggplot(LynxVis, mapping=aes(x=t, y=y,color=Type)) +
  geom_point(aes(shape=Type))+
  geom_line(aes(linetype=Type),lwd=1) +
  scale_colour_manual(values=c("black", rgb(1,0,0,0.6) )) +
  scale_linetype_manual(values=c("blank", "solid")) +
  scale_shape_manual(values=c(20,NA)) +
  facet_grid(Method~.) +
  labs(x="Year",y="Lynx") +
  theme(legend.position="top")

Bootstrap

Below, we show how to obtain an estimated standard errors of the estimates via the bootstrap method. Since it takes time, we do not evaluate the scripts. The result below is based on a laptop with the cpu i7-4500U.

library(boot)
y <- as.vector(lynx)
n <- length(y)
t <- 1821:1934
lambdaRange <- seq(1/n,0.5-1/n,length.out = 5*n)
phiRange <- seq(0,0.99,by=0.02) 

## Estimation
freqfun1 <- function(d,inds,lambdaRange){
  LSFit <- ptestReg(d$y[inds],t=d$t[inds],method = "LS",
                    lambdalist=lambdaRange,
                    returnPvalue = FALSE)
  LSFit$freq
}

freqfun2 <- function(d,inds,lambdaRange,phiRange){
  paraRankLS <- GetFitRankLS(d$y[inds],t=d$t[inds],lambdaRange,phiRange)
  paraRankLS[2,]
}

## Bootstrap using multicores
library(parallel)
ncore <- 4 #detectCores() 
cl <- makeCluster(ncore) 
clusterExport(cl,"ptestReg")  
clusterExport(cl,"GetFitRankLS") 

## LS method
tc <- proc.time()
set.seed(193)
bfreqLS <- boot(data=data.frame(y=log(y),t=t), statistic=freqfun1, 
                R=500, sim = "ordinary",
                parallel="snow", ncpus=ncore,cl=cl,
                lambdaRange=lambdaRange)

proc.time()-tc #
# user  system elapsed 
# 0.84    0.18  138.17 
bfreqLS
# Bootstrap Statistics :
#   original       bias    std. error
# t1* 0.1037369 0.0002357167 0.0004685198


# Rank-based method
tc <- proc.time()
set.seed(193)
bfreqRank <- boot(data=data.frame(y=y,t=t), statistic=freqfun2, 
                R=500, sim = "ordinary",
                parallel="snow", ncpus=ncore,cl=cl,
                lambdaRange=lambdaRange)

proc.time()-tc #
# user  system elapsed 
# 6.47    0.32 1499.32 
bfreqRank
# Bootstrap Statistics :
#   original       bias     std. error
# t1* 0.1037369 0.0001305769 0.0005192553


stopCluster(cl = cl) #Close the clusters