# Sour

Functions for cross correlation of unevenly sampled time series.

Sour is an R package for estimating the cross-correlation between two time series which may be irregularly and/or non-simultaneously sampled. It is also a cocktail:

## Method

The CCF is computed using one of two methods: interpolated Cross Correlation Function (ICCF; Gaskell & Sparke 1986). the Discrete Correlation Function (DCF; Edelson & Krolik 1988)

You can also produce estimates of uncertainty on the CCF, its peak and centroid using the Flux Randomisation and Random Subsample Selection (FR/RSS) method of Peterson et al. (1998).

## Setting up

Sour is an R package, but is still in development. To set up from GitHub first install (if you haven't already) Hadley Wickham's fantastic devtools package.

``````   install.packages("devtools")
``````

Now you can install sour straight from GitHub:

``````   devtools::install_github("svdataman/sour")
``````

and you're good to go.

## Example data

The package comes with an example dataset. The data comprise two time series cont and hbeta. Each series is an array or data frame with columns t, y (and optionally) dy.

``````  plot(cont\$t, cont\$y, bty = "n", type = "o", xlab = "time (day)", ylab = "flux")
lines(hbeta\$t, hbeta\$y, type = "o", col = "red")
grid()
``````

The data show measurements of the optical continuum and H-beta line flux for the active galaxy NGC 5548, obtained by AGN Watch.

We can compute the CCF, using the ICCF method, as follows:

``````  result <- sour::cross_correlate(cont, hbeta, dtau = 1, max.lag = 550)
plot(result\$tau, result\$ccf, type = "l", bty = "n", xlab = "time delay", ylab = "CCF")
grid()
``````

The CCF peak at around -20 days indicates the Hbeta lags behind the continuum.

We can choose to use simulations to assess the centroid of the CCF near its peak.

``````  result <- sour::cross_correlate(cont, hbeta, method = "iccf", local.est = TRUE, dtau = 1, nsim = 2000, max.lag = 120)
hist(result\$cent.dist, breaks = 50, col = "steelblue1", main = "", border = NA, prob = TRUE, xlab = "centroid delay (day)")
``````

Here, the time is in units of days. We choose the width for the lag bins (dtau = 1.0), the maximum lag to examine (max.lag = 120.0), and the number of simulations to use (nsim = 2000) when estimating the centroid distribution. Setting local.est = TRUE means that the mean and variances are computed using only pairs of data contributing to a given lag bin.

We can use these simulations to place intervals on the lag. Here we show the quartiles for the CCF centroid.

``````   quantile(result\$cent.dist)
0%       25%       50%       75%      100%
-32.73585 -24.75513 -23.28236 -21.79922 -13.49574
``````

## To do:

• Bring interpolation outside of main loop (in iccf_core) for speed
• understand the DCF: why max|CCF| > 1 occurs and ACF(0) != 1 sometimes.
• unit tests: 1. ACF of H-beta should peak at r=1; 2. ACF for random white noise should peak with r=1 at tau=0; shouldbe symmetric about tau = 0.