# summary.CountingProcessSamplePath: Create and Explore Counting Process Sample Path Summaries In STAR: Spike Train Analysis with R

## Description

These functions / methods are designed to test a `CountingProcessSamplePath` object against a uniform Poisson process with rate 1.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18``` ```## S3 method for class 'CountingProcessSamplePath' summary(object, exact = TRUE, lag.max = NULL, d = max(c(2, sqrt(length(object\$ppspFct()))%/%5)), ...) ## S3 method for class 'CountingProcessSamplePath.summary' print(x, digits = 5, ...) ## S3 method for class 'CountingProcessSamplePath.summary' plot(x, y, which = c(1,2,6,8), main, caption = c(expression(paste("Uniform on ", Lambda," Test")), "Berman's Test", "Log Survivor Function", expression(paste(U[k+1]," vs ", U[k])), "Variance vs Mean Test", "Wiener Process Test", "Autocorrelation Fct.", "Renewal Test"), ask = FALSE, lag.max = NULL, d = max(c(2, sqrt(length(eval(x\$call[[2]])\$ppspFct()))%/%5)), ...) ```

## Arguments

 `object` A `CountingProcessSamplePath` object. `exact` Should an exact Kolmogorov test be used? See `ks.test`. `lag.max` See `renewalTestPlot`. `d` See `renewalTestPlot`. `x` A `CountingProcessSamplePath.summary` object. `digits` An `integer`, the number of digits to be used while printing summaries. See `round`. `y` Not used but required for compatibility with the `plot` method. `which` If a subset of the test plots is required, specify a subset of the numbers `1:6`. `main` Title to appear above the plots, if missing the corresponding element of `caption` will be used. `caption` Default caption to appear above the plots or, if `main` is given, bellow it `ask` A `logical`; if `TRUE`, the user is asked to hit the return key before each plot generation, see `par(ask=.)`. `...` Passed to `chisq.test` used internally by `summary`, not used in `plot` and `print`.

## Details

If the `CountingProcessSamplePath` object `x` is a the realization of a homogeneous Poisson process then, conditioned on the number of events observed, the location of the events (jumps in N(t)) is uniform on the period of observation. This is a basic property of the homogeneous Poisson process derived in Chap. 2 of Cox and Lewis (1966) and Daley and Vere-Jones (2003). Component `UniformGivenN` of a `CountingProcessSamplePath.summary` list contains the p.value of the Kolmogorov test of this uniform null hypothesis. The first graph generated by the `plot` method displays the Kolgorov test graphically (i.e., the empirical cumulative distribution function and the null hyptohesis (the diagonal). The two dotted lines on both sides of the diagonal correspond to 95 and 99% (asymptotic) confidence intervals. This is the graph shown on Fig. 9 (p 19) of Ogata (1988). Notice that the `summary` method allows you to compute the exact p.value.

If we write x[i] the jump locations of the `CountingProcessSamplePath` object `x` and if the latter is the realization of a homogeneous Poisson process then the intervals:

y[i]=x[i+1]-x[i]

are realizations of iid rvs from an exponential distribution with rate 1 and the:

u[i]=1 - exp(-y[i])

are realizations of iid rvs from a uniform distribution on [0,1). The second graph generated by the `plot` method tests this uniform distribution hypotheses with a Kolmogorov Test. This is the graph shown on Fig. 10 (p 19) of Ogata (1988) which was suggested by Berman. This is also the one of the graphs proposed by Brown et al (2002) (the other one is a Q-Q plot for the same quantities). The two dotted lines on both sides of the diagonal correspond to 95 and 99% (asymptotic) confidence intervals. Component `BermanTest` of a `CountingProcessSamplePath.summary` list contains the p.value of the Kolmogorov test of this uniform null hypothesis.

Following the line of the previous paragraph, if the distribution of the y[i] is an exponential distribution with rate 1, then their survivor function is: exp(-y). This is what's shown on the third graph generated by the `plot` method, using a log scale for the ordinate. The point wise CI at 95 and 99% are also drawn (dotted lines). This is the graph shown on Fig. 12 (p 20) of Ogata (1988)

If the u[i] of the second paragraph are realizations of iid uniform rvs on [0,1) then a plot of u[i+1] vs u[i] should fill uniformly the unit square [0,1) x [0,1). This is the fourth generated graph (the one shown on Fig. 11 (p 20) of Ogata (1988)) by the `plot` method while the seventh graph shows the autocorrelation function of the u[i]s. Component `RenewalTest` of a `CountingProcessSamplePath.summary` list contains a slightly more elaborated (and quantitative) version of this test summarizing the fourth graph (bottom right) generated by a call to `renewalTestPlot`. This list element is itself a list with elements: `chi2.95` (a `logical`), `chi2.99` (a `logical`) and `total` (an `integer`). The bounds resulting from repetitively testing a sequence of what are, under the null hypothesis, iid chi2 rvs are obtained using Donsker's Theorem (see bellow). For each lag the number of degrees of freedom of the chi2 distribution is subtracted from each chi2 value. These centered values are then divided by their sd (assuming the null hypothesis is correct). The cumulative sum of the centered and scaled sequence is formed and is divided by the square root of the maximal lag used. This is "plugged-in" the Donsker's Theorem. The eighth graph of the `plot` method displays the resulting Wiener process. With the tight confidence regions of Kendall et al (2007), see bellow.

If the x[i] are realization of a homogeneous Poisson process observed between 0 and T, then the number of events observed on non-overlapping windows of length t should be iid Poisson rv with mean t (and variance t). The observation period is therefore chopped into non-overlapping windows of increasing length and the empirical variance of the event count is graphed versus the empirical mean, together with 95 and 99% CI (using a normal approximation). This is done by calling internally `varianceTime`. That's what's generated by the fifth graph of the `plot` method. This is the graph shown on Fig. 13 (p 20) of Ogata (1988). Component `varianceTimeSummary` of a `CountingProcessSamplePath.summary` list contains a summary of this test, counting the number of events out of each band.

The last graph generated by the `plot` method and the companions components, `Wiener95` and `Wiener99`, of a `CountingProcessSamplePath.summary` list represent "new" tests (as far as I know). They are based on the fact that if the y[i] above are realizations of iid rvs following an exponential distribution with rate 1, then the w[i]=y[i]-1 are realizations of iid rvs with mean 0 and variance 1. We can then form the partial sums:

S[n]=w[1]+...+w[n]

and define the random right continuous with a left-hand limit functions on [0,1]:

S[floor(n*t)]/sqrt(n)

These functions are realizations of a process which converges (weakly) to a Wiener process on [0,1]. The proof of this statement is a corollary of Donsker's Theorem and can be found on pp 146-147, Theorem 14.1, of Billingsley (1999). I thank Vilmos Prokaj for pointing this reference to me.What is then done is testing if the putative Wiener process is entirely within the tight boundaries defined by Kendall et al (2007) for a true Wiener process, see `crossTight`.

## Value

`summary.CountingProcessSamplePath` returns a `CountingProcessSamplePath.summary` object which is a `list` with the following components:

 `UniformGivenN ` A `numeric`, the p.value of the Kolmogorov test of uniformity of the events times given the number of events. `Wiener95 ` A `logical`: is the scaled martingale within the tight 95% confidence band? `Wiener99 ` A `logical`: is the scaled martingale within the tight 99% confidence band? `BermanTest ` A `numeric`, the p.value of the Kolmogorov test of uniformity of the scaled inter events intervals. `RenewalTest ` A `list` with components: `chi2.95`, `chi2.99` and `total`. `chi2.95` resp. `chi2.99` is a `logical` and is `TRUE` if the Wiener process obtained as described above is within the "tight" 95% resp. 99% confidence band of Kendall et al (2007). `total` gives the total number of lags. See `renewalTestPlot`. `varianceTime ` A `varianceTime` object. `varianceTimeSummary ` A `numeric` vector with components: `total`, `out95` and `out99`. `total` gives the total number of window sizes explored. `out95` gives the number of windows within which the variance is out of the 95% confidence band. `out99` gives the number of windows within which the variance is out of the 99% confidence band. See `varianceTime`. `n ` An `integer`, the number of events. `call ` The matched call.

## Acknowledgments

I thank Vilmos Prokaj for pointing out Donsker's Theorem and for indicating me the proof's location (Patrick Billingsley's book).

I also thank Olivier Faugeras and Jonathan Touboul for pointing out Donsker's therom to me.

## Warning

If you wan these tests to be meaningful do not apply them to the data you just used to fit your conditional intensity model.

## Note

These functions / methods are designed to replace the `summary.transformedTrain` and `plot.transformedTrain` ones. The former have a more general design.

Of course to be fully usable, these functions must be coupled to functions allowing users to fit conditional intensity models.The support for that in `STAR` is not complete yet but is coming soon. See for now the example bellow.

The end of the example bellow (not ran by default) shows that the coverage probability of the Wiener Process confidence bands are really good even for small (50) sample sizes.

## Author(s)

Christophe Pouzat christophe.pouzat@gmail.com

## References

Patrick Billingsley (1999) Convergence of Probability Measures. Wiley - Interscience.

Brillinger, D. R. (1988) Maximum likelihood analysis of spike trains of interacting nerve cells. Biol. Cybern. 59: 189–200.

Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E. and Frank, L. M. (2002) The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation 14: 325-346.

D. R. Cox and P. A. W. Lewis (1966) The Statistical Analysis of Series of Events. John Wiley and Sons.

Daley, D. J. and Vere-Jones D. (2003) An Introduction to the Theory of Point Processes. Vol. 1. Springer.

Ogata, Yosihiko (1988) Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes. Journal of the American Statistical Association 83: 9-27.

Johnson, D.H. (1996) Point process models of single-neuron discharges. J. Computational Neuroscience 3: 275–299.

W. S. Kendall, J.- M. Marin and C. P. Robert (2007) Brownian Confidence Bands on Monte Carlo Output. Statistics and Computing 17: 1–10. Preprint available at: http://www.ceremade.dauphine.fr/%7Exian/kmr04.rev.pdf

`mkCPSP`, `as.CPSP`, `plot.CountingProcessSamplePath`, `print.CountingProcessSamplePath`, `varianceTime`, `crossTight`, `renewalTestPlot`, `ks.test`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264``` ```## Not run: ## load one spike train data set of STAR data(e060824spont) ## Create the CountingProcessSamplePath object n1spt.cp <- as.CPSP(e060824spont[["neuron 1"]]) ## print it n1spt.cp ## plot it plot(n1spt.cp) ## get the summary ## Notice the warning due to few identical interspike intervals ## leading to an inaccurate Berman's test. summary(n1spt.cp) ## Simulate data corresponding to a renewal process with ## an inverse Gaussian ISI distribution in the spontaneous ## regime modulated by a multiplicative stimulus whose time ## course is a shifted and scaled chi2 density. ## Define the "stimulus" function stimulus <- function(t, df=5, tonset=5, timeFactor=5, peakFactor=10) { dchisq((t-tonset)*timeFactor,df=df)*peakFactor } ## Define the conditional intensity / hazard function hFct <- function(t, tlast, df=5, tonset=5, timeFactor=5, peakFactor=10, mu=0.075, sigma2=3 ) { hinvgauss(t-tlast,mu=mu,sigma2=sigma2)*exp(stimulus(t,df,tonset,timeFactor,peakFactor)) } ## define the function simulating the train with the thinning method makeTrain <- function(tstop=10, peakCI=200, preTime=5, df=5, tonset=5, timeFactor=5, peakFactor=10, mu=0.075, sigma2=3) { result <- numeric(500) - preTime - .Machine\$double.eps result.n <- 500 result[1] <- 0 idx <- 1 currentTime <- result[1] while (currentTime < tstop+preTime) { currentTime <- currentTime+rexp(1,peakCI) p <- hFct(currentTime, result[idx], df=df, tonset=tonset+preTime, timeFactor=timeFactor, peakFactor=peakFactor, mu=mu, sigma2=sigma2)/peakCI rthreshold <- runif(1) if (p>1) stop("Wrong peakCI") while(p < rthreshold) { currentTime <- currentTime+rexp(1,peakCI) p <- hFct(currentTime, result[idx], df=df, tonset=tonset+preTime, timeFactor=timeFactor, peakFactor=peakFactor, mu=mu, sigma2=sigma2)/peakCI if (p>1) stop("Wrong peakCI") rthreshold <- runif(1) } idx <- idx+1 if (idx > result.n) { result <- c(result,numeric(500)) - preTime - .Machine\$double.eps result.n <- result.n + 500 } result[idx] <- currentTime } result[preTime < result & result <= tstop+preTime] - preTime } ## set the seed set.seed(20061001) ## "make" the train t1 <- makeTrain() ## create the corresponding CountingProcessSamplePath ## object cpsp1 <- mkCPSP(t1) ## print it cpsp1 ## test it cpsp1.summary <- summary(cpsp1) cpsp1.summary plot(cpsp1.summary) ## Define a function returning the conditional intensity function (cif) ciFct <- function(t, tlast, df=5, tonset=5, timeFactor=5, peakFactor=10, mu=0.075, sigma2=3 ) { sapply(t, function(x) { if (x <= tlast[1]) return(1/mu) y <- x-max(tlast[tlast