# crossGeneral: Computations of Boundary Crossing Probabilities for the... In STAR: Spike Train Analysis with R

## Description

Computes the distribution of the first passage time through an arbitrary (crossGeneral) or a "tight" (crossTight) boundary for a Wiener process. The method of Loader and Deely (1987) is used. A tight boundary is a boundary generating the tighest confidence band for the process (Kendall et al, 2007). Utility function and methods: mkTightBMtargetFct, print, summary, plot, lines, are also provided to use and explore the results.

## Usage

  1 2 3 4 5 6 7 8 9 10 11 crossGeneral(tMax = 1, h = 0.001, cFct, cprimeFct, bFct, withBounds = FALSE, Lplus) crossTight(tMax = 1, h = 0.001, a = 0.3, b = 2.35, withBounds = TRUE, logScale = FALSE) mkTightBMtargetFct(ci = 0.95, tMax = 1, h = 0.001, logScale = FALSE) ## S3 method for class 'FirstPassageTime' print(x, ...) ## S3 method for class 'FirstPassageTime' summary(object, digits, ...) ## S3 method for class 'FirstPassageTime' plot(x, y, which = c("Distribution", "density"), xlab, ylab, ...) ## S3 method for class 'FirstPassageTime' lines(x, which = c("Distribution", "density"), ...) 

## Arguments

 tMax A positive numeric. The "time" during which the Wiener process is followed. h A positive numeric. The integration time step used for the numerical solution of the Volterra integral equation (see details). cFct A function defining the boundary to be crossed. The first argument of the function should be a "time" argument. If the first argument is a vector, the function should return a vector of the same length. cprimeFct A function defining time derivative of the boundary to be crossed. Needs to be specified only if a check of the sign of the kernel derivative (see details) is requested. The first argument of the function should be a "time" argument. If the first argument is a vector, the function should return a vector of the same length. bFct A function. The "b" function of Loader and Deely (1987). Does not need to be specified (i.e., can be missing) but can be used to improve convergence. The first argument of the function should be a "time" argument. If the first argument is a vector, the function should return a vector of the same length. withBounds A logical. Should bounds on the distribution be calculated? If yes, set it to TRUE, leave it to its default value, FALSE, otherwise. Lplus A logical. If bounds are requested (withBounds=TRUE) and if the sign of the time derivative of the kernel is known to be positive or null, set to TRUE, if it is known to be negative, set it to FALSE. If the sign is unknown, leave Lplus unspecified and provide a cprimeFct function. logScale A logical. Should intermediate calculations in crossTight be carried out on the log scale for numerical precision? If yes, set it to TRUE, leave it to its default, FALSE, otherwise. a,b numerics, the two parameters of the "tight" boundary: c(t) = a + b*sqrt(t). See details. ci A numeric larger than 0 and smaller than 1. The nominal coverage probability desired for a "tight" confidence band (see details). x,object A FirstPassageTime object returned by crossGeneral or crossTight. y Not used but required for a plot method. which A character string, "Distribution" or "density", specifying if a probability distribution or a probability density should be graphed. xlab,ylab See plot. digits A positive integer. The number of digits to print in summary. If bounds were computed, the value of digits is computed internally based on the bounds width. ... Used in plot and lines to pass further arguments (see plot and lines), not used in print and summary.

## Details

The Loader and Deely (1987) method to compute the probability G(t) that the first passage of a Wiener process / Brownian motion occurs between 0 and t (argument tMax of crossGeneral and crossTight) through a boundary defined by c(t) is based on the numerical solution of a Volterra integral equation of the first kind satisfied by G() and defined by their Eq. 2.2:

F(t) = \int_0^t K(t,u) dG(u)

where, F(t) is defined by:

F(t) = pnorm(-c(t)/sqrt(t)) + exp(-2*b(t)*(c(t)-t*b(t)))*pnorm((-c(t)+2*t*b(t))/sqrt(t))

K(t,u) is defined by:

K(t,u)=pnorm((c(u)-c(t))/sqrt(t-u)) + exp(-2*b(t)*(c(t)-c(u)-(t-u)*b(t)))*pnorm((c(u)-c(t)+2*(t-u)*b(t))/(sqrt(t-u)))

and b(t) is an additional function (that can be uniformly 0) that is chosen to improve convergence speed and to get error bounds. Argument h is the step size used in the numerical solution of the above Volterra integral equation. The mid-point method (Eq. 3.1 and 3.2 of Loader and Deely (1987)) is implemented. If tMax is not a multiple of h it is modified as follows: tMax <- round(tMax/h)*h.

crossGeneral generates functions F() and K(,) internally given c() (argument cFct) and b() (argument bFct). If bFct is not given (i.e., missing(bFct) returns TRUE) it is taken as uniformly 0. If a numeric is given for cFct then c() is defined as a uniform function returning the first element of the argument (cFct).

Function crossTight assumes the following functional form for c(): a + b * sqrt(t). b() is set to c'() (the derivative of c()). Arguments a and b of crossTight correspond to the 2 parameters of c().

If argument withBounds is set to TRUE then bounds on G() are computed. Function crossTight uses Eq. 3.6 and 3.7 of Loader and Deely (1987) to compute these bounds, Gl(t) and Gu(t). Function crossGeneral uses Eq. 3.6 and 3.7 (if argument Lplus is set to TRUE) or Eq. 3.10 and 3.11 (if argument Lplus is set to FALSE). Here Lplus stands for the sign of the partial derivative of the kernel K(,) with respect to its second argument. If the sign is not known the user can provide the derivative c'() of c() as argument cprimeFct. A (slow) numerical check is then performed to decide wether Lplus should be TRUE or FALSE or if it changes sign (in which case bounds cannot be obtained and an error is returned).

In function crossTight argument logScale controls the way some intermediate computations of the mid-point method are implemented. If set to FALSE (the default) a literal implementation of Eq. 3.2 of Loader and Deely (1987) is used. If set to TRUE then additions subtractions are computed on the log scale using functions logspace_add and logspace_sub of the R API. The computation is then slightly slower and it turns out that the gain in numerical precision is not really significant, so you can safely leave this argument to its default value.

## Value

crossGeneral and crossTight return a FirstPassageTime object which is a list with the following components:

 time  A numeric vector of "times" at which the first passage time probability has been evaluated. G  A numeric vector of first passage probability. Gu  A numeric vector with the upper bound of first passage probability. Only if withBounds was set to TRUE. Gl  A numeric vector with the lower bound of first passage probability. Only if withBounds was set to TRUE. mids  A numeric vector of "times" at which the first passage time probability density has been evaluated. Mid points of component time. g  A numeric vector of first passage probability density. h  A numeric. The value of argument h of crossGeneral or crossTight. call  The matched call.

mkTightBMtargetFct returns a function which can be used in optim. This function returns the square of the difference between (1-ci)/2 (remember the "symmetry" of the Wiener processes paths, that is, for every path there is a symmetric one with respect to the abscissa having with the same probability) and the probability to have the first passage time of the Wiener process smaller or equal to 1 when the boundary is the "tight" boundary defined by: a + b*sqrt(t). The function takes a single vector argument containing the log of the parameters a (vector's first element) and b (vector's second element).

Methods print.FirstPassageTime and summary.FirstPassageTime output the probability to observe the first exit between 0 and tMax. If bounds were computed, the precision on the probability is also returned (as an attribute for print.FirstPassageTime). summary.FirstPassageTime also gives the integration time step, h, used.

## Warning

crossGeneral with withBounds = TRUE and a negative kernel derivative is presently poorly tested, so be careful and let me know if mistakes show up.

## Note

Using logScale = TRUE in crossTight seems to be an overkill, that is, it doubles computation's time without bringing significant numerical improvement.

crossGeneral is for now pure R code. The first passage probability is obtained by solving the lower triangular system (Eq. 3.1 of Loader and Deely (1987)) with forwardsolve and is therefore rather fast (but can be memory "hungry"). The bounds are computed by 2 nested loops and can therefore be long to compute.

crossTight is calling a C code and is fast.

Loader and Deely paper also describes a method where G(t) is solution of a Volterra integral equation of the second kind (their Eq. 2.7). This approach is presently not implemented in the above functions.

## Author(s)

Christophe Pouzat [email protected]

## References

C. R. Loader and J. J. Deely (1987) Computations of Boundary Crossing Probabilities for the Wiener Process. J. Statist. Comput. Simul. 27: 95–105.

W. S. Kendall, J.- M. Marin and C. P. Robert (2007) Brownian Confidence Bands on Monte Carlo Output. Statistics and Computing 17: 1–10. Preprint available at: http://www.ceremade.dauphine.fr/%7Exian/kmr04.rev.pdf

print, summary, plot, lines, pinvgauss
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 ## Not run: ## Reproduce Table 1 (p 101) of Loader and Deely (1987) ## define a vector of n values nLD <- c(8,16,32,64,128) ## Part 1: c(t) = sqrt(1+t) and tMax=1 ## define cFct cFT1p1 <- function(t) sqrt(1+t) ## define the different bFct bFT1p1.ii <- function(t) 0.5/sqrt(1+t) bFT1p1.iii <- function(t) (cFT1p1(t)-cFT1p1(0))/t bFT1p1.iv <- function(t) 0.5*(bFT1p1.ii(t)+bFT1p1.iii(t)) bFT1p1.v <- function(t) (2*t-4/5*((1+t)^2.5-1))/t^3+3*cFT1p1(t)/2/t ## Do the calculations round(t(sapply(nLD, function(n) { c(n=n, i=crossGeneral(tMax=1,h=1/n,cFct=cFT1p1)$G[n], ii=crossGeneral(tMax=1,h=1/n,cFct=cFT1p1,bFct=bFT1p1.ii)$G[n], iii=crossGeneral(tMax=1,h=1/n,cFct=cFT1p1,bFct=bFT1p1.iii)$G[n], iv=crossGeneral(tMax=1,h=1/n,cFct=cFT1p1,bFct=bFT1p1.iv)$G[n], v=crossGeneral(tMax=1,h=1/n,cFct=cFT1p1,bFct=bFT1p1.v)$G[n])})), digits=6) ## Part 2: c(t) = exp(-t) and tMax=1 ## define cFct cFT1p2 <- function(t) exp(-t) ## define the different bFct cFT1p2 <- function(t) exp(-t) bFT1p2.ii <- function(t) -exp(-t) bFT1p2.iii <- function(t) (cFT1p2(t)-cFT1p2(0))/t bFT1p2.iv <- function(t) 0.5*(bFT1p2.ii(t)+bFT1p2.iii(t)) bFT1p2.v <- function(t) 3*(1-t-exp(-t))/t^3+3*cFT1p2(t)/2/t ## Do the calculations round(t(sapply(nLD, function(n) { c(n=n, i=crossGeneral(tMax=1,h=1/n,cFct=cFT1p2)$G[n], ii=crossGeneral(tMax=1,h=1/n,cFct=cFT1p2,bFct=bFT1p2.ii)$G[n], iii=crossGeneral(tMax=1,h=1/n,cFct=cFT1p2,bFct=bFT1p2.iii)$G[n], iv=crossGeneral(tMax=1,h=1/n,cFct=cFT1p2,bFct=bFT1p2.iv)$G[n], v=crossGeneral(tMax=1,h=1/n,cFct=cFT1p2,bFct=bFT1p2.v)$G[n])})), digits=6) ## Part 3: c(t) = t^2 + 3*t + 1 and tMax=1 ## define cFct cFT1p3 <- function(t) t^2+3*t+1 ## define the different bFct bFT1p3.ii <- function(t) 2*t+3 bFT1p3.iii <- function(t) (cFT1p3(t)-cFT1p3(0))/t bFT1p3.v <- function(t) 5*t/4+3 bFT1p3.vi <- function(t) rep(3,length(t)) round(t(sapply(nLD, function(n) { c(n=n, i=crossGeneral(tMax=1,h=1/n,cFct=cFT1p3)$G[n], ii=crossGeneral(tMax=1,h=1/n,cFct=cFT1p3,bFct=bFT1p3.ii)$G[n], iii=crossGeneral(tMax=1,h=1/n,cFct=cFT1p3,bFct=bFT1p3.iii)$G[n], v=crossGeneral(tMax=1,h=1/n,cFct=cFT1p3,bFct=bFT1p3.v)$G[n], vi=crossGeneral(tMax=1,h=1/n,cFct=cFT1p3,bFct=bFT1p3.vi)$G[n])})), digits=6) ## Part 3: c(t) = t^2 + 3*t + 1 and tMax=1 ## define cFct cFT1p4 <- function(t) 1/t ## Here only column (i) and (vii) are reproduced. ## If one attempts to reproduce (ii) directly with crossGeneral ## a NaN appears (when a -Inf is the correct value) in functions ## F() and K(,) (see details) for instance when t=0 in F. ## Then as crossGeneral is presently written R attempts to ## compute t*b(t), where b(t) is c'(t), that is, t*(-1/t^2) which is ## NaN (for R) when t=0. bFT1p4.vii <- function(t) rep(-1,length(t)) round(t(sapply(nLD, function(n) { c(n=n, i=crossGeneral(tMax=1,h=1/n,cFct=cFT1p4)$G[n], vii=crossGeneral(tMax=1,h=1/n,cFct=cFT1p4,bFct=bFT1p4.vii)$G[n])})), digits=6) ## The last 3 rows of column (vii) are not the same as in the paper ## Reproduce Table 4 (p 104) of Loader and Deely (1987) ## As before the probability of first passage between ## 0 and 1 is computed. This time only three boundary ## functions are considered. The error bounds are ## obtained ## Part 1: c(t) = sqrt(1+t) ## Left columns pair: b(t) = c'(t) round(t(sapply(nLD, function(n) { res <- crossGeneral(tMax=1,h=1/n,cFct=cFT1p1,bFct=bFT1p1.ii,withBounds=TRUE,Lplus=TRUE) c(Gl=res$Gl[n],Gu=res$Gu[n]) } ) ), digits=5) ## Right columns pair: b(t) = 0 round(t(sapply(nLD, function(n) { res <- crossGeneral(tMax=1,h=1/n,cFct=cFT1p1,withBounds=TRUE,Lplus=TRUE) c(n=n,Gl=res$Gl[n],Gu=res$Gu[n]) } ) ), digits=5) ## Part 2: c(t) = t^2 + 3*t + 1 ## Left columns pair: b(t) = 3 - 2*t round(t(sapply(nLD, function(n) { res <- crossGeneral(tMax=1,h=1/n,cFct=cFT1p3,bFct=function(t) 3-2*t,withBounds=TRUE,Lplus=TRUE) c(n=n,Gl=res$Gl[n],Gu=res$Gu[n]) } ) ), digits=5) ## Right columns pair: b(t) = 3 - t round(t(sapply(nLD, function(n) { res <- crossGeneral(tMax=1,h=1/n,cFct=cFT1p3,bFct=function(t) 3-2*t,withBounds=TRUE,Lplus=TRUE) c(n=n,Gl=res$Gl[n],Gu=res$Gu[n]) } ) ), digits=5) ## Part 3: c(t) = 1 + sin(t) ## Left columns pair: b(t) = c'(t) round(t(sapply(nLD, function(n) { res <- crossGeneral(tMax=1,h=1/n,cFct=function(t) 1+sin(t),bFct=function(t) cos(t),withBounds=TRUE,Lplus=TRUE) c(n=n,Gl=res$Gl[n],Gu=res$Gu[n]) } ) ), digits=5) ## Left columns pair: b(t) = 0.5 round(t(sapply(nLD, function(n) { res <- crossGeneral(tMax=1,h=1/n,cFct=function(t) 1+sin(t),bFct=function(t) rep(0.5,length(t)),withBounds=TRUE,Lplus=TRUE) c(n=n,Gl=res$Gl[n],Gu=res$Gu[n]) } ) ), digits=5) ## Check crossGeneral against an analytical inverse Gaussian ## distribution ## Define inverse Gaussian parameters mu.true <- 0.075 sigma2.true <- 3 ## Define a function transforming the "drift" (mu.true) and ## "noise variance" (sigma2.true) of the default inverse ## Gaussian parametrization of STAR into a ## linear boundary of an equivalent Wiener process first ## passage time problem star2ld <- function(mu,sigma2) c(sqrt(1/sigma2),-sqrt(1/sigma2)/mu) ## Get the "equivalent" boundary parameters (y intercept and slope) parB1 <- star2ld(mu.true,sigma2.true) ## Plot the "target" inverse Gaussian density xx <- seq(0.001,0.3,0.001) plot(xx,dinvgauss(xx,mu=mu.true,sigma2=sigma2.true),type="l") ## Get the numerical estimate of the density using Loader and ## Deely Volterra integral equation method igB1 <- crossGeneral(tMax=0.3,h=0.001,cFct=function(t) parB1[1]+parB1[2]*t,withBounds=FALSE) ## superpose the numerical estimate to the exact solution ## use lines method to do that lines(igB1,"density",col=2) ## Use of crossTight and associated function ## Get the paramters, a and b, of the "approximate" ## tightest boundary: c(t) = a + b*sqrt(t), giving a ## 0.05 probability of exit between 0 and 1 ## (in fact we are discussing here a pair of symmetrical ## bounaries, c(t) and -c(t)). See Kendall et al (2007) ## for details ## Start by defining the target function target95 <- mkTightBMtargetFct(ci=0.95) ## get the optimal log(a) and log(b) using ## the values of table 1 of Kendall et al as initial ## guesses p95 <- optim(log(c(0.3,2.35)),target95,method="BFGS") ## check the convergence of BFGS p95$convergence ## check if the parameters changed a lot exp(p95$par) ## Get the bounds on G(1) for these optimal parameters d95 <- crossTight(a=exp(p95$par[1]),b=exp(p95$par[2]),withBound=TRUE,logScale=FALSE) ## print out the summary summary(d95) ## Do the same for the 0.01 probability of first passage target99 <- mkTightBMtargetFct(ci=0.99) p99 <- optim(p95$par,target99,method="BFGS") p99$convergence exp(p99$par) d99 <- crossTight(a=exp(p99$par[1]),b=exp(p99$par[2]),withBound=TRUE,logScale=FALSE) summary(d99) ## End(Not run)