The detection of discontinuities in a regression curve or surface.

Share:

Description

This function uses a comparison of left and right handed nonparametric regression curves to assess the evidence for the presence of one or more discontinuities in a regression curve or surface. A hypothesis test is carried out, under the assumption that the errors in the data are approximately normally distributed. A graphical indication of the locations where the evidence for a discontinuity is strongest is also available.

Usage

1
sm.discontinuity(x, y, h, hd, ...)

Arguments

x

a vector or two-column matrix of covariate values.

y

a vector of responses observed at the covariate locations.

h

a smoothing parameter to be used in the construction of the nonparametric regression estimates. A normal kernel function is used and h is its standard deviation(s). However, if this argument is omitted h will be selected by an approximate degrees of freedom criterion, controlled by the df parameter. See sm.options for details.

hd

a smoothing parameter to be used in smoothing the differences of the left and right sided nonparametric regression estimates. A normal kernel function is used and hd is its standard deviation(s). However, if this argument is omitted hd will be set to h * sqrt(0.25), and h reset to h * sqrt(0.75), when x is a vector When x is a matrix, hd will be set to h * sqrt(0.5) and h will be reset to the same value.

...

other optional parameters are passed to the sm.options function, through a mechanism which limits their effect only to this call of the function; those relevant for this function are add, eval.points, ngrid, se, band, xlab, ylab, xlim, ylim, lty, col; see the documentation of sm.options for their description.

Details

The reference below describes the statistical methods used in the function. There are minor differences in somecomputational details of the implementation.

Currently duplicated rows of x cause a difficulty in the two covariate case. Duplicated rows should be removed.

Value

a list containing the following items

p

the p-value for the test of the null hypothesis that no discontinuities are present.

sigma

the estimated standard deviation of the errors.

eval.points

the evaluation points of the nonparametric regression estimates. When x is a matrix, eval.points is also a matrix whose columns define the evaluation grid of each margin of the evaluation rectangle.

st.diff

a vector or matrix of standardised differences between the left and right sided estimators at the evaluation points.

diffmat

when x is a vector, this contains the locations and standardised differences where the latter are greater than 2.5.

angle

when x is a matrix, this contains the estimated angles at which the standardised differences were constructed.

h

the principal smoothing parameter.

hd

the smoothing parameter used for double-smoothing (see the reference below).

Side Effects

a plot on the current graphical device is produced, unless the option display="none" is set.

References

Bowman, A.W., Pope, A. and Ismail, B. (2006). Detecting discontinuities in nonparametric regression curves and surfaces. Statistics \& Computing, 16, 377–390.

See Also

sm.regression, sm.options

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
par(mfrow = c(3, 2))

with(nile, {
   sm.discontinuity(Year, Volume, hd = 0)
   sm.discontinuity(Year, Volume)

   ind <- (Year > 1898)
   plot(Year, Volume)
   h <- h.select(Year, Volume)
   sm.regression(Year[!ind], Volume[!ind], h, add = TRUE)
   sm.regression(Year[ ind], Volume[ ind], h, add = TRUE)

   hvec <- 1:15
   p <- numeric(0)
   for (h in hvec) {
      result <- sm.discontinuity(Year, Volume, h,
                          display = "none", verbose = 0)
      p <- c(p, result$p)
   }
   plot(hvec, p, type = "l", ylim = c(0, max(p)), xlab = "h")
   lines(range(hvec), c(0.05, 0.05), lty = 2)
})

with(trawl, {
   Position  <- cbind(Longitude, Latitude)
   ind <- (Longitude < 143.8)
   # Remove a repeated point which causes difficulty with sm.discontinuity
   ind[54] <- FALSE
   sm.regression(Position[ind,], Score1[ind], theta = 35, phi = 30)
   sm.discontinuity(Position[ind,], Score1[ind], col = "blue")
})
par(mfrow = c(1, 1))
	
#  The following example takes longer to run.
#  Alternative values for nside are 32 and 64.
#  Alternative values of yjump are 1 and 0.5.
# nside  <- 16
# yjump  <- 2
# x1     <- seq(0, 1, length = nside)
# x2     <- seq(0, 1, length = nside)
# x      <- expand.grid(x1, x2)
# x      <- cbind(x1 = x[, 1], x2 = x[, 2])
# y      <- rnorm(nside * nside)
# ind    <- (sqrt((x[, 1] - 0.5)^2 + (x[, 2] - 0.5)^2) <= 0.25)
# y[ind] <- y[ind] + yjump
# image(x1, x2, matrix(y, ncol = nside))
# sm.discontinuity(x, y, df = 20, add = TRUE)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.