oregMclust: Orthogonal Regression Clustering


Description

Computation of center points for regression data by means of orthogonal regression. A cluster method based on redescending M-estimators is used.

Usage

  oregMclust(datax, datay, bw, method = "const",
    xrange = range(datax), yrange = range(datay),
    prec = 4, na = 1, sa = NULL, nl = 10, nc = NULL,
    brmaxit = 1000)

  regparm(reg)

  ## S3 method for class 'oregMclust'
  plot(x, datax, datay, prec = 3, rcol = "black",
    rlty = 1, rlwd = 3, ...)

  ## S3 method for class 'oregMclust'
  print(x, ...)

Arguments

datax, datay

numerical vectors of coordinates of the observations. Alternatively, a matrix with two or three columns can be given. Then, the first two columns are interpreted as coordinates of the observations and, if available, the third is passed to parameter sa.

bw

positive number. Bandwidth for the cluster method.

method

optional string. Method of choosing starting values for maximization. Possible values are:

  • "const": a constant number of starting angles is used for every observation. By default, one horizontal line through each observation is used as starting value. If a value for parameter na is passed, na lines through each observation are used. Alternatively, the parameter sa can specify an individual starting angle for every observation. In this case, na is ignored and the length of sa must equal the number of observations.

  • "all": every line through any two observations is used.

  • "prob": clusters are searched iteratively with randomly chosen starting values until either no new clusters are found (default) or nc clusters are found. The precision of distinguishing the clusters can be tuned with the parameter prec. In each iteration, nl lines, each passing through two randomly chosen observations, are used as starting values.
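The pair-based starting values of "all" and "prob" can be pictured with a small base-R sketch. It is illustrative only and is not the internal code of oregMclust: the helper names line_through, all_pair_starts and random_pair_starts are hypothetical, and the lines are expressed in the (alpha, beta) form cos(alpha) * x + sin(alpha) * y = beta with alpha in [0, 2pi).

```r
# Illustrative sketch; these helpers are hypothetical and not part of edci.

# Line through points p and q as cos(alpha) * x + sin(alpha) * y = beta.
line_through <- function(p, q) {
  p <- unname(p); q <- unname(q)
  d <- q - p                               # direction of the line
  alpha <- atan2(d[1], -d[2]) %% (2 * pi)  # angle of the normal (-d[2], d[1])
  beta <- cos(alpha) * p[1] + sin(alpha) * p[2]
  c(alpha = alpha, beta = beta)
}

# In the spirit of method = "all": one starting line per pair of observations.
all_pair_starts <- function(x, y) {
  pts <- cbind(x, y)
  pairs <- combn(nrow(pts), 2)  # 2 x choose(n, 2) matrix of index pairs
  t(apply(pairs, 2, function(ij) line_through(pts[ij[1], ], pts[ij[2], ])))
}

# In the spirit of method = "prob": nl lines through randomly chosen pairs.
random_pair_starts <- function(x, y, nl = 10) {
  pts <- cbind(x, y)
  t(replicate(nl, {
    ij <- sample(nrow(pts), 2)  # two distinct observations
    line_through(pts[ij[1], ], pts[ij[2], ])
  }))
}
```

With n observations, the "all" variant yields choose(n, 2) starting lines, which grows quadratically in n. This is presumably why a randomized variant with a fixed number of lines per iteration is attractive for larger data sets.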

xrange, yrange

optional numerical intervals describing the domains of the observations. These are only used for normalization of the data. Note that both intervals should have approximately the same length; otherwise, the data should be transformed beforehand. This is not done automatically, since such a transformation affects the choice of the bandwidth.

prec

optional positive integer. Tuning parameter for distinguishing different clusters, which is passed to deldupMclust.

na

optional positive integer. Number of angles per observation used as starting values for method = "const" (default).

sa

optional numerical vector. Angles (within [0,2pi)) used as starting values for method = "const" (default).

nl

optional positive integer. Number of starting lines in each iteration for method = "prob".

nc

optional positive integer. Number of clusters to search for if method "prob" is chosen. Note that if nc is too large, i.e., nc clusters cannot be found, the function does not terminate. Warning: on Windows, the routine cannot be interrupted manually in this case.

brmaxit

optional positive integer. Since the maximization could be very slow in some cases depending on the starting value, the maximization is stopped after brmaxit iterations.

reg, x

object returned from oregMclust.

rcol, rlty, rlwd

optional graphic parameters used for plotting regression lines.

...

additional parameters passed to plot.

Details

oregMclust implements a cluster method based on redescending M-estimators for the case of orthogonal regression. This method was introduced by Mueller and Garlipp (see References).
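To give a rough idea of what such a method maximizes, here is a minimal sketch of a kernel-type objective on orthogonal residuals. It rests on assumptions: the Gaussian density as score function, the scaling by bw, and the function names orth_resid and objective are chosen for illustration and are not taken from the package. Local maxima of such a function over (alpha, beta) would correspond to regression center lines.

```r
# Illustrative sketch; score function and scaling are assumptions,
# not the implementation used by oregMclust.

# Signed orthogonal distance of (x, y) from the line
# cos(alpha) * x + sin(alpha) * y = beta.
orth_resid <- function(alpha, beta, x, y) {
  cos(alpha) * x + sin(alpha) * y - beta
}

# Sum of Gaussian kernel weights of the scaled residuals. Points far from
# the line contribute almost nothing, which is the redescending behaviour.
objective <- function(alpha, beta, x, y, bw) {
  sum(dnorm(orth_resid(alpha, beta, x, y) / bw)) / bw
}

# Five points exactly on y = x, i.e. alpha = 3 * pi / 4, beta = 0.
x <- 0:4
y <- 0:4
objective(3 * pi / 4, 0, x, y, bw = 1)  # line through all points
objective(3 * pi / 4, 3, x, y, bw = 1)  # parallel line shifted away
```

The first call returns the larger value, since every point has residual zero. A smaller bw makes the objective more local and typically produces more local maxima, which is why the bandwidth interacts with the scaling of the data.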

regparm transforms the columns "alpha" and "beta" to "intersept" and "slope", i.e., to the intercept b and slope m of the representation y = mx + b.

See also bestMclust, projMclust, and envMclust for choosing the 'best' clusters out of all found clusters.

Value

A numerical matrix containing one row for every found regression center line. The columns "alpha" and "beta" are their parameters in the representation (cos(alpha), sin(alpha)) * (x,y)' = beta, where alpha is within [0,2pi). For the alternative representation y = mx + b, the return value can be passed to regparm.

The columns "value" and "count" give the value of the objective function for each line and the number of times that line was found.
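The relation between the (alpha, beta) representation and y = mx + b follows from solving cos(alpha) * x + sin(alpha) * y = beta for y, which gives m = -cos(alpha) / sin(alpha) and b = beta / sin(alpha) whenever sin(alpha) is nonzero. The sketch below spells this out. The helper to_slope_intercept is hypothetical and is not claimed to be how regparm is implemented.

```r
# Hypothetical helper (not part of edci): convert the normal form
# cos(alpha) * x + sin(alpha) * y = beta into y = slope * x + intercept.
to_slope_intercept <- function(alpha, beta) {
  s <- sin(alpha)
  # Vertical lines (sin(alpha) == 0) have no finite slope; return NA for them.
  vertical <- abs(s) < 1e-12
  slope <- ifelse(vertical, NA_real_, -cos(alpha) / s)
  intercept <- ifelse(vertical, NA_real_, beta / s)
  cbind(intercept = intercept, slope = slope)
}

# Two (alpha, beta) pairs taken from the example output on this page.
to_slope_intercept(alpha = c(0.7832, 1.8193), beta = c(-1.7508, 14.5220))
```

For these two rows the result is close to slope -1 with intercept -2.5 and slope 0.25 with intercept 15, which are the two lines used to simulate the data in the Examples section.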

Author(s)

Tim Garlipp, TimGarlipp@gmx.de

References

Mueller, C. H., & Garlipp, T. (2005). Simple consistent cluster methods based on redescending M-estimators with an application to edge identification in images. Journal of Multivariate Analysis, 92(2), 359–385.

See Also

bestMclust, projMclust, envMclust, deldupMclust

Examples

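  library(edci)

  # two noisy lines: y = -x - 2.5 and y = 0.25 * x + 15
  x = c(rnorm(100, 0, 3), rnorm(100, 5, 3))
  y = c(-2 * x[1:100] - 5, 0.5 * x[101:200] + 30)/2
  x = x + rnorm(200, 0, 0.5)
  y = y + rnorm(200, 0, 0.5)

  # search for regression center lines with random starting values
  reg = oregMclust(x, y, 1, method = "prob")
  # add the "proj" column used to select clusters
  reg = projMclust(reg, x, y)
  reg
  # plot the two best lines according to the "proj" criterion
  plot(bestMclust(reg, 2, crit = "proj"), x, y)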

Example output

Break with <CTRL>-C (linux) or <ESC> (windows)
Found clusters:  3 
Found clusters:  5 
Found clusters:  5 
      alpha    beta    value count proj
[1,] 2.9083 -0.2363 23.92903    17   20
[2,] 0.7832 -1.7508 35.41184     6   35
[3,] 2.6643  3.6276 22.21987     4   14
[4,] 1.8193 14.5220 35.19585     2   85
[5,] 0.7831 -1.7508 35.41184     1   46

edci documentation built on May 1, 2019, 7:44 p.m.
