# lga: Perform LGA/RLGA

## Description

Linear Grouping Analysis

## Usage

```r
## Default S3 method:
lga(x, k, biter = NULL, niter = 10, showall = FALSE, scale = TRUE,
    nnode = NULL, silent = FALSE, ...)

## Default S3 method:
rlga(x, k, alpha = 0.9, biter = NULL, niter = 10, showall = FALSE,
     scale = TRUE, nnode = NULL, silent = FALSE, ...)
```

## Arguments

- `x`: a numeric matrix.
- `k`: an integer giving the number of clusters.
- `alpha`: a numeric value between 0.5 and 1. For the robust estimate (`rlga`), the proportion of points in the best subset.
- `biter`: an integer giving the number of different starting hyperplanes to try.
- `niter`: an integer giving the number of iterations allowed for convergence.
- `showall`: logical. If TRUE, display all outcomes, not just the best one.
- `scale`: logical. If TRUE, scale the data before fitting by dividing each column by its standard deviation.
- `nnode`: an integer giving the number of CPUs to use for parallel processing. Defaults to NULL, i.e. no parallel processing.
- `silent`: logical. If TRUE, produces no text output during processing.
- `...`: any other arguments passed from the generic function.

## Details

This code tries to find k clusters using the lga algorithm described in Van Aelst et al (2006). For each attempt, it has up to `niter` steps to reach convergence, and it does this from `biter` different starting hyperplanes. It then selects the clustering with the smallest Residual Orthogonal Sum of Squares (ROSS).
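To make the criterion concrete, here is a minimal sketch (not part of the package) of the Residual Orthogonal Sum of Squares for a single group: the sum of squared orthogonal distances from the group's points to their best-fitting hyperplane, which equals the smallest eigenvalue of the group's centred cross-product matrix.

```r
## Sketch only: ROSS for one group of points (rows of the matrix xg).
## The orthogonal-regression hyperplane passes through the centroid, and
## the residual sum of squared orthogonal distances is the smallest
## eigenvalue of the centred cross-product (scatter) matrix.
ross <- function(xg) {
  xc <- scale(xg, center = TRUE, scale = FALSE)  # centre the group
  min(eigen(crossprod(xc), symmetric = TRUE, only.values = TRUE)$values)
}

## The criterion minimised by lga is the sum of ross() over the k groups.
```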

If `biter` is left as NULL, then it is selected via the equation given in Van Aelst et al (2006).

The function `rlga` is the robust equivalent to LGA, and is introduced in Garcia-Escudero et al (2008).

Both functions are parallel-computing aware via the `nnode` argument, and work with the package `snow`. In order to use parallel computing, either MPI (e.g. via lamboot) or PVM is necessary. For further details, see the documentation for `snow`.

Associated with the `lga` and `rlga` functions are a print method and a plot method (see the examples). In the plot method, the fitted hyperplanes are also shown as dashed lines when there are only two dimensions.

## Value

An object of class `"lga"`. The list contains:

- `cluster`: a vector containing the cluster memberships.
- `ROSS`: the Residual Orthogonal Sum of Squares for the solution.
- `converged`: logical; TRUE if at least one solution has converged.
- `nconverg`: the number of converged solutions (out of `biter` starts).
- `x`: the (scaled, if selected) dataset.

and the attributes include:

- `scaled`: logical; whether the data were scaled.
- `k`: the number of clusters to be found.
- `biter`: the `biter` setting used.
- `niter`: the `niter` setting used.
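A short sketch of how the components and attributes above can be accessed, assuming a fitted object like the one produced in the Examples (`X` a numeric matrix):

```r
## Sketch: inspecting a fitted lga object (X as in the Examples)
lgaout <- lga(X, 2)

lgaout$cluster           # vector of cluster memberships
lgaout$ROSS              # residual orthogonal sum of squares
lgaout$converged         # did at least one start converge?

attr(lgaout, "biter")    # number of starting hyperplanes used
attr(lgaout, "scaled")   # was the data scaled before fitting?
```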

## Author(s)

Justin Harrington <harringt@stat.ubc.ca>

## References

Van Aelst, S., Wang, X., Zamar, R. and Zhu, R. (2006) ‘Linear Grouping Using Orthogonal Regression’, Computational Statistics & Data Analysis 50, 1287–1312.

Garcia-Escudero, L.A., Gordaliza, A., San Martin, R., Van Aelst, S. and Zamar, R.H. (2008) ‘Robust linear clustering’. To appear in Journal of the Royal Statistical Society, Series B (accepted June, 2008).

## See Also

`gap`

## Examples

```r
## Synthetic data
## Make a dataset with 2 clusters in 2 dimensions
library(MASS)
set.seed(1234)
X <- rbind(mvrnorm(n = 100, mu = c(1, -1), Sigma = diag(0.1, 2) + 0.9),
           mvrnorm(n = 100, mu = c(1, 1), Sigma = diag(0.1, 2) + 0.9))
lgaout <- lga(X, 2)
plot(lgaout)
print(lgaout)

## Robust equivalent
rlgaout <- rlga(X, 2, alpha = 0.75)
plot(rlgaout)
print(rlgaout)

## nhl94 data set
data(nhl94)
plot(lga(nhl94, k = 3, niter = 30))

## Allometry data set
data(brain)
plot(lga(log(brain, base = 10), k = 3))

## Second allometry data set
data(ob)
plot(lga(log(ob[, 2:3]), k = 3), pch = as.character(ob[, 1]))

## Corridor Walls data set
## To obtain the results reported in Garcia-Escudero et al. (2008):
data(corridorWalls)
rlgaout <- rlga(corridorWalls, k = 3, biter = 100, niter = 30, alpha = 0.85)
pairs(corridorWalls, col = rlgaout$cluster + 1)
plot(rlgaout)

## Parallel processing case
## In this example, running using 4 nodes.
## Not run:
set.seed(1234)
X <- rbind(mvrnorm(n = 1e6, mu = c(1, -1), Sigma = diag(0.1, 2) + 0.9),
           mvrnorm(n = 1e6, mu = c(1, 1), Sigma = diag(0.1, 2) + 0.9))
abc <- lga(X, k = 2, nnode = 4)
## End(Not run)
```