Linear Grouping Analysis
Arguments
x: a numeric matrix.

k: an integer giving the number of clusters.

alpha: a numeric value between 0.5 and 1 giving the proportion of points in the best subset, for the robust estimate of LGA (rlga only).

biter: an integer giving the number of different starting hyperplanes to try.

niter: an integer giving the number of iterations to attempt for convergence.

showall: logical. If TRUE, display all the outcomes, not just the best one.

scale: logical. If TRUE, scale the data by dividing each column by its standard deviation before fitting.

nnode: an integer giving the number of CPUs to use for parallel processing. Defaults to NULL, i.e. no parallel processing.

silent: logical. If TRUE, produce no text output during processing.

...: any other arguments passed from the generic function.
Details

This function tries to find k clusters using the LGA algorithm described in Van Aelst et al. (2006). Each attempt is allowed up to niter iterations to reach convergence, starting from each of biter different initial hyperplanes. The clustering with the smallest Residual Orthogonal Sum of Squares (ROSS) is then selected. If biter is left as NULL, it is chosen via the equation given in Van Aelst et al. (2006).
The function rlga is the robust equivalent of LGA, introduced in Garcia-Escudero et al. (2008).

Both functions are parallel-computing aware via the nnode argument and work with the package snow. In order to use parallel computing, one of MPI (e.g. lamboot) or PVM is necessary. For further details, see the documentation for snow.
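The cluster management that the nnode argument performs internally can be sketched directly with the snow API. The cluster type and node count below are illustrative assumptions rather than the package's actual setup, and the code only runs where a working MPI environment (e.g. after lamboot) is available:

```r
## Not run:
library(snow)                        # the parallel backend used by lga/rlga
cl <- makeCluster(4, type = "MPI")   # assumption: 4 nodes over MPI, e.g. after lamboot
clusterSetupRNG(cl)                  # independent random streams for the starting hyperplanes
## lga(X, k = 2, nnode = 4) distributes its biter starts over nodes like these
stopCluster(cl)                      # release the nodes when finished
## End(Not run)
```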
Associated with the lga and rlga functions are a print method and a plot method (see the examples). When the data have only two dimensions, the plot method also draws the fitted hyperplanes as dashed lines.
Value

An object of class "lga". The list contains:
cluster: a vector containing the cluster memberships.

ROSS: the Residual Orthogonal Sum of Squares for the solution.

converged: logical. TRUE if at least one solution has converged.

nconverg: the number of converged solutions (out of biter starts).

x: the dataset (scaled, if selected).

and the attributes include:

scaled: logical. Whether the data were scaled.

k: the number of clusters to be found.

biter: the biter setting used.

niter: the niter setting used.
Author(s)

Justin Harrington harringt@stat.ubc.ca
References

Van Aelst, S., Wang, X., Zamar, R. and Zhu, R. (2006) 'Linear Grouping Using Orthogonal Regression', Computational Statistics & Data Analysis 50, 1287–1312.

Garcia-Escudero, L.A., Gordaliza, A., San Martin, R., Van Aelst, S. and Zamar, R.H. (2008) 'Robust Linear Clustering', Journal of the Royal Statistical Society, Series B (accepted June 2008).
Examples

## Synthetic Data
## Make a dataset with 2 clusters in 2 dimensions
library(MASS)
set.seed(1234)
X <- rbind(mvrnorm(n=100, mu=c(1,-1), Sigma=diag(0.1,2)+0.9),
mvrnorm(n=100, mu=c(1,1), Sigma=diag(0.1,2)+0.9))
lgaout <- lga(X,2)
plot(lgaout)
print(lgaout)
## Robust equivalent
rlgaout <- rlga(X,2, alpha=0.75)
plot(rlgaout)
print(rlgaout)
## nhl94 data set
data(nhl94)
plot(lga(nhl94, k=3, niter=30))
## Allometry data set
data(brain)
plot(lga(log(brain, base=10), k=3))
## Second Allometry data set
data(ob)
plot(lga(log(ob[,2:3]), k=3), pch=as.character(ob[,1]))
## Corridor Walls data set
## To obtain the results reported in Garcia-Escudero et al. (2008):
data(corridorWalls)
rlgaout <- rlga(corridorWalls, k=3, biter = 100, niter = 30, alpha=0.85)
pairs(corridorWalls, col=rlgaout$cluster+1)
plot(rlgaout)
## Parallel processing case
## In this example, running using 4 nodes.
## Not run:
set.seed(1234)
X <- rbind(mvrnorm(n=1e6, mu=c(1,-1), Sigma=diag(0.1,2)+0.9),
mvrnorm(n=1e6, mu=c(1,1), Sigma=diag(0.1,2)+0.9))
abc <- lga(X, k=2, nnode=4)
## End(Not run)