# Estimating Number of Principal Components Using the Auer-Gervini Method

### Description

Auer and Gervini [1] described a graphical Bayesian method for
estimating the number of statistically significant principal
components. We have implemented their method in the `AuerGervini`

class, and enhanced it by automating the final selection.

### Usage

1 2 | ```
AuerGervini(Lambda, dd=NULL, epsilon = 2e-16)
agDimension(object, agfun=agDimTwiceMean)
``` |

### Arguments

`Lambda` |
Either a |

`dd` |
A vector of length 2 containing the dimensions of the data
used to created the Auer-Gervini object. If |

`epsilon` |
A numeric value. Used to remove any variances that are
less than |

`object` |
An object of the |

`agfun` |
A function that takes one argument (a vector of step lengths) and returns a logical vector of the same length (where true indicates "long" as opposed to "short" steps). |

### Details

The Auer-Gervini method for determing the number of principal components is based on a Bayesian model that assaerts that the vector of explained variances (eigenvalues) should have the form

*a_1 ≤ a_2 ≤ … ≤ a_d < a_{d+1} = a_{d+2} = … a_n*

with the goal being to find the true dimension *d*. They consider
a set of prior distributions on *d \in \{1, …, n\}* that decay
exponentially, with the rate of decay controlled by a parameter
*θ*. For each value of *θ*, one selects the value
of *d* that has the maximum a posteriori (MAP) probability. Auer
and Gervini show that the dimensions selected by this procedure write
*d* as a non-increasing step function of *θ*. The values
of *θ* where the steps change are stored in the
`changePoints`

slot, and the corresponding *d*-values are
stored in the `dLevels`

slot.

Auer and Gervini go on to advise using their method as a graphical
approach, manually (or visually?) selecting the highest step that is
"long". Our implementation provides several different algorithms for
automatically deciding what is "long" enough. The simplest (but
fairly naive) approach is to take anything that is longer than twice
the mean; other algorithms are described in
`agDimFunction`

.

### Value

The `AuerGervini`

function constructs and returns an object of
the `AuerGervini`

class.

The `agDimension`

function computes the number of significant
principal components. The general idea is that one starts by
computing the length of each step in the Auer-Gerivni plot, and must
then separate these into "long" and "short" classes. We provide a
variety of different algorithms to carry out this process; the
default algorithm in the function `agDimTwiceMean`

defines
a step as "long" if it more than twice the mean step length.

### Objects from the Class

Objects should be created using the `AuerGervini`

constructor.

### Slots

`Lambda`

:A

`numeric`

vector containing the explained variances in decreasing order.`dimensions`

Numeric vector of length 2 containing the dimnesions of the underlying data matrix.

`dLevels`

:Object of class

`numeric`

; see details`changePoints`

:Object of class

`numeric`

; see details

### Methods

- plot
`signature(x = "AuerGervini", y = "missing")`

: ...- summary
`signature(object = "AuerGervini")`

: ...

### Author(s)

Kevin R. Coombes <krc@silicovore.com>

### References

[1] P Auer, D Gervini. Choosing principal components: a new graphical method based on Bayesian model selection. Communications in Statistics-Simulation and Computation 37 (5), 962-977

### See Also

`agDimFunction`

to get a complete list of the functions
implementing different algorithms to separate the step lengths into
two classes.

### Examples

1 2 3 4 5 6 7 8 9 10 11 | ```
showClass("AuerGervini")
# simulate variances
lambda <- rev(sort(diff(sort(c(0, 1, runif(9))))))
# apply the Auer-Gervini method
ag <- AuerGervini(lambda, dd=c(3,10))
# Review the results
summary(ag)
agDimension(ag)
agDimension(ag, agDimKmeans)
# Look at the results graphically
plot(ag, agfun=list(agDimTwiceMean, agDimKmeans))
``` |