Divide Steps into "Long" and "Short" to Compute AuerGervini Dimension
Description
Auer and Gervini developed a Bayesian graphical method to determine
the number d of significant principal components; a brief
overview is included in the help for the AuerGervini
class. The output of their method is a step function that displays
the maximum a posteriori (MAP) choice of d as a step function of
a oneparameter family of prior distributions, and thbey recommend
choosing the higest "long" step. The functions described here help
automate the process of dividing the step lengths into "long" and
"short" classes.
Usage
1 2 3 4 5 6 7 8  agDimTwiceMean(stepLength)
agDimKmeans(stepLength)
agDimKmeans3(stepLength)
agDimSpectral(stepLength)
agDimTtest(stepLength, extra=0)
agDimTtest2(stepLength)
agDimCPT(stepLength)
makeAgCpmFun(method)

Arguments
stepLength 
A numeric vector 
method 
A character string describing a method supported by the

extra 
Just ignore this. Don't use it. It's a hack to avoid having to maintain two differfent versions of the same code. 
Details
The agDimTwiceMean
function implements a simple and naive rule:
a step is considered long if it as least twice the mean length.
The agDimKmeans
uses the kmeans
algorithm with
k=2 to divide the step lengths into two classes. Starting
centers for the groups are taken to be the minimum and maximum
values.
The agDimKmeans3
function uses kmeans
with k=3,
using the median as the third center. Only one of the three groups is
considered "short".
The agDimSpectral
applies spectral clustering (as implemented
by the specc
function from the kernlab
package)
to divide the steps lengths into two groups.
TheagDimTtest
and agDimTtest2
functions implement two
variants of a novel algorithm specialized for this particular task.
The idea is to start by sorting the step lengths so that
L_1 ≤ L_2 ≤ … ≤ L_n.
Then, for each i \in 3,…, N1, we compute the mean and standard deviation of the first i step lengths. Finally, one computes the likelhood that L_{i+1} comes from the normal distribution defined by the first i lengths. If the probability that L_{i+1} is larger is less than 0.01, then it is chosen as the "smallest long step".
The novel method just described can also be viewed as a way to detect
a particular kind of change point. So, we also provide the
agDimCPT
function that uses the changepoint detection
algorithm implement by the cpt.mean
function in the
changepoint
package. More generally, the makeAgCpmFun
allows you to use any of the changepoint models implemented as part
of the detectChangePointBatch
function in the cpm
package.
Value
Each of the functions agDimTwiceMean
, agDimKmeans
,
agDimKmeans3
, agDimSpectral
, agDimTtest
,
agDimTtest2
, and agDimCPT
returns a logical vector whose
length is equal to the input stepLength
. TRUE
values
identify "long" steps and FALSE
values identify "short"
steps.
The makeAgCpmFun
returns a function that takes one argument (a
numeric stepLength
vector) and returns a logical vector of the
same length.
Author(s)
Kevin R. Coombes <krc@silicovore.com>, Min Wang <wang.1807@osu.edu>.
References
P Auer, D Gervini. Choosing principal components: a new graphical method based on Bayesian model selection. Communications in StatisticsSimulation and Computation 37 (5), 962977
See Also
The functions described here implerment different algorithms that can
be used by the agDimension
function to automatically
compute the number of significant principal components based on the
AuerGervini
approach. Several of these functions are
wrappers around functions defined in other packages, including
specc
in the kernlab
package,
cpt.mean
in the changepoint
package, and
detectChangePointBatch
in the cpm
package.
Examples
1 2 3 4 5 6 7 8 9 10 11  # simulate variances
lambda < rev(sort(diff(sort(c(0, 1, runif(9))))))
# apply the AuerGervini method
ag < AuerGervini(lambda, dd=c(3,10))
# Review the results
summary(ag)
agDimension(ag)
agDimension(ag, agDimKmeans)
agDimension(ag, agDimSpectral)
f < makeAgCpmFun("Exponential")
agDimension(ag, f)
