| agDimFunction | R Documentation |
Auer and Gervini developed a Bayesian graphical method to determine
the number d of significant principal components; a brief
overview is included in the help for the AuerGervini
class. The output of their method is a step function that displays
the maximum a posteriori (MAP) choice of d as a step function of
a one-parameter family of prior distributions, and they recommend
choosing the highest "long" step. The functions described here help
automate the process of dividing the step lengths into "long" and
"short" classes.
agDimTwiceMean(stepLength)
agDimKmeans(stepLength)
agDimKmeans3(stepLength)
agDimSpectral(stepLength)
agDimTtest(stepLength, extra=0)
agDimTtest2(stepLength)
agDimCPT(stepLength)
makeAgCpmFun(method)
stepLength |
A numeric vector |
method |
A character string describing a method supported by the
|
extra |
Just ignore this. Don't use it. It's a hack to avoid having to maintain two different versions of the same code. |
The agDimTwiceMean function implements a simple and naive rule:
a step is considered long if it is at least twice the mean length.
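As a minimal sketch of the rule just stated, the following base R snippet marks a step as long when its length is at least twice the mean. The name twice_mean_rule and the sample data are hypothetical and used only for illustration; this is not the package source, which may treat the boundary case differently.

```r
# Hypothetical re-implementation for illustration; not the PCDimension source.
twice_mean_rule <- function(stepLength) {
  # TRUE marks a "long" step, FALSE a "short" one
  stepLength >= 2 * mean(stepLength)
}

steps <- c(0.1, 0.2, 0.15, 1.5, 0.05)  # made-up step lengths
long <- twice_mean_rule(steps)
print(long)
```

The result has the same length as the input, which matches the contract described for the agDim functions.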
The agDimKmeans function uses the kmeans algorithm with
k=2 to divide the step lengths into two classes. Starting
centers for the groups are taken to be the minimum and maximum
values.
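The two-group clustering described above can be sketched with the kmeans function from the stats package. The helper name kmeans2_rule and the sample data are assumptions made for illustration; the packaged function may differ in its details.

```r
# Hypothetical sketch; not the PCDimension source.
kmeans2_rule <- function(stepLength) {
  # Starting centers: the minimum and maximum step lengths (one column, two rows).
  # kmeans stops with an error if the two starting centers are identical.
  start <- matrix(c(min(stepLength), max(stepLength)), ncol = 1)
  fit <- kmeans(stepLength, centers = start)
  # The cluster with the larger fitted center holds the "long" steps
  fit$cluster == which.max(fit$centers[, 1])
}

steps <- c(1, 1.2, 0.9, 1.1, 10, 11)  # made-up step lengths
long <- kmeans2_rule(steps)
print(long)
```

Supplying explicit starting centers makes the clustering deterministic, which avoids the run-to-run variation of randomly initialized k-means.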
The agDimKmeans3 function uses kmeans with k=3,
using the median as the third center. Only one of the three groups is
considered "short".
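A three-center variant can be sketched in the same way. Here the group with the smallest fitted center is taken to be the "short" group; that choice is an assumption, since the text above only says that one of the three groups is short. The name kmeans3_rule and the data are hypothetical.

```r
# Hypothetical sketch; not the PCDimension source.
kmeans3_rule <- function(stepLength) {
  # Starting centers: minimum, median, and maximum.
  # kmeans stops with an error if any two starting centers coincide.
  start <- matrix(c(min(stepLength), median(stepLength), max(stepLength)),
                  ncol = 1)
  fit <- kmeans(stepLength, centers = start)
  # Assumption: only the cluster with the smallest center is "short"
  fit$cluster != which.min(fit$centers[, 1])
}

steps <- c(1, 1.1, 0.9, 5, 5.2, 10, 11)  # made-up step lengths
long <- kmeans3_rule(steps)
print(long)
```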
The agDimSpectral function applies spectral clustering (as implemented
by the specc function from the kernlab package)
to divide the step lengths into two groups.
The agDimTtest and agDimTtest2 functions implement two
variants of a novel algorithm specialized for this particular task.
The idea is to start by sorting the step lengths so that
L_1 \le L_2 \le \dots \le L_n.
Then, for each i \in \{3, \dots, n-1\}, we
compute the mean and standard deviation of the first i step
lengths. Finally, one computes the likelihood that L_{i+1} comes
from the normal distribution defined by the first i lengths. If
the probability of seeing a value as large as L_{i+1} is less than 0.01,
then it is chosen as the "smallest long step".
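The procedure above can be sketched in base R as follows. This is one reading of the description, using the upper tail of a normal distribution; the packaged functions may use a different test statistic (the name suggests a t-test) and differ in other details. The name sequential_tail_rule, the alpha argument, and the data are hypothetical.

```r
# Hypothetical sketch of the sequential test described above;
# not the PCDimension source.
sequential_tail_rule <- function(stepLength, alpha = 0.01) {
  sorted <- sort(stepLength)
  n <- length(sorted)
  threshold <- Inf                      # no long step unless one is detected
  if (n >= 4) {
    for (i in 3:(n - 1)) {
      m <- mean(sorted[1:i])
      s <- sd(sorted[1:i])
      # Upper-tail probability of a value as large as the next length
      p <- pnorm(sorted[i + 1], mean = m, sd = s, lower.tail = FALSE)
      if (p < alpha && sorted[i + 1] > m) {
        threshold <- sorted[i + 1]      # the "smallest long step"
        break
      }
    }
  }
  stepLength >= threshold
}

steps <- c(1.0, 1.1, 0.9, 1.05, 0.95, 5, 6)  # made-up step lengths
long <- sequential_tail_rule(steps)
print(long)
```

With fewer than four step lengths the loop cannot run, so this sketch reports every step as short.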
The novel method just described can also be viewed as a way to detect
a particular kind of change point. So, we also provide the
agDimCPT function that uses the changepoint detection
algorithm implemented by the cpt.mean function in the
changepoint package. More generally, the makeAgCpmFun function
allows you to use any of the changepoint models implemented as part
of the detectChangePointBatch function in the cpm
package.
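To illustrate the change-point view without depending on either package, the sketch below finds a single change in mean by least squares on the sorted step lengths. It is a simplified stand-in, not the algorithm used by cpt.mean or detectChangePointBatch, and unlike those it always reports a change, even when the data contain none. The name single_change_rule and the data are hypothetical.

```r
# Simplified stand-in for a single change-in-mean detector;
# NOT the changepoint or cpm algorithm.
single_change_rule <- function(stepLength) {
  sorted <- sort(stepLength)
  n <- length(sorted)
  if (n < 2) return(rep(FALSE, n))
  sse <- function(v) sum((v - mean(v))^2)
  # Total within-segment sum of squares for each split after position k
  cost <- sapply(1:(n - 1), function(k) {
    sse(sorted[1:k]) + sse(sorted[(k + 1):n])
  })
  k_best <- which.min(cost)
  # Steps at or above the first value after the split are "long"
  stepLength >= sorted[k_best + 1]
}

steps <- c(1, 1.2, 0.9, 1.1, 10, 11)  # made-up step lengths
long <- single_change_rule(steps)
print(long)
```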
Each of the functions agDimTwiceMean, agDimKmeans,
agDimKmeans3, agDimSpectral, agDimTtest,
agDimTtest2, and agDimCPT returns a logical vector whose
length is equal to the input stepLength. TRUE values
identify "long" steps and FALSE values identify "short"
steps.
The makeAgCpmFun function returns a function that takes one argument (a
numeric stepLength vector) and returns a logical vector of the
same length.
Note: Our simulations suggest that "TwiceMean" and "CPM" give the best results.
Kevin R. Coombes <krc@silicovore.com>, Min Wang <wang.1807@osu.edu>.
P Auer, D Gervini. Choosing principal components: a new graphical method based on Bayesian model selection. Communications in Statistics-Simulation and Computation 37 (5), 962-977
The functions described here implement different algorithms that can
be used by the agDimension function to automatically
compute the number of significant principal components based on the
AuerGervini approach. Several of these functions are
wrappers around functions defined in other packages, including
specc in the kernlab package,
cpt.mean in the changepoint package, and
detectChangePointBatch in the cpm package.
library(PCDimension)
# simulate variances
lambda <- rev(sort(diff(sort(c(0, 1, runif(9))))))
# apply the Auer-Gervini method
ag <- AuerGervini(lambda, dd=c(3,10))
# Review the results
summary(ag)
agDimension(ag)
agDimension(ag, agDimKmeans)
agDimension(ag, agDimSpectral)
f <- makeAgCpmFun("Exponential")
agDimension(ag, f)