Given a multivariate data set, this function applies exploratory projection pursuit to find an interesting three-dimensional projection of the input data.
xm 
Data matrix. Note: ROWS correspond to variables, and so the input matrix here is (probably) the transpose of what you might expect (and is accepted by principal components functions). This might change in the future. 
nrandstarts 
Number of random starts. 
lapplyfn 
By default this is lapply. If you can use the parallel
package then you can replace this with mclapply to get a speed up.
The latter is faster if you know how to
use the parallel package and you often need to set the

action 
Integer taking the value 0, 1, 2 or 3. This controls how multivariate outliers are treated. If 0, no action is taken and the multivariate data set is supplied unchanged to the projection pursuit routine. If 1, outliers are removed. If 2 or 3, outliers are moved towards the centre according to log or square root trimming (the precise formulae are described in the trimsu FORTRAN code, and stem from Jones and Sibson (1987)). The outliers are only removed, or trimmed, for the purposes of index calculation. They are retained in the final solution and projected according to the projection direction that the projection pursuit algorithm computes on the non-outlier (majority) portion of the data. 
limit 
This argument is only used if 
text 
Integer. Has no effect if set to zero. If set to one the FORTRAN code will print out informational messages. 
Exploratory projection pursuit is a method introduced by Friedman and Tukey (1974) which projects multivariate K-dimensional data down onto an L-dimensional subspace, using a projection A (an L x K matrix). A projection index, I, is devised to measure how interesting the projection is, and hence usually I = I(A). A numerical optimiser is then employed to find the projection A that maximises the index of interestingness.
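The projection step described above can be sketched in base R. This is only an illustration, not the PP3 implementation: the toy data, the dimensions and the way the projection matrix A is generated are all assumptions made for the example, and no projection index is computed.

```r
# Toy data in the orientation this page uses: rows are variables (K = 5),
# columns are observations (n = 200). The data are made up for illustration.
set.seed(42)
K <- 5; L <- 3; n <- 200
xm <- matrix(rnorm(K * n), nrow = K, ncol = n)

# Build an arbitrary L x K projection matrix A with orthonormal rows by
# taking the Q factor of a random K x L matrix and transposing it.
A <- t(qr.Q(qr(matrix(rnorm(K * L), nrow = K, ncol = L))))

# Project: each column of 'projected' is one observation in L dimensions.
projected <- A %*% xm

dim(projected)          # L rows, n columns
round(A %*% t(A), 10)   # identity matrix, since the rows of A are orthonormal
```

A projection pursuit routine would then score `projected` with an index I(A) and adjust A to increase that score.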
A goal of exploratory projection pursuit is to find projections that highlight interesting structure and clustering. Often, this method can discover interesting structures that existing methods miss, as it uses a different measure of interestingness.
As with many other optimisation problems of this sort, the numerical optimiser is not guaranteed to find the global maximum, only a local maximum. In any case, the global maximum might not correspond to the only interesting projection direction, so it is worth exploring many projections that result in large projection indices. One way of obtaining these is to run projection pursuit from several random starting directions. Although this is not guaranteed to find all interesting local optima, it will find many of them.
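The idea of running a local optimiser from several random starts can be illustrated with a toy one-dimensional problem. The function toy_index below is hypothetical and is not the PP3 projection index; it is chosen only because it has several local maxima. The sketch uses base R's optim and lapply.

```r
# A toy "index" with several local maxima (NOT the PP3 projection index).
toy_index <- function(theta) {
  sin(3 * theta) + 0.5 * cos(5 * theta) - 0.05 * theta^2
}

set.seed(1)
starts <- runif(20, min = -4, max = 4)   # 20 random starting points

# Run a local optimiser from each start; fnscale = -1 makes optim() maximise.
runs <- lapply(starts, function(s) {
  fit <- optim(par = s, fn = toy_index, method = "BFGS",
               control = list(fnscale = -1))
  c(start = s, theta = fit$par, index = fit$value)
})
results <- do.call(rbind, runs)

# Different starts can settle on different local maxima; show the largest few.
head(results[order(results[, "index"], decreasing = TRUE), ], 5)
```

Sorting the results by index value mirrors how the output of many random starts is inspected: the largest values are candidates for interesting solutions, and several starts may share the same local maximum.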
This code implements the three-dimensional version of the moment projection index proposed by Jones and Sibson (1987). A description of the three-dimensional version can be found in Nason (1995). This version first centres the data (removes its mean), and then spheres it (transforms its covariance matrix to the identity). The aim of sphering is to ensure that any structure discovered by projection pursuit is not related to anything that could be found by principal components analysis (because principal components "looks for structure" contained in the covariance matrix, and there will not be any if the covariance matrix is the identity). The moment index is an approximation to the entropy index, which looks for departures in the empirical distribution of the projected data from standard normality. The heuristic is that anything that is far from normal is 'interesting'. Clustered data (bi-, tri- or multimodal) is not normal, and so it is deemed to be interesting by the moment index. The moment index has been criticised for being sensitive to outliers. More discussion of that, and designs of robust indices, can be found in Nason (2001). However, recent numerical experimentation shows that although the moment index does indeed find projections with outliers, these can be easily and quickly discounted, and the method does routinely find interesting projections when they are present.
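Centring and sphering can be sketched in base R as follows. This is a minimal illustration using an eigendecomposition of the sample covariance matrix; the package performs this step in its own code, which may differ in detail. The toy data and all variable names are assumptions made for the example.

```r
set.seed(7)
K <- 4; n <- 500
# Correlated toy data, rows = variables (the orientation this page uses).
mix <- matrix(rnorm(K * K), K, K)
xm <- mix %*% matrix(rnorm(K * n), K, n) + 10

# Centre: subtract each variable's mean (row means, since rows are variables).
centred <- xm - rowMeans(xm)

# Sphere: rotate onto the eigenvectors of the covariance matrix and rescale
# each direction by 1 / sqrt(eigenvalue).
S <- cov(t(centred))                 # K x K covariance of the variables
eig <- eigen(S, symmetric = TRUE)
sphered <- diag(1 / sqrt(eig$values)) %*% t(eig$vectors) %*% centred

round(cov(t(sphered)), 10)           # identity matrix
```

After this transformation the covariance matrix carries no information, so whatever a projection index finds in the sphered data is structure that principal components analysis could not have revealed.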
A PP3 class object, which is a list with the following components.
ix3 
A vector containing the optimised projection index corresponding
to each of the 
info 
A list of 
pdata.list 
The data projected according to the optimal projection
direction resulting from each random start.
A list containing 
pseudp.vals 
It can be hard to evaluate a given optimal projection
index value and know whether it can be judged as large. One
way of assessing it is to compare it to a set of projection
indices computed (without optimisation) on another set of
random projection directions. It is known that most arbitrary
projections result in uninteresting projections, so a collection
of 
origvarnames 
The names of the original variables (names of rows)
from the original data matrix. This is taken from the

Nearly all of PP3 was written by Guy Nason for his PhD (1992). Nason is grateful to, and worked under the guidance of, the late, great Robin Sibson. Robin also wrote a carefully crafted eigendecomposition subroutine, which is part of this package.
G. P. Nason
Friedman, J.H. and Tukey, J.W. (1974) A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput., 23, 881-890.
Jones, M.C. and Sibson, R. (1987) What is projection pursuit? (with discussion) J. R. Statist. Soc. A, 150, 1-36.
Nason, G. P. (1995) Three-dimensional projection pursuit. J. R. Statist. Soc. C, 44, 411-430.
Nason, G. P. (2001) Robust projection indices. J. R. Statist. Soc. B, 63, 551-567.
getPP3index, getPP3loadings, getPP3projdata, plot.PP3, print.PP3, summary.PP3
#
# The flea beetle data
#
data(beetle)
#
# Run projection pursuit with 100 random starts (normally, you'd use MANY
# more random starts, e.g. 1000 or more). Here, we keep the number small to
# help CRAN.
#
#
# N.b. I am going to set.seed here, so results match what you might see
# when trying THESE functions, but, in general, you can ignore set.seed
# or set it to your favourite value
#
set.seed(1)
beetle.PP3 <- PP3many(t(beetle), nrandstarts=100)
#
# Look at the output
#
beetle.PP3
#Class 'PP3' : Threedimensional Projection Pursuit Object:
# ~~~ : List with 5 components with names
# ix3 info pdata.list pseudp.vals origvarnames
#
#Number of random start(s): 100
#Maximum projection index is 22.02255 achieved by 1 random start(s).
#(Partial) list of those starts achieving max are: 90
#
#summary(.):
#
# Summary statistics of projection index
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 11.51 14.81 16.18 16.34 17.89 22.02
# Summary statistics of pseudo pvalues
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 8.592 10.885 12.361 12.466 13.992 18.210
#
# The print out shows that 100 random starts were executed and the max
# projection index was 22.02255 and only one of those random starts found
# this (sometimes more than one random start converges to the same maximum).
#
# The index number of the run which found the maximum was 90 (this number
# can be useful later to access the maximum).
#
# The summary gives the summary statistics of the 100 projection index
# values found. The max is the same as above, but also the distribution
# can be discerned.
#
# The distribution of the pseudo-p-values (NOT actual p-values) is presented
# after that. These are the projection indices computed purely on random
# directions, not the optimised versions, and so you can think of them as
# null values to compare with the earlier optimised values. E.g. the maximum
# of the pseudo-projection indices is 18.21, so any actual optimised
# projection index larger than this might be interesting.
#
# Now produce a plot (using all projection index info on 100 runs):
#
## Not run: plot(beetle.PP3)
#
# This produces (a) a histogram of the projection indices, (b) a red density
# estimate of the pseudo-projection indices and (c) the median, upper quartile,
# 0.9 quantile and maximum of the pseudos as red dotted vertical lines. The red
# information corresponds to a kind of null, and so projection indices larger
# than these values might be interesting. The plot also produces some text:
#
#Big Projection Indices
#Maximum Psuedo pvalue: 18.2103
#Index Number and associated projection indices
# 90 74 4 87 75 54 13 6
#22.02255 21.36578 21.29397 20.86531 19.59663 19.42427 19.34596 19.26520
# 23 60
#19.22810 19.16459
#
# This is a list of the 10 biggest projection indices and their respective
# identity numbers (which one of the random starts generated it). These
# can be used in the plot function with a number argument to generate further
# information/plots about the projection solution. Note, the number of
# biggest projection indices can be controlled with the nbig argument of
# plot.PP3.
#
# Now suppose we wanted to look at projection solution 74, which had the
# second-biggest projection index. We can plot the projected data with the
# following command:
#
## Not run: plot(beetle.PP3, number=74, colvec=dimnames(beetle)[[1]])
#
# The colvec supplies the group structure so the different species can
# be coloured differently. The label argument permits you to put text
# labels there, per point as well as colours.
#
# You can extract information from the beetle.PP3 object using the
# extractor functions getPP3index and getPP3projdata, and the variable
# loadings using getPP3loadings. For example,
#
getPP3index(beetle.PP3, 74)
# 74
#21.36578
#
# gets the second-largest projection index. The third-largest can be obtained
# by replacing 74 by 4, etc.
