study_AAconvergence: Function which studies the convergence of Archetypal Analysis... In archetypal: Finds the Archetypal Analysis of a Data Frame

 study_AAconvergence R Documentation

Function which studies the convergence of Archetypal Analysis when using the PCHA algorithm

Description

First it finds an AA solution under given arguments while storing all iteration history (save_history = TRUE). Then it computes the LOWESS [1] of SSE and its relevant UIK point [2]. Study is performed for iterations after that point. The list of B-matrices and archetypes that were found are stored. The archetypes are being aligned, while the B-matrices are used for computing the used rows-weights, leading rows-weights and maybe percentage of used rows on Convex Hull. The Aitken SSE extrapolation plus the relevant error are computed. The order and rate of convergence are estimated. Finally a multi-plot panel is being created if asked.

Usage

study_AAconvergence(df, kappas, method = "projected_convexhull",
rseed = NULL, chvertices = NULL, plot = FALSE, ...)


Arguments

 df The data frame with dimensions n x d kappas The number of archetypes method The method that will be used for computing initial approximation: projected_convexhull, see find_outmost_projected_convexhull_points convexhull, see find_outmost_convexhull_points partitioned_convexhull, see find_outmost_partitioned_convexhull_points furthestsum, see find_furthestsum_points outmost, see find_outmost_points random, a random set of kappas points will be used rseed The random seed that will be used for setting initial A matrix. Useful for reproducible results. chvertices The vector of rows which represents the vertices for Convex Hull (if available) plot If it is TRUE, then a panel of useful plots is created ... Other arguments to be passed to function archetypal, except save_history which must always be TRUE

Details

If we take natural logarithms at the next approximate equation

\epsilon_{n+1} = c\epsilon_{n}^p

for n = 1, 2, 3, \ldots, then we'll find

\log(\epsilon_{n+1}) = \log(c)+p\log(\epsilon_{n})

Thus a reasonable strategy for estimating order p and rate c is to perform a linear regression on above errors, after a selected iteration. That is the output of order_estimation and rate_estimation.

Value

A list with members:

1. SSE, a vector of all SSE from all AA iterations

2. SSE_lowess, a vector of LOWESS values for SSE

3. UIK_lowess, the UIK point [2] of SSE_lowess

4. aitken, a data frame of Aitken [3] extrapolation and error for SSE after UIK_lowess iteration

5. order_estimation, the last term in estimating order of convergence, page 56 of [4], by using SSE after UIK_lowess iteration

6. rate_estimation, the last term in estimating rate of convergence, page 56 of [4], by using SSE after UIK_lowess iteration

7. significance_estimations, a data frame with standard errors and statistical significance for estimations

8. used_on_convexhull, the % of used rows which lie on Convex Hull (if given), as a sequence for iterations after UIK_lowess one

9. aligned_archetypes, the archetypes after UIK_lowess iteration are being aligned by using align_archetypes_from_list. The history of archetypes creation.

10. solution_used, the AA output that has been used. Some times useful, especially for big data.

References

[1] Cleveland, W. S. (1979) Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74, 829–836.

[2] Christopoulos, Demetris T., Introducing Unit Invariant Knee (UIK) As an Objective Choice for Elbow Point in Multivariate Data Analysis Techniques (March 1, 2016). Available at SSRN: http://dx.doi.org/10.2139/ssrn.3043076

[3] Aitken, A. "On Bernoulli's numerical solution of algebraic equations", Proceedings of the Royal Society of Edinburgh (1926) 46 pp. 289-305.

[4] Atkinson, K. E.,An Introduction to Numerical Analysis, Wiley & Sons,1989

check_Bmatrix

Examples

{
data(wd2)
ch = chull(wd2)
sa = study_AAconvergence(df = wd2, kappas = 3, rseed = 20191119,
verbose = FALSE, chvertices = ch)
names(sa)
# [1] "SSE"                      "SSE_lowess"               "UIK_lowess"
# [4] "aitken"                   "order_estimation"         "rate_estimation"
# [7] "significance_estimations" "used_on_convexhull"       "aligned_archetypes"
# [10] "solution_used"
# sse=sa$SSE # ssel=sa$SSE_lowess
sa$UIK_lowess # [1] 36 # sa$aitken
sa$order_estimation # [1] 1.007674 sa$rate_estimation
# [1] 0.8277613
sa$significance_estimations # estimation std.error t.value p.value # log(c) -0.1890305 0.014658947 -12.89523 5.189172e-12 # p 1.0076743 0.001616482 623.37475 3.951042e-50 # sa$used_on_convexhull
# saaligned_archetypes data.frame(sasolution_used[c("SSE","varexpl","iterations","time")])
#        SSE   varexpl iterations time
# 1 1.717538 0.9993186         62 8.39
# Plot class "study_AAconvergence"
plot(sa)

}


archetypal documentation built on May 29, 2024, 8:46 a.m.