study_AAconvergence: Function which studies the convergence of Archetypal Analysis...
In archetypal: Finds the Archetypal Analysis of a Data Frame

View source: R/study_AAconvergence.R

study_AAconvergence

R Documentation

Function which studies the convergence of Archetypal Analysis when using the PCHA algorithm

Description

The function first computes an AA solution for the given arguments while storing the full iteration history (save_history = TRUE). It then computes the LOWESS [1] of the SSE sequence and the corresponding UIK point [2]; the study covers only the iterations after that point. The B-matrices and archetypes found at those iterations are stored. The archetypes are aligned, while the B-matrices are used to compute the used rows-weights, the leading rows-weights and, if the convex hull vertices are given (chvertices), the percentage of used rows that lie on the convex hull. The Aitken [3] extrapolation of SSE and its error are computed, and the order and rate of convergence are estimated. Finally, a multi-plot panel is created if plot = TRUE.
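The smoothing step can be illustrated with base R alone. The sketch below is only an illustration: the SSE history is synthetic, the names `iter`, `sse` and `sse_lowess` are hypothetical, and the default arguments of `lowess` are an assumption, not necessarily the settings used inside `study_AAconvergence`.

```r
# Synthetic (hypothetical) SSE history: geometric decay toward 1.7 plus small noise
set.seed(1)
iter <- 1:60
sse  <- 1.7 + 5 * 0.8^iter + rnorm(60, sd = 0.01)

# stats::lowess returns a list with components x and y (the smoothed values)
sm <- lowess(iter, sse)
sse_lowess <- sm$y

# Compare the raw and the smoothed sequence
plot(iter, sse, type = "l", xlab = "iteration", ylab = "SSE")
lines(iter, sse_lowess, lty = 2)
```

The knee (UIK) point would then be located on the smoothed curve rather than on the raw, noisier SSE values.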

Usage

study_AAconvergence(df, kappas, method = "projected_convexhull", 
                    rseed = NULL, chvertices = NULL, plot = FALSE, ...)

Arguments

`df`	The data frame with dimensions n x d
`kappas`	The number of archetypes
`method`	The method that will be used for computing the initial approximation:
projected_convexhull, see `find_outmost_projected_convexhull_points`
convexhull, see `find_outmost_convexhull_points`
partitioned_convexhull, see `find_outmost_partitioned_convexhull_points`
furthestsum, see `find_furthestsum_points`
outmost, see `find_outmost_points`
random, a random set of kappas points will be used
`rseed`	The random seed that will be used for setting the initial A matrix. Useful for reproducible results.
`chvertices`	The vector of row indices that are the vertices of the Convex Hull (if available)
`plot`	If it is TRUE, then a panel of useful plots is created
`...`	Other arguments to be passed to function `archetypal`, except `save_history` which must always be TRUE

Details

If we take natural logarithms of the approximate relation

\epsilon_{n+1} = c\epsilon_{n}^p

for n = 1, 2, 3, \ldots, we obtain

\log(\epsilon_{n+1}) = \log(c)+p\log(\epsilon_{n})

Thus a reasonable strategy for estimating the order p and the rate c is to perform a linear regression of \log(\epsilon_{n+1}) on \log(\epsilon_{n}) for the errors after a selected iteration: the slope estimates p and the intercept estimates \log(c). These estimates are returned as order_estimation and rate_estimation.
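A minimal sketch of this regression on synthetic data is given below. It assumes a sequence with a known limit and known linear convergence, so that the errors can be formed directly; the names `L_true`, `c_true`, `eps`, `p_hat` and `c_hat` are hypothetical, and this is not necessarily how `study_AAconvergence` builds its errors from the SSE.

```r
# Synthetic sequence with known linear convergence: x_n = L + a * c^n
L_true <- 2
c_true <- 0.8
n <- 1:30
x <- L_true + 3 * c_true^n

# Errors relative to the known limit (assumption: in practice the limit is unknown)
eps <- abs(x - L_true)

# Regress log(eps_{n+1}) on log(eps_n):
# the slope estimates p and the intercept estimates log(c)
y_next <- log(eps[-1])
x_prev <- log(eps[-length(eps)])
fit <- lm(y_next ~ x_prev)

p_hat <- unname(coef(fit)[2])       # estimated order of convergence
c_hat <- unname(exp(coef(fit)[1]))  # estimated rate of convergence
```

For this sequence the errors satisfy \epsilon_{n+1} = c\epsilon_{n} exactly, so the fit recovers an order close to 1 and a rate close to 0.8.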

Value

A list with members:

SSE, a vector of all SSE from all AA iterations
SSE_lowess, a vector of LOWESS values for SSE
UIK_lowess, the UIK point [2] of SSE_lowess
aitken, a data frame of Aitken [3] extrapolation and error for SSE after UIK_lowess iteration
order_estimation, the last term in estimating order of convergence, page 56 of [4], by using SSE after UIK_lowess iteration
rate_estimation, the last term in estimating rate of convergence, page 56 of [4], by using SSE after UIK_lowess iteration
significance_estimations, a data frame with standard errors and statistical significance for estimations
used_on_convexhull, the % of used rows which lie on Convex Hull (if given), as a sequence for iterations after UIK_lowess one
aligned_archetypes, the archetypes found after the UIK_lowess iteration, aligned by using align_archetypes_from_list. They show the history of archetypes creation.
solution_used, the AA output that has been used. Sometimes useful, especially for big data.
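For readers unfamiliar with the Aitken extrapolation [3] behind the aitken member, the following base R sketch applies the classical delta-squared formula to a synthetic sequence. The helper `aitken_delta2` is hypothetical and written only for illustration; it is not the package's internal code and does not reproduce the layout of the returned data frame.

```r
# Aitken delta-squared: x_n - (x_{n+1} - x_n)^2 / (x_{n+2} - 2 x_{n+1} + x_n)
aitken_delta2 <- function(x) {
  n <- length(x)
  stopifnot(n >= 3)
  x0 <- x[1:(n - 2)]
  x1 <- x[2:(n - 1)]
  x2 <- x[3:n]
  x0 - (x1 - x0)^2 / (x2 - 2 * x1 + x0)
}

# Synthetic linearly convergent sequence with limit 2
x <- 2 + 3 * 0.8^(1:10)
acc <- aitken_delta2(x)

# Absolute error of each extrapolated value against the known limit
err <- abs(acc - 2)
```

For an exactly geometric sequence such as this one, every extrapolated value coincides with the limit up to rounding; for a real SSE history the extrapolation only approximates the limiting value.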

References

[1] Cleveland, W. S. (1979) Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74, 829–836.

[2] Christopoulos, Demetris T., Introducing Unit Invariant Knee (UIK) As an Objective Choice for Elbow Point in Multivariate Data Analysis Techniques (March 1, 2016). Available at SSRN: http://dx.doi.org/10.2139/ssrn.3043076

[3] Aitken, A. "On Bernoulli's numerical solution of algebraic equations", Proceedings of the Royal Society of Edinburgh (1926) 46 pp. 289-305.

[4] Atkinson, K. E., An Introduction to Numerical Analysis, Wiley & Sons, 1989.

Examples

# Load the package and the data set "wd2"
library(archetypal)
data(wd2)
# Row indices of the convex hull vertices
ch = chull(wd2)
# Study convergence for 3 archetypes with a fixed seed
sa = study_AAconvergence(df = wd2, kappas = 3, rseed = 20191119,
                         verbose = FALSE, chvertices = ch)
names(sa)
# [1] "SSE"                      "SSE_lowess"               "UIK_lowess"              
# [4] "aitken"                   "order_estimation"         "rate_estimation"         
# [7] "significance_estimations" "used_on_convexhull"       "aligned_archetypes"      
# [10] "solution_used"        
# sse=sa$SSE
# ssel=sa$SSE_lowess
sa$UIK_lowess
# [1] 36
# sa$aitken
sa$order_estimation
# [1] 1.007674
sa$rate_estimation
# [1] 0.8277613
sa$significance_estimations
#        estimation   std.error   t.value      p.value
# log(c) -0.1890305 0.014658947 -12.89523 5.189172e-12
# p       1.0076743 0.001616482 623.37475 3.951042e-50
# sa$used_on_convexhull
# sa$aligned_archetypes
data.frame(sa$solution_used[c("SSE","varexpl","iterations","time")])
#        SSE   varexpl iterations time
# 1 1.717538 0.9993186         62 8.39
# Plot class "study_AAconvergence"
plot(sa)

archetypal documentation built on May 29, 2024, 8:46 a.m.