dimcalc (R Documentation)
Methods for choosing a ‘good’ number of singular values for the dimensionality reduction in LSA.
dimcalc_share(share=0.5)
dimcalc_ndocs(ndocs)
dimcalc_kaiser()
dimcalc_raw()
dimcalc_fraction(frac=(1/50))
share | Optional: the ratio of the sum of the selected singular values to the sum of all singular values (default: 0.5). Only needed by dimcalc_share().
frac | Optional: a fraction of the number of singular values to be used (default: 1/50th).
ndocs | Optional: the number of documents (only needed for dimcalc_ndocs()).
In an LSA process, the diagonal matrix of the singular value decomposition is usually reduced to a specific number of dimensions (also ‘factors’ or ‘singular values’).
The functions dimcalc_share(), dimcalc_ndocs(), dimcalc_kaiser(), dimcalc_fraction() and the redundant function dimcalc_raw() offer methods to calculate a useful number of singular values, based on the distribution and values of the given sequence of singular values.
All of them are tightly coupled to the core LSA functions: they generate a function to be executed by the calling (higher-level) function lsa(). The generated function takes only one parameter, namely s, which is expected to be the sequence of singular values. Inside lsa(), the returned function is executed, with the singular values computed by lsa() passed to it as that parameter.
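To make this calling convention concrete, here is a minimal base-R sketch of the function-factory pattern. The names dimcalc_fixed_sketch() and reduce_dims_sketch() are hypothetical illustrations, not part of the lsa package; reduce_dims_sketch() only imitates the role described for lsa() (compute the SVD, then hand the singular values to the supplied function) and is not lsa()'s actual code.

```r
# Hypothetical factory: the parameter k is captured in the closure, and the
# returned function takes only the singular values s (the dimcalc_* shape).
dimcalc_fixed_sketch <- function(k = 2) {
  function(s) min(k, length(s))
}

# Hypothetical caller standing in for lsa(): it computes the singular values
# itself and passes them to whatever dimcalc function it was given.
reduce_dims_sketch <- function(m, dims = dimcalc_fixed_sketch()) {
  sv <- svd(m)        # base R returns d in decreasing order
  k <- dims(sv$d)     # the dimcalc function only ever sees s
  list(
    k = k,
    u = sv$u[, seq_len(k), drop = FALSE],
    d = sv$d[seq_len(k)],
    v = sv$v[, seq_len(k), drop = FALSE]
  )
}

m <- cbind(c(1, 1, 1, 0), c(0, 0, 1, 1), c(0, 1, 0, 1))
res <- reduce_dims_sketch(m, dims = dimcalc_fixed_sketch(2))
res$k                                  # 2
dimcalc_fixed_sketch(5)(c(3, 2, 1))    # called directly; capped at length(s), i.e. 3
```

The factory keeps the configuration (here k) separate from the data (s), which is why the caller never needs to know which criterion was chosen.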
The dimensionality calculation methods can, however, also be called directly by supplying a second, separate argument list containing the singular values, e.g.
dimcalc_share(share=0.2)(mysingularvalues)
The method dimcalc_share() finds the first position in the descending sequence of singular values s at which their cumulative sum (divided by the sum of all values) meets or exceeds the specified share.
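The share criterion reduces to a cumulative-sum lookup: the smallest n with (s[1] + ... + s[n]) / sum(s) >= share. The following is a hypothetical base-R re-implementation (dimcalc_share_sketch() is an illustrative name, not the lsa package's source); the defensive sort() is an assumption, since svd() already returns the values in decreasing order.

```r
# Hypothetical sketch of the share criterion, in the same factory shape.
dimcalc_share_sketch <- function(share = 0.5) {
  function(s) {
    if (length(s) == 0) stop("no singular values given")
    s <- sort(s, decreasing = TRUE)   # the criterion assumes descending order
    ratio <- cumsum(s) / sum(s)       # cumulative share of the total
    which(ratio >= share)[1]          # first position meeting the share (NA if share > 1)
  }
}

s <- c(4, 3, 2, 1)              # cumulative shares: 0.4, 0.7, 0.9, 1.0
dimcalc_share_sketch()(s)       # 2
dimcalc_share_sketch(0.8)(s)    # 3
```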
The method dimcalc_ndocs() calculates the first position in the descending sequence of singular values at which their cumulative sum meets or exceeds the number of documents.
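The same cumulative-sum idea applies here, with the absolute document count as the threshold instead of a relative share. This is a hypothetical sketch (dimcalc_ndocs_sketch() is an illustrative name, not the package's code); returning NA when the total never reaches ndocs is an assumption of this sketch.

```r
# Hypothetical sketch of the number-of-documents criterion.
dimcalc_ndocs_sketch <- function(ndocs) {
  function(s) {
    s <- sort(s, decreasing = TRUE)
    which(cumsum(s) >= ndocs)[1]   # NA if the total never reaches ndocs
  }
}

s <- c(4, 3, 2, 1)             # running sums: 4, 7, 9, 10
dimcalc_ndocs_sketch(3)(s)     # 1
dimcalc_ndocs_sketch(8)(s)     # 3
```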
The method dimcalc_kaiser() calculates the number of singular values according to the Kaiser criterion, i.e. from the descending sequence of values, all values with s[n] > 1 are taken. The number of dimensions is returned accordingly.
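Because the sequence is descending, the values above 1 are exactly the leading ones, so the Kaiser criterion amounts to counting them. A hypothetical sketch (dimcalc_kaiser_sketch() is an illustrative name); how the real package handles the case where no value exceeds 1 is not stated here, and this sketch simply returns 0.

```r
# Hypothetical sketch of the Kaiser criterion: keep every singular value > 1.
dimcalc_kaiser_sketch <- function() {
  function(s) {
    # Logical vector summed to an integer count; in a descending sequence
    # these are exactly the leading values.
    sum(s > 1)
  }
}

dimcalc_kaiser_sketch()(c(2.5, 1.2, 1.0, 0.3))   # 2 (1.0 is not strictly greater than 1)
```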
The method dimcalc_fraction() returns the specified fraction of the number of singular values. By default, 1/50th of the available values is used, and the resulting number of singular values is returned.
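A hypothetical sketch of the fraction rule follows (dimcalc_fraction_sketch() is an illustrative name). The text does not say how a non-integer result is rounded, so rounding down with floor() and keeping at least one dimension are assumptions of this sketch, not documented behaviour.

```r
# Hypothetical sketch of the fraction rule.
dimcalc_fraction_sketch <- function(frac = 1/50) {
  function(s) {
    # Assumption: round down, but never return fewer than one dimension.
    max(1, floor(length(s) * frac))
  }
}

dimcalc_fraction_sketch()(seq(100, 1))      # 100 values * 1/50 -> 2
dimcalc_fraction_sketch(1/4)(seq(100, 1))   # 25
```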
The method dimcalc_raw() returns the maximum number of singular values (= the length of s). It is included only for completeness.
Returns a function that takes the singular values as its only parameter and returns the recommended number of dimensions. The expected parameter of this function is:
s | A sequence of singular values (as produced by the SVD). Only needed when calling the dimensionality calculation routines directly.
Fridolin Wild f.wild@open.ac.uk
Wild, F., Stahl, C., Stermsek, G., Neumann, G., Penya, Y. (2005) Parameters Driving Effectiveness of Automated Essay Scoring with LSA. In: Proceedings of the 9th CAA, pp.485-494, Loughborough
See also: lsa
library(lsa)

## create some data
vec1 = c( 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
vec2 = c( 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0 )
vec3 = c( 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0 )
matrix = cbind(vec1, vec2, vec3)
s = svd(matrix)$d

# standard share of 0.5
dimcalc_share()(s)

# specific share of 0.9
dimcalc_share(share=0.9)(s)

# meeting the number of documents (here: 3)
n = ncol(matrix)
dimcalc_ndocs(n)(s)