# dimcalc: Dimensionality Calculation Routines (LSA) In lsa: Latent Semantic Analysis

## Description

Methods for choosing a ‘good’ number of singular values for the dimensionality reduction in LSA.

## Usage

```
dimcalc_share(share=0.5)
dimcalc_ndocs(ndocs)
dimcalc_kaiser()
dimcalc_raw()
dimcalc_fraction(frac=(1/50))
```

## Arguments

 `share` Optional: the ratio of the sum of the selected singular values to the sum of all singular values (default: 0.5). Only needed by `dimcalc_share()`.

 `frac` Optional: the fraction of the number of singular values to be used (default: 1/50th). Only needed by `dimcalc_fraction()`.

 `ndocs` Optional: the number of documents. Only needed by `dimcalc_ndocs()`.

## Details

In an LSA process, the diagonal matrix of the singular value decomposition is usually reduced to a specific number of dimensions (also ‘factors’ or ‘singular values’).

The functions `dimcalc_share()`, `dimcalc_ndocs()`, `dimcalc_kaiser()` and also the redundant function `dimcalc_raw()` offer methods to calculate a useful number of singular values (based on the distribution and values of the given sequence of singular values).

All of them are tightly coupled to the core LSA functions: each generates a function to be executed by the calling (higher-level) function `lsa()`. The generated function takes only one parameter, `s`, which is expected to be the sequence of singular values. Inside `lsa()`, the returned function is executed with the mandatory singular values supplied as its argument.
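The function-generating pattern described above can be sketched in base R. Both `dimcalc_first_k()` and `truncate_svd()` are hypothetical names invented for this illustration; they are not part of the lsa package, and `truncate_svd()` only mimics how a higher-level caller might hand the singular values to the generated function.

```r
# Hypothetical generator in the style of the dimcalc_* family (not part of lsa).
dimcalc_first_k <- function(k = 2) {
  # The returned function takes only the singular values `s`.
  function(s) min(k, length(s))
}

# Hypothetical stand-in for a higher-level caller: it computes the SVD and
# passes the singular values to whatever function it was given.
truncate_svd <- function(x, dims = dimcalc_first_k()) {
  decomposition <- svd(x)
  k <- dims(decomposition$d)
  list(
    u = decomposition$u[, seq_len(k), drop = FALSE],
    d = decomposition$d[seq_len(k)],
    v = decomposition$v[, seq_len(k), drop = FALSE]
  )
}

x <- cbind(c(1, 1, 1, 0), c(0, 1, 1, 1), c(1, 0, 0, 1))
reduced <- truncate_svd(x, dims = dimcalc_first_k(2))
length(reduced$d)
```

The caller never needs to know which criterion is in use; it only requires a function of `s` that returns a number of dimensions.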

The dimensionality calculation methods, however, can still be called directly by appending a second, separate argument list, e.g.

`dimcalc_share(share=0.2)(mysingularvalues)`

The method `dimcalc_share()` finds the first position in the descending sequence of singular values `s` where the cumulative sum (divided by the sum of all values) meets or exceeds the specified share.
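A minimal base R sketch of the share rule as it is worded here may help. `sketch_share()` is a hypothetical name used only for illustration; the packaged function may treat boundary cases differently.

```r
# Hypothetical re-statement of the share rule (not the lsa source code).
sketch_share <- function(share = 0.5) {
  function(s) {
    # Cumulative share of the total, e.g. 0.4, 0.7, 0.9, 1.0 for c(4, 3, 2, 1).
    cumulative_share <- cumsum(s) / sum(s)
    # First position at which the cumulative share reaches `share`.
    which(cumulative_share >= share)[1]
  }
}

s <- c(4, 3, 2, 1)
sketch_share()(s)            # default share of 0.5
sketch_share(share = 0.8)(s) # a stricter share
```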

The method `dimcalc_ndocs()` calculates the first position in the descending sequence of singular values where the cumulative sum meets or exceeds the number of documents.
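The document-count rule can be sketched the same way. `sketch_ndocs()` is a hypothetical name, and the fallback to `length(s)` when the cumulative sum never reaches `ndocs` is an assumption of this sketch, not documented behaviour.

```r
# Hypothetical re-statement of the ndocs rule (not the lsa source code).
sketch_ndocs <- function(ndocs) {
  function(s) {
    # Positions at which the running sum has reached the number of documents.
    reached <- which(cumsum(s) >= ndocs)
    # Assumption: keep all values if the threshold is never reached.
    if (length(reached) > 0) reached[1] else length(s)
  }
}

s <- c(4, 3, 2, 1)      # running sums: 4, 7, 9, 10
sketch_ndocs(8)(s)      # threshold reached within the sequence
sketch_ndocs(50)(s)     # threshold never reached
```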

The method `dimcalc_kaiser()` calculates the number of singular values according to the Kaiser criterion, i.e. from the descending sequence of values, all values with `s[n] > 1` are kept. The number of dimensions is returned accordingly.
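As a short illustration of the Kaiser criterion stated above, the count of values strictly greater than 1 can be computed directly. `sketch_kaiser()` is a hypothetical name, not the packaged function.

```r
# Hypothetical re-statement of the Kaiser rule (not the lsa source code).
sketch_kaiser <- function() {
  function(s) {
    # Count the singular values that are strictly greater than 1.
    sum(s > 1)
  }
}

s <- c(4, 3, 2, 1, 0.5)
sketch_kaiser()(s)  # the value exactly equal to 1 is not counted
```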

The method `dimcalc_fraction()` returns the specified fraction of the number of singular values. By default, 1/50th of the available values is used, and the resulting number of singular values is returned.
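The fraction rule depends only on how many singular values there are, not on their magnitudes. In the sketch below, `sketch_fraction()` is a hypothetical name, and both the rounding and the lower bound of one dimension are assumptions; this page does not say how the packaged function rounds.

```r
# Hypothetical re-statement of the fraction rule (not the lsa source code).
sketch_fraction <- function(frac = 1/50) {
  function(s) {
    # Assumption: round to the nearest integer and keep at least one dimension.
    max(1, round(length(s) * frac))
  }
}

s <- seq(100, 1)          # 100 placeholder values in descending order
sketch_fraction()(s)      # 1/50th of 100 values
sketch_fraction(1/4)(s)   # a quarter of 100 values
```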

The method `dimcalc_raw()` returns the maximum number of singular values (= the length of `s`). It is included only for completeness.

## Value

Returns a function that takes the singular values as a parameter to return the recommended number of dimensions. The expected parameter of this function is

 `s` A sequence of singular values (as produced by the SVD). Only needed when calling the dimensionality calculation routines directly.

## Author(s)

Fridolin Wild

## References

Wild, F., Stahl, C., Stermsek, G., Neumann, G., Penya, Y. (2005). Parameters Driving Effectiveness of Automated Essay Scoring with LSA. In: Proceedings of the 9th CAA, pp. 485-494, Loughborough.

## See Also

`lsa`
## Examples

```
# load the package that provides the dimcalc_* functions
library(lsa)

## create some data
vec1 = c( 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
vec2 = c( 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0 )
vec3 = c( 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0 )
matrix = cbind(vec1,vec2, vec3)
s = svd(matrix)$d

# standard share of 0.5
dimcalc_share()(s)

# specific share of 0.9
dimcalc_share(share=0.9)(s)

# meeting the number of documents (here: 3)
n = ncol(matrix)
dimcalc_ndocs(n)(s)
```