corpusCaDlg: Correspondence analysis from a tm corpus

Description Details See Also

Description

Compute a simple correspondence analysis on the document-term matrix of a tm corpus.

Details

This dialog wraps the runCorpusCa function. The function runCorpusCa runs a correspondence analysis (CA) on the document-term matrix.

If no variable is selected in the list (the default), a CA is run on the full document-term matrix (possibly skipping sparse terms, see below). If one or more variables are chosen, the CA will be based on a stacked table whose rows correspond to the levels of the variable: each cell contains the sum of occurrences of a given term in all the documents of the level. Documents that contain a NA are skipped for this variable, but taken into account for the others, if any.

In all cases, variables that have not been selected are added as supplementary rows. If at least one variable is selected, documents are also supplementary rows, while they are active otherwise.

The first slider ('sparsity') allows skipping less significant terms to use less memory, especially with large corpora. The second slider ('dimensions to retain') allows choosing the number of dimensions that will be printed, but has no effect on the computation of the correspondance analysis.

See Also

runCorpusCa, ca, meta, removeSparseTerms, DocumentTermMatrix


RcmdrPlugin.temis documentation built on May 2, 2019, 5:21 p.m.