textmodel_ca: Correspondence analysis of a document-feature matrix


View source: R/textmodel_ca.R

Description

textmodel_ca implements correspondence analysis scaling on a dfm. The method is a fast, sparse version of the function ca from the ca package.

Usage

textmodel_ca(x, smooth = 0, nd = NA, sparse = FALSE,
  residual_floor = 0.1)

Arguments

x

the dfm on which the model will be fit

smooth

a smoothing parameter for word counts; defaults to zero.

nd

the number of dimensions to include in the output; if NA (the default), the maximum possible number of dimensions is included.

sparse

if TRUE, the computation keeps the matrix sparse; set it to TRUE if x (the dfm) is too large to be allocated in memory once converted to a dense matrix.

residual_floor

specifies the threshold applied to the residual matrix before the truncated SVD is calculated. A larger value reduces memory and time cost but may reduce accuracy; only applicable when sparse = TRUE.
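To give a feel for the trade-off that residual_floor describes, here is a minimal base R sketch. It assumes the floor acts as a simple magnitude cutoff, so that residuals smaller than the floor in absolute value are set to zero. The helper apply_residual_floor and the toy residual matrix are hypothetical, and the package's actual rule or scaling may differ.

```r
# Hypothetical helper (not part of quanteda): zero out residuals whose
# magnitude falls below `floor_value`. This is an assumed rule for illustration.
apply_residual_floor <- function(S, floor_value) {
  S[abs(S) < floor_value] <- 0
  S
}

# Toy residual matrix, made up for illustration.
S <- matrix(c(0.42, -0.03,  0.08,
             -0.31,  0.05,  0.27),
            nrow = 2, byrow = TRUE)

# Number of non-zero entries that survive at increasing floors.
kept <- sapply(c(0.05, 0.1, 0.3),
               function(f) sum(apply_residual_floor(S, f) != 0))
print(kept)
```

Under this assumption, a higher floor leaves fewer non-zero entries to store and decompose, at the cost of discarding more of the residual information.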

Details

The SVD is computed with svds from the RSpectra package, which enables fast computation.
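As an illustration of the SVD step, the following base R sketch runs a textbook correspondence analysis on a small dense count matrix. It is not the package's implementation: it uses base svd rather than RSpectra's svds, works on a dense matrix only, and the function name ca_sketch and the toy data are hypothetical.

```r
# Toy count matrix standing in for a dfm (rows = documents, columns = features).
X <- matrix(c(10, 2, 0, 3,
               1, 8, 4, 0,
               0, 3, 9, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), paste0("feat", 1:4)))

# Hypothetical helper: textbook CA via the SVD of standardized residuals.
ca_sketch <- function(X, nd = NA) {
  P <- X / sum(X)                      # correspondence matrix
  row_mass <- rowSums(P)               # row masses
  col_mass <- colSums(P)               # column masses
  eP <- outer(row_mass, col_mass)      # expected proportions under independence
  S <- (P - eP) / sqrt(eP)             # standardized residuals
  max_nd <- min(dim(X)) - 1            # maximum number of dimensions
  if (is.na(nd) || nd > max_nd) nd <- max_nd
  dec <- svd(S, nu = nd, nv = nd)      # dense SVD; svds would be the sparse analogue
  list(sv = dec$d[seq_len(nd)],
       rowcoord = dec$u / sqrt(row_mass),   # standard row coordinates
       colcoord = dec$v / sqrt(col_mass))   # standard column coordinates
}

fit <- ca_sketch(X)
print(fit$sv)
```

The squared singular values sum to the total inertia, which equals the Pearson chi-squared statistic of the table divided by the grand total of the counts.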

Value

textmodel_ca() returns a fitted CA textmodel that is a special class of ca object.

Note

With a very large dfm, you may need to set sparse = TRUE and increase the value of residual_floor, which ignores less important information and so reduces the memory cost. If your attempt to fit the model fails because the matrix is too large, this is probably due to the memory demands of computing the V x V residual matrix. To avoid this, consider increasing the value of residual_floor in steps of 0.1 until the model can be fit.
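The retry strategy in this note can be sketched as a small loop. The helper fit_with_rising_floor below is hypothetical and not part of quanteda; it takes any fitting function of one argument (the floor), and it assumes that a failed fit raises an ordinary R error that tryCatch can intercept.

```r
# Hypothetical helper: call `fit_fun(floor)` with a rising floor until it succeeds.
fit_with_rising_floor <- function(fit_fun, start = 0.1, step = 0.1, max_floor = 1) {
  floor_value <- start
  while (floor_value <= max_floor + 1e-9) {
    # Treat any error (for example, a failed memory allocation) as "try a larger floor".
    fit <- tryCatch(fit_fun(floor_value), error = function(e) NULL)
    if (!is.null(fit)) {
      return(list(fit = fit, residual_floor = floor_value))
    }
    floor_value <- floor_value + step
  }
  stop("model could not be fit with residual_floor up to ", max_floor)
}

# Stand-in fitting function for demonstration: fails below a floor of 0.25.
mock_fit <- function(f) {
  if (f < 0.25) stop("matrix too large")
  paste("fitted with floor", f)
}
res <- fit_with_rising_floor(mock_fit)
print(res$residual_floor)

# With quanteda loaded and a dfm `x`, the call might look like:
# res <- fit_with_rising_floor(
#   function(f) textmodel_ca(x, sparse = TRUE, residual_floor = f)
# )
```

Capping the loop at max_floor keeps it from running indefinitely when the model cannot be fit at any reasonable threshold.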

Author(s)

Kenneth Benoit and Haiyan Wang

References

Nenadic, O. and Greenacre, M. (2007). Correspondence analysis in R, with two- and three-dimensional graphics: The ca package. Journal of Statistical Software, 20(3). http://www.jstatsoft.org/v20/i03/

See Also

coef.textmodel_lsa, ca

Examples

ieDfm <- dfm(data_corpus_irishbudget2010)
wca <- textmodel_ca(ieDfm)
summary(wca)

Example output

quanteda version 0.99
Using 2 of 1 threads for parallel computing

Attaching package: 'quanteda'

The following object is masked from 'package:utils':

    View

           Length Class  Mode     
sv             7  -none- numeric  
nd             1  -none- numeric  
rownames      14  -none- character
rowmass       14  -none- numeric  
rowdist       14  -none- numeric  
rowinertia    14  -none- numeric  
rowcoord      98  -none- numeric  
rowsup         0  -none- logical  
colnames    5140  -none- character
colmass     5140  -none- numeric  
coldist     5140  -none- numeric  
colinertia  5140  -none- numeric  
colcoord   35980  -none- numeric  
colsup         0  -none- logical  
call           2  -none- call     

quanteda documentation built on Nov. 2, 2018, 1:05 a.m.