foreign: Read and Write Sparse Matrix Format Files

foreign R Documentation

Read and Write Sparse Matrix Format Files

Description

Read and write CLUTO sparse matrix format files, or the CCS format variant employed by the MC toolkit.

Usage

read_stm_CLUTO(file)
write_stm_CLUTO(x, file)
read_stm_MC(file, scalingtype = NULL)
write_stm_MC(x, file)

Arguments

file

a character string with the name of the file to read or write.

x

a matrix object.

scalingtype

a character string specifying the type of scaling to be used, or NULL (default), in which case the scaling is inferred from the names of the files found that hold the non-zero entries (see Details).

Details

Documentation for CLUTO, including its sparse matrix format, used to be available from ‘https://www-users.cse.umn.edu/~karypis/cluto/’.

read_stm_CLUTO reads CLUTO sparse matrices, returning a simple triplet matrix.

write_stm_CLUTO writes CLUTO sparse matrices. Argument x must be coercible to a simple triplet matrix via as.simple_triplet_matrix.
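A write/read round trip can be sketched as follows. This is a minimal illustration rather than the package's own example: the matrix values and the temporary file name are made up, and it assumes that small whole-number values survive the text format unchanged.

```r
library(slam)

# Small dense matrix with a few zero entries (values chosen for illustration).
x <- matrix(c(1, 0, 2,
              0, 3, 0,
              4, 0, 5,
              0, 6, 0), nrow = 3)

# Hypothetical scratch file; any writable path would do.
f <- tempfile(fileext = ".mat")

write_stm_CLUTO(x, f)    # x is coerced via as.simple_triplet_matrix()
y <- read_stm_CLUTO(f)   # returns a simple triplet matrix

# Compare the dense forms, ignoring attributes such as dimnames.
same <- isTRUE(all.equal(as.matrix(y), x, check.attributes = FALSE))

# Remove the file and anything else written next to it.
unlink(Sys.glob(paste0(f, "*")))
```

Converting back with as.matrix is only sensible for small matrices; for real data one would normally keep working with the simple triplet matrix.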

MC is a toolkit for creating vector models from text documents (see https://www.cs.utexas.edu/~dml/software/mc/). It employs a variant of the Compressed Column Storage (CCS) sparse matrix format, writing data into several files with suitable names: e.g., a file with ‘_dim’ appended to the base file name stores the matrix dimensions. The non-zero entries are stored in a file whose name indicates the scaling type used: e.g., ‘_tfx_nz’ indicates scaling by term frequency (‘t’), inverse document frequency (‘f’) and no normalization (‘x’). See ‘README’ in the MC sources for more information.
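The general idea behind compressed column storage can be sketched in base R. This shows the generic CCS layout only, not necessarily the exact contents of the MC files: the 0-based row indices and the variable names (values, row_index, col_ptr) are assumptions made for illustration.

```r
# Dense example: 3 rows, 4 columns, 5 non-zero entries (filled column-wise).
m <- matrix(c(1, 0, 2,
              0, 0, 3,
              0, 0, 0,
              4, 5, 0), nrow = 3)

# Positions of the non-zero entries, in column-major order.
nz <- which(m != 0, arr.ind = TRUE)

# The three CCS components (0-based indices are an assumption here).
values    <- m[m != 0]                       # non-zero values, column by column
row_index <- unname(nz[, "row"]) - 1L        # row of each value
col_ptr   <- c(0L, cumsum(tabulate(nz[, "col"], ncol(m))))  # column start offsets

# Rebuild the dense matrix: column k owns diff(col_ptr)[k] consecutive values.
cols <- rep.int(seq_len(ncol(m)), diff(col_ptr))
rebuilt <- matrix(0, nrow(m), ncol(m))
rebuilt[cbind(row_index + 1L, cols)] <- values
```

Because col_ptr holds one offset per column plus a final total, an empty column (the third one here) simply repeats the previous offset.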

read_stm_MC reads such sparse matrix information with argument file giving the path with the base file name, and returns a simple triplet matrix.

write_stm_MC writes matrices in MC CCS sparse matrix format. Argument x must be coercible to a simple triplet matrix via as.simple_triplet_matrix.
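A round trip through the MC format might look like the sketch below. The base file name and matrix values are hypothetical, and the example assumes that read_stm_MC can infer the scaling type from the files that write_stm_MC produces.

```r
library(slam)

# Small dense matrix with a few zero entries (values chosen for illustration).
x <- matrix(c(1, 0, 2,
              0, 3, 0,
              4, 0, 5,
              0, 6, 0), nrow = 3)

# Hypothetical base file name; the suffixed files are created next to it.
base <- tempfile("mc_")

write_stm_MC(x, base)    # x is coerced via as.simple_triplet_matrix()

# Inspect which suffixed files were written, e.g. the one ending in '_dim'.
files   <- basename(Sys.glob(paste0(base, "_*")))
has_dim <- file.exists(paste0(base, "_dim"))

y <- read_stm_MC(base)   # scalingtype = NULL: inferred from the file names

# Compare the dense forms, ignoring attributes such as dimnames.
same <- isTRUE(all.equal(as.matrix(y), x, check.attributes = FALSE))

# Clean up every file that shares the base name.
unlink(Sys.glob(paste0(base, "*")))
```

Note that file is a base name rather than a single file, so cleanup has to cover all files sharing that prefix.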


slam documentation built on Oct. 15, 2024, 9:09 a.m.
