mmcorpus_serialize: Serialise Matrix Market Corpus


Description

Serialise a term-document matrix to disk.

Usage


Arguments

corpus

A corpus as returned by doc2bow.

file

Path to a .mm file (recommended). If NULL, the corpus is saved to a temporary file.

auto_delete

Whether to automatically delete the temporary file after its first use.
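The behaviour described for file and auto_delete can be pictured with the minimal base-R sketch below. It assumes a simple "read once, then remove" policy. resolve_mm_file and read_once are hypothetical helpers written for illustration only. They are not part of gensimr, and the package's actual handling of the temporary file may differ.

```r
# Hypothetical helper (not part of gensimr): a NULL `file` falls back to a
# temporary .mm path, otherwise the user-supplied path is used as-is.
resolve_mm_file <- function(file = NULL) {
  if (is.null(file)) tempfile(fileext = ".mm") else file
}

# Hypothetical helper (not part of gensimr): reads the file once and, when
# auto_delete is TRUE, removes it immediately afterwards.
read_once <- function(path, auto_delete = TRUE) {
  lines <- readLines(path)
  if (auto_delete) unlink(path)
  lines
}
```

After read_once(path) has run with auto_delete = TRUE, file.exists(path) returns FALSE, so a second read would fail. This is why an explicit file path is recommended when the serialised corpus needs to be reused.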

Details

The corpus is serialised to disk in Matrix Market format so that the Python side (gensim) can take advantage of efficient file scanning when reading it.
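The base-R sketch below shows roughly what a Matrix Market corpus file contains. It writes a tiny bag-of-words corpus in the coordinate format: a header line, then a "documents terms non-zeros" size line, then one "document term value" triple per line, with all ids 1-based. It is an illustration under stated assumptions, not the code gensimr uses. The function write_mm_sketch and the toy corpus bow are hypothetical, and gensim's own writer may differ in formatting details.

```r
# Hypothetical toy corpus: each document is a list of (term_id, count) pairs,
# using 0-based term ids as in gensim's bag-of-words representation.
bow <- list(
  list(c(0, 1), c(1, 2)),  # document 1
  list(c(1, 1), c(2, 3))   # document 2
)

# Hypothetical writer (not gensimr's implementation).
# Assumes the corpus has at least one non-zero entry.
write_mm_sketch <- function(bow, file) {
  # One row per non-zero entry: (doc id, term id, value), with 1-based ids.
  entries <- do.call(rbind, lapply(seq_along(bow), function(d) {
    do.call(rbind, lapply(bow[[d]], function(p) c(d, p[1] + 1, p[2])))
  }))
  lines <- c(
    "%%MatrixMarket matrix coordinate real general",
    # size line: number of documents, number of terms, number of non-zeros
    sprintf("%d %d %d", length(bow), as.integer(max(entries[, 2])), nrow(entries)),
    # one "doc term value" triple per line
    sprintf("%d %d %g", as.integer(entries[, 1]), as.integer(entries[, 2]), entries[, 3])
  )
  writeLines(lines, file)
  invisible(file)
}
```

For the toy corpus above, the size line is "2 3 4": two documents, three distinct terms and four non-zero entries. Because only non-zero entries are stored, the file stays small for sparse term-document matrices and can be read sequentially, one entry per line.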

Value

An object of class mm_file, which holds the path to the file along with associated metadata.

Functions

Examples

library(gensimr)

# example corpus
data(corpus, package = "gensimr")

docs <- prepare_documents(corpus)
dict <- corpora_dictionary(docs)
corpora <- doc2bow(dict, docs)

# serialise, then delete
## Not run: 
corpus_mm <- serialize_mmcorpus(corpora)
delete_mmcorpus(corpus_mm)
## End(Not run)

news-r/gensimr documentation built on Jan. 9, 2021, 5:55 a.m.