vectorize_docs: Vectorize the documents

Description

Converts the documents read using read_docs into two vectors: one holding the word instances of the documents (as vocabulary IDs) and the other holding the corresponding document IDs.
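To illustrate the shape of the result, here is a minimal sketch of this kind of conversion. It is not the package's implementation. It assumes that each document is a 2-row integer matrix whose first row holds vocabulary IDs and whose second row holds their counts, which may differ from what read_docs actually returns. The names vectorize_sketch, doc_ids, and word_ids are hypothetical, and so is the 1-based document numbering.

```r
# Hypothetical sketch, not the package's implementation.
# Assumption: each document is a 2-row integer matrix; row 1 holds
# vocabulary IDs and row 2 holds the count of each ID in the document.
vectorize_sketch <- function(docs) {
  # Expand every (vocabulary ID, count) pair into one entry per word instance.
  word_ids <- unlist(lapply(docs, function(doc) rep(doc[1, ], times = doc[2, ])))
  # Number of word instances in each document.
  doc_lengths <- vapply(docs, function(doc) sum(doc[2, ]), numeric(1))
  # For every word instance, record the index of its source document
  # (1-based here; the real function may number documents differently).
  doc_ids <- rep(seq_along(docs), times = doc_lengths)
  list(doc_ids = doc_ids, word_ids = word_ids)
}

# Two made-up toy documents.
toy_docs <- list(
  matrix(c(0L, 2L, 3L, 1L), nrow = 2),  # ID 0 appears twice, ID 3 once
  matrix(c(1L, 1L), nrow = 2)           # ID 1 appears once
)
result <- vectorize_sketch(toy_docs)
result$word_ids  # 0 0 3 1
result$doc_ids   # 1 1 1 2
```

The two vectors have the same length, one entry per word instance, so position i of doc_ids names the document that contains the word at position i of word_ids.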

Usage

vectorize_docs(docs)

Arguments

docs

a list of documents, which is created using read_docs

Value

A list containing two vectors: the document IDs and the word instances (vocabulary IDs)

Note

This method is very time-consuming for large datasets. For Gibbs sampling, use functions such as lda_fgs_blei_corpus instead, which take docs as input and do the job of this function in C++.

See Also

lda_fgs_blei_corpus

Other lda data preprocessing methods: calc_doc_lengths, read_docs

Examples

# Read the documents from file
documents <- read_docs('bop.ldac')
# Convert them into word-instance and document-ID vectors
ds <- vectorize_docs(documents)

clintpgeorge/clda documentation built on May 13, 2019, 8 p.m.