wordcounts_texts: Convert long-format word-counts into documents
In agoldst/dfrtopics: Tools for exploring topic models of text

wordcounts_texts

R Documentation

Convert long-format word-counts into documents

Description

This naively "inflates" word counts into a bag of words, for sending to MALLET.

Usage

wordcounts_texts(counts, shuffle = FALSE, sep = " ")

Arguments

`counts`	long-format data frame like that returned by `read_wordcounts`
`shuffle`	if `TRUE`, randomize word order within document before pasting it together. `FALSE` by default.
`sep`	word separator in inflated bags. A space, by default.

Details

You can pass the result of `read_wordcounts` directly to this function, but normally you'll want to filter or otherwise manipulate the words first.
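As a rough illustration of the kind of filtering meant here, the following base R sketch drops stopwords and low-count words from a long-format frame. The toy data, the column names `id`, `word`, and `weight`, the stoplist, and the count threshold are all assumptions made for this example; check the actual output of `read_wordcounts` before relying on them.

```r
# Hypothetical long-format word counts: one row per (document, word) pair.
# The column names id/word/weight are assumed for illustration.
counts <- data.frame(
    id = c("doc1", "doc1", "doc1", "doc2", "doc2"),
    word = c("the", "whale", "sea", "the", "ship"),
    weight = c(10, 2, 1, 7, 3),
    stringsAsFactors = FALSE
)

# An illustrative stoplist, not one supplied by the package.
stoplist <- c("the", "of", "and")

# Keep rows whose word is not a stopword and whose count is at least 2.
keep <- !(counts$word %in% stoplist) & counts$weight >= 2
filtered <- counts[keep, ]
```

A frame filtered along these lines could then be handed to `wordcounts_texts` in place of the raw counts.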

It is not straightforward to supply feature vectors directly to MALLET; MALLET really wants to featurize each text itself. So our task is to take the word counts supplied by DfR and reassemble the texts. If DfR tells us that word w occurs N times, we simply paste N copies of w together, separated by `sep` (a space by default). LDA should not care about word order, but if you are nervous about the effect of this decidedly non-natural word ordering on the modeling process, you can set `shuffle = TRUE` to randomize the order within each document (it still won't be natural). Thanks to David Mimno for suggesting this via his own mallet code.

This wastes a lot of memory, but it is the simple way to get DfR files into MALLET.
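The inflation step described above can be sketched in base R as follows. This is a minimal illustration of the idea, not the package's actual implementation: `inflate_counts` is a hypothetical helper, and the column names `id`, `word`, and `weight` are assumed for the example.

```r
# Hypothetical long-format word counts; column names are assumed.
counts <- data.frame(
    id = c("doc1", "doc1", "doc2"),
    word = c("whale", "sea", "ship"),
    weight = c(2, 1, 3),
    stringsAsFactors = FALSE
)

# Illustrative helper: repeat each word `weight` times and paste the
# copies into one string per document.
inflate_counts <- function(counts, shuffle = FALSE, sep = " ") {
    ids <- unique(counts$id)
    texts <- vapply(ids, function(d) {
        rows <- counts[counts$id == d, ]
        # N copies of each word, in the order the rows appear
        words <- rep(as.character(rows$word), times = rows$weight)
        if (shuffle) {
            # random permutation of the word order within this document
            words <- sample(words)
        }
        paste(words, collapse = sep)
    }, character(1), USE.NAMES = FALSE)
    data.frame(id = ids, text = texts, stringsAsFactors = FALSE)
}

texts <- inflate_counts(counts)
shuffled <- inflate_counts(counts, shuffle = TRUE)
```

Here `texts$text` holds one string per document, with every word repeated as many times as its count. With `shuffle = TRUE` each document contains the same words in a random order, so the bag of words is unchanged. The memory cost is visible even at this scale: three rows of counts become six word tokens.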

Value

A data frame with two columns: `id`, the document id, and `text`, the full document text as a single line (with the words in meaningless order).
