collapse.bigrams: Replace specified bigrams with terms representing the bigrams
In kshirley/LDAtools: Tools to fit a topic model using Latent Dirichlet Allocation (LDA)


After tokenization, use this function to replace every occurrence of a given bigram with a single token representing that bigram, and to 'delete' the two individual tokens that made up the bigram, so that the model remains a generative model for text.

collapse.bigrams(bigrams = character(), doc.id = integer(), term.id = integer(), vocab = character())

`bigrams`	a character vector, each element of which is a bigram represented by two terms separated by a hyphen, such as 'term1-term2'. Every occurrence of 'term1' immediately followed by 'term2' in the data will be replaced by a single token representing this bigram.
`doc.id`	an integer vector containing the document ID number of every token in the corpus. Should take values between 1 and D, where D is the total number of documents in the corpus.
`term.id`	an integer vector containing the term ID number of every token in the corpus. Should take values between 1 and W, where W is the number of terms in the vocabulary.
`vocab`	a character vector of length W, containing the terms in the vocabulary. This vector must align with `term.id`, such that a `term.id` of 1 indicates the first element of `vocab`, a `term.id` of 2 indicates the second element of `vocab`, etc.
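
The alignment between `doc.id`, `term.id` and `vocab` described above can be illustrated with a small base R sketch. The toy documents and the variable names `docs` and `tokens` are assumptions made up for illustration; they are not part of the package.

```r
# Toy corpus: two already-tokenized documents (made-up data).
docs <- list(
  c("new", "york", "is", "big"),
  c("i", "like", "new", "york")
)

# One entry per token: the document it came from (values 1..D).
doc.id <- rep(seq_along(docs), lengths(docs))

# Vocabulary of unique terms (length W) and each token's index into it.
tokens <- unlist(docs)
vocab <- sort(unique(tokens))
term.id <- match(tokens, vocab)

# Bigrams are written as 'term1-term2'.
bigrams <- c("new-york")

# Checks for the alignment the arguments require:
# one doc.id per token, and vocab[term.id] recovers the tokens.
stopifnot(length(doc.id) == length(term.id))
stopifnot(identical(vocab[term.id], tokens))
```

With inputs shaped like this, the four vectors could be passed to `collapse.bigrams()` in the order shown in the usage line.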

Returns a list of length three. The first element, new.vocab, is a character vector containing the new vocabulary. The second element, new.term.id, is the new vector of term ID numbers for all tokens in the data, taking integer values from 1 to the length of the new vocabulary. The third element, new.doc.id, is the new version of the document ID vector. If any of the specified bigrams were present in the data, then new.term.id and new.doc.id will be shorter than the original term.id and doc.id vectors.
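
To make the shape of the return value concrete, here is a minimal base R sketch of the collapsing step. It is not the package's implementation: the helper name `collapse_bigrams_sketch` is hypothetical, and it assumes that a bigram is only matched within a single document, that terms contain no hyphens, that the merged token is named by the bigram string itself, and that the new vocabulary is rebuilt from the remaining tokens in order of first appearance. The real function may differ on any of these points.

```r
# Hypothetical helper, NOT the LDAtools implementation.
collapse_bigrams_sketch <- function(bigrams, doc.id, term.id, vocab) {
  tokens <- vocab[term.id]
  for (bg in bigrams) {
    # Assumption: terms contain no hyphen, so the split yields two parts.
    parts <- strsplit(bg, "-", fixed = TRUE)[[1]]
    stopifnot(length(parts) == 2L)
    n <- length(tokens)
    keep <- rep(TRUE, n)
    i <- 1L
    while (i < n) {
      # Assumption: both tokens must belong to the same document.
      if (tokens[i] == parts[1] && tokens[i + 1L] == parts[2] &&
          doc.id[i] == doc.id[i + 1L]) {
        tokens[i] <- bg          # first token becomes the bigram token
        keep[i + 1L] <- FALSE    # second token is 'deleted'
        i <- i + 2L
      } else {
        i <- i + 1L
      }
    }
    tokens <- tokens[keep]
    doc.id <- doc.id[keep]
  }
  # Assumption: vocabulary is rebuilt from the tokens that remain.
  new.vocab <- unique(tokens)
  list(new.vocab = new.vocab,
       new.term.id = match(tokens, new.vocab),
       new.doc.id = doc.id)
}

# Toy data: "new york is big" (doc 1) and "i like new york" (doc 2).
vocab <- c("big", "i", "is", "like", "new", "york")
term.id <- c(5L, 6L, 3L, 1L, 2L, 4L, 5L, 6L)
doc.id <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)

res <- collapse_bigrams_sketch("new-york", doc.id, term.id, vocab)
length(term.id)          # 8 tokens before collapsing
length(res$new.term.id)  # 6 tokens after collapsing
```

In this toy case each of the two 'new york' occurrences shrinks from two tokens to one, so both returned ID vectors are two elements shorter than the inputs, matching the behaviour described above.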

kshirley/LDAtools documentation built on May 20, 2019, 7:03 p.m.
