DTM.add_ngrams: Find non-consecutive n-grams

View source: R/NLP.R

DTM.add_ngrams    R Documentation

Find non-consecutive n-grams

Description

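Builds a term-term network from a cosine similarity measure computed on the co-presence of terms in documents. Term pairs whose similarity meets the threshold given in min.sim are connected by an edge. The maximal cliques of the network are the discovered n-grams; since a clique only requires its terms to co-occur in the same documents, the terms of an n-gram do not need to be consecutive in the text.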
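
The procedure described above can be sketched in base R on a toy matrix. This is only an illustration of the general technique (cosine similarity, thresholding, maximal cliques), not the package's actual implementation: the toy data, the helper bron_kerbosch(), the ">=" comparison against the threshold, the "_" separator in column names, and the rule that an n-gram column is 1 when all of its terms occur in a document are all assumptions made for this example.

```r
# Toy binary Document Term Matrix: rows are documents, columns are terms.
dtm <- matrix(
  c(1, 1, 1, 0,
    1, 1, 1, 0,
    0, 0, 1, 1,
    0, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4),
                  c("machine", "learning", "model", "trial"))
)

# Cosine similarity between term columns, based on co-presence in documents.
cooc  <- crossprod(dtm)            # term x term co-occurrence counts
norms <- sqrt(diag(cooc))          # Euclidean norm of each term column
sim   <- cooc / outer(norms, norms)

# Term pairs whose similarity reaches the threshold become edges.
min_sim <- 0.5                     # plays the role of min.sim
adj <- sim >= min_sim              # assumption: inclusive threshold
diag(adj) <- FALSE                 # no self-loops

# Hypothetical helper: basic Bron-Kerbosch enumeration of maximal cliques
# (no pivoting). r = current clique, p = candidates, x = excluded vertices.
bron_kerbosch <- function(r, p, x, adj) {
  if (length(p) == 0 && length(x) == 0) return(list(r))
  out <- list()
  for (v in p) {
    nb  <- unname(which(adj[v, ]))
    out <- c(out, bron_kerbosch(c(r, v), intersect(p, nb),
                                intersect(x, nb), adj))
    p <- setdiff(p, v)
    x <- c(x, v)
  }
  out
}

max_terms <- 10                    # plays the role of max.terms
cliques <- bron_kerbosch(integer(0), seq_len(ncol(dtm)), integer(0), adj)
sizes   <- lengths(cliques)
ngrams  <- cliques[sizes >= 2 & sizes <= max_terms]

# Assumption: an n-gram column is 1 when all of its terms occur in the document.
ngram_cols <- vapply(
  ngrams,
  function(ix) as.numeric(rowSums(dtm[, ix, drop = FALSE] > 0) == length(ix)),
  numeric(nrow(dtm))
)
colnames(ngram_cols) <- vapply(
  ngrams,
  function(ix) paste(colnames(dtm)[ix], collapse = "_"),
  character(1)
)

# Original matrix plus one extra column per discovered n-gram.
dtm_ext <- cbind(dtm, ngram_cols)
print(dtm_ext)
```

In this toy data, "machine", "learning" and "model" are pairwise similar enough to form a single clique, while "trial" stays isolated and is dropped because a one-term clique is not an n-gram. Lowering min_sim links more terms and tends to produce larger cliques; max_terms caps how large a reported n-gram can be.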

Usage

DTM.add_ngrams(DTM, min.sim = 0.5, max.terms = 10)

Arguments

DTM

A Document Term Matrix.

min.sim

The minimal cosine similarity that identifies an edge.

max.terms

The maximum size (i.e., the number of terms) of an n-gram.

Value

The input Document Term Matrix, with extra columns added for the discovered n-grams.


bakaburg1/BaySREn documentation built on March 30, 2022, 12:16 a.m.