NUSS: Mixed N-Grams and Unigram Sequence Segmentation

Segments short text sequences, such as hashtags, into their constituent words using a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine possible segmentations given the text corpus.
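
The unigram idea described above can be illustrated with a minimal base-R sketch. This is not the package's implementation and does not use the NUSS API; the function name `segment_unigram`, the `max_word_len` parameter, and the toy counts in `unigram_counts` are all hypothetical and exist only for this illustration. It assumes a plain unigram model (word probability proportional to its count) and picks the split with the highest total log-probability by dynamic programming.

```r
# Hypothetical toy unigram counts; a real dictionary would be built from a corpus.
unigram_counts <- c(this = 50, is = 80, a = 100, hash = 5, tag = 10, hashtag = 20)

# Segment `text` into the most probable word sequence under a unigram model,
# using dynamic programming over prefix lengths.
segment_unigram <- function(text, counts, max_word_len = 20L) {
  text <- tolower(text)
  n <- nchar(text)
  if (n == 0L) return(character(0))
  log_prob <- log(counts / sum(counts))
  best <- c(0, rep(-Inf, n))  # best[i + 1]: best log-probability of the first i characters
  back <- integer(n)          # back[i]: start of the last word in the best split ending at i
  for (end in seq_len(n)) {
    for (start in max(1L, end - max_word_len + 1L):end) {
      word <- substr(text, start, end)
      if (!word %in% names(log_prob)) next
      score <- best[start] + log_prob[[word]]
      if (score > best[end + 1L]) {
        best[end + 1L] <- score
        back[end] <- start
      }
    }
  }
  if (!is.finite(best[n + 1L])) return(NA_character_)  # no full segmentation exists
  # Walk the back-pointers from the end of the text to recover the words.
  words <- character(0)
  end <- n
  while (end > 0L) {
    start <- back[end]
    words <- c(substr(text, start, end), words)
    end <- start - 1L
  }
  words
}

segment_unigram("thisisahashtag", unigram_counts)
```

With these toy counts, "hashtag" is preferred over "hash" + "tag" because a single word with a moderate probability beats the product of two smaller ones. Input that cannot be covered by dictionary words yields `NA` in this sketch.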

Package details

Author: Oskar Kosch [aut, cre] (<https://orcid.org/0000-0003-2697-1393>)
Maintainer: Oskar Kosch <contact@oskarkosch.com>
License: GPL (>= 3)
Version: 0.1.0
URL: https://github.com/theogrost/NUSS
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("NUSS")

NUSS documentation built on Sept. 11, 2024, 5:30 p.m.