tokenizers: A Consistent Interface to Tokenize Natural Language Text
Version 0.1.4

Convert natural language text into tokens. The tokenizers have a consistent interface and are compatible with Unicode, thanks to being built on the 'stringi' package. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, lines, and regular expressions.
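As a sketch of the interface described above, the package exports a family of `tokenize_*` functions, each taking a character vector and returning a list with one set of tokens per input document. The calls below assume the defaults of version 0.1.4 (lowercasing and punctuation stripping are on by default for word-level tokenizers):

```r
library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog. It barked."

# Word tokens: a list with one character vector per input document
tokenize_words(text)

# Shingled n-grams (here bigrams, via the n argument)
tokenize_ngrams(text, n = 2)

# Sentence tokens
tokenize_sentences(text)
```

Because every tokenizer shares the same input and output conventions, they can be swapped into a text-processing pipeline interchangeably.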

Package details

Author: Lincoln Mullen [aut, cre], Dmitriy Selivanov [ctb]
Date of publication: 2016-08-29 22:59:29
Maintainer: Lincoln Mullen <[email protected]>
License: MIT + file LICENSE
Package repository: CRAN
Installation

Install the latest version of this package by entering the following in R:
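Since the package is distributed on CRAN, the standard installation command applies:

```r
# Install the released version from CRAN
install.packages("tokenizers")
```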


tokenizers documentation built on May 30, 2017, 6:28 a.m.