piecemaker: Tools for Preparing Text for Tokenizers

Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.

Package details

Author: Jon Harmon [aut, cre] (<https://orcid.org/0000-0003-4781-4346>), Jonathan Bratt [aut] (<https://orcid.org/0000-0003-2859-0076>), Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer: Jon Harmon <jonthegeek@gmail.com>
License: Apache License (>= 2)
Version: 1.0.2
URL: https://github.com/macmillancontentscience/piecemaker https://macmillancontentscience.github.io/piecemaker/
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("piecemaker")

piecemaker documentation built on June 7, 2023, 5:55 p.m.