macmillancontentscience/piecemaker: Tools for Preparing Text for Tokenizers

Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.
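As a rough illustration of what shared preparation steps can look like, here is a minimal base-R sketch. The function `prepare_sketch()` is hypothetical and is not part of piecemaker's API. It assumes a single input string and only squishes whitespace, puts spaces around punctuation, and splits on whitespace. Consult the package reference for its actual functions.

```r
# Hypothetical helper for illustration only; NOT part of piecemaker's API.
# Assumes `text` is a single character string.
prepare_sketch <- function(text) {
  # Collapse runs of whitespace into single spaces and trim both ends.
  text <- trimws(gsub("[[:space:]]+", " ", text))
  # Surround each punctuation mark with spaces so it becomes its own piece.
  text <- gsub("([[:punct:]])", " \\1 ", text)
  # Split on whitespace and drop any empty strings left at the edges.
  pieces <- strsplit(text, "[[:space:]]+")[[1]]
  pieces[nzchar(pieces)]
}

prepare_sketch("  Hello,   world!  ")
```

Here the call returns the four pieces "Hello", ",", "world" and "!". Real tokenizers usually add further steps, such as handling control characters or diacritics, before splitting.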


Package details

Maintainer:
License: Apache License (>= 2)
Version: 1.0.2.9000
URL: https://github.com/macmillancontentscience/piecemaker, https://macmillancontentscience.github.io/piecemaker/
Installation

Install the latest version of this package by entering the following in R:
install.packages("remotes")
remotes::install_github("macmillancontentscience/piecemaker")
macmillancontentscience/piecemaker documentation built on July 1, 2023, 8:12 p.m.