audubon: Japanese Text Processing Tools

A collection of Japanese text processing tools: filling Japanese iteration marks, converting between Japanese character types, segmenting text into phrases, and normalizing text according to the rules used by the 'Sudachi' morphological analyzer and 'NEologd' (the neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).
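To illustrate what "filling iteration marks" means, here is a minimal base-R sketch. It is not audubon's implementation, and the function name `fill_iter_mark_sketch` is hypothetical. It handles only three single-character marks: ゝ (U+309D) and 々 (U+3005), which repeat the preceding character, and ゞ (U+309E), which repeats the preceding kana with a voicing mark. It relies on the fact that in the か, さ, た, and は rows of the hiragana block, the voiced kana is the next code point.

```r
# Hypothetical helper, NOT part of audubon: fills a few single-character
# iteration marks by copying (or voicing) the preceding character.
fill_iter_mark_sketch <- function(x) {
  # Voiceable hiragana, given as code points to avoid source-encoding issues:
  # か..ち (every other code point), つ, て, と, then は, ひ, ふ, へ, ほ
  voicable <- intToUtf8(
    c(seq(0x304B, 0x3061, by = 2), 0x3064, 0x3066, 0x3068,
      seq(0x306F, 0x307B, by = 3)),
    multiple = TRUE
  )
  vapply(x, function(s) {
    chars <- strsplit(s, "")[[1]]
    for (i in seq_along(chars)) {
      if (i == 1L) next  # a leading mark has nothing to copy
      prev <- chars[i - 1L]  # already filled if it was itself a mark
      if (chars[i] %in% c("\u309d", "\u3005")) {
        # ゝ (hiragana) and 々 (kanji) repeat the previous character
        chars[i] <- prev
      } else if (chars[i] == "\u309e" && prev %in% voicable) {
        # ゞ repeats the previous kana in its voiced form (next code point)
        chars[i] <- intToUtf8(utf8ToInt(prev) + 1L)
      }
    }
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Usage: "人々" and "金子みすゞ", written with Unicode escapes
fill_iter_mark_sketch(c("\u4eba\u3005", "\u91d1\u5b50\u307f\u3059\u309e"))
```

Because it works character by character, this sketch ignores multi-character marks such as 〳〵. A real normalizer also has to handle those, as well as voicing marks that follow characters without a voiced form.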

Package details

Author: Akiru Kato [cre, aut], Koki Takahashi [cph] (author of japanese.js), Shuhei Iitsuka [cph] (author of budoux), Taku Kudo [cph] (author of TinySegmenter)
Maintainer: Akiru Kato <paithiov909@gmail.com>
License: Apache License (>= 2)
Version: 0.5.2
URL: https://github.com/paithiov909/audubon, https://paithiov909.github.io/audubon/
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("audubon")


audubon documentation built on May 29, 2024, noon