In paithiov909/ldccr: Utilities for Various Japanese Corpora

ldccr

ldccr is a collection of utilities for various Japanese corpora.

The goal of the ldccr package is to make Japanese language resources easy to use.

This package provides:

- Parsers for several Japanese corpora that are free or openly licensed (non-proprietary).
- A downloader for zipped text files published on Aozora Bunko.

install.packages("ldccr", repos = c("https://paithiov909.r-universe.dev", "https://cloud.r-project.org"))

| … | Name | License | Link |
| --- | --- | --- | --- |
| :heavy_check_mark: | Live Door News Corpus | CC BY-ND 2.1 JP | # |
| :heavy_check_mark: | Japanese Realistic Textual Entailment Corpus | CC BY-NC-SA 4.0 | # |
| :heavy_check_mark: | ja.text8 corpus | CC BY-SA | # |

Currently not supported.

if (!dir.exists("cache")) dir.create("cache")

text <- ldccr::AozoraBunkoSnapshot |>
  dplyr::sample_n(1L) |>
  dplyr::pull("テキストファイルURL") |>
  ldccr::read_aozora(directory = "cache") |>
  readr::read_lines()

dplyr::glimpse(text)
#>  chr [1:16] "雪子さんの泥棒よけ" "夢野久作" ...

MIT license.

paithiov909/ldccr documentation built on Oct. 14, 2024, 3:44 a.m.
