
ldccr


Overview

ldccr provides utilities for working with various Japanese corpora.

The goal of the ldccr package is to make Japanese language resources easy to use.

This package provides:

  1. Parsers for several Japanese corpora that are freely or openly licensed (non-proprietary).
  2. A downloader for zipped text files published on Aozora Bunko.

Installation

install.packages("ldccr", repos = c("https://paithiov909.r-universe.dev", "https://cloud.r-project.org"))

Supported Corpora

Monolingual

| ... | Name | License | Link |
| --- | ---- | ------- | ---- |
| :heavy_check_mark: | Live Door News Corpus | CC BY-ND 2.1 JP | # |
| :heavy_check_mark: | Japanese Realistic Textual Entailment Corpus | CC BY-NC-SA 4.0 | # |
| :heavy_check_mark: | ja.text8 corpus | CC BY-SA | # |

Multilingual

Currently not supported.

Download a text file from Aozora Bunko

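# Create a local cache directory for downloaded files
if (!dir.exists("cache")) dir.create("cache")

# Pick one random row from the bundled snapshot, take its text file URL
# (the "テキストファイルURL" column), fetch that file into "cache",
# and read the result line by line
text <- ldccr::AozoraBunkoSnapshot |>
  dplyr::sample_n(1L) |>
  dplyr::pull("テキストファイルURL") |>
  ldccr::read_aozora(directory = "cache") |>
  readr::read_lines()

# Inspect the lines that were read
dplyr::glimpse(text)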

License

MIT license.



paithiov909/ldccr documentation built on Oct. 14, 2024, 3:44 a.m.