kanjidata | R Documentation |
The tibbles kbase and kmorph provide basic and morphological information, respectively, for all kanji contained in the KANJIDIC2 file (see below).
kbase
kmorph
kbase is a tibble with 13,108 rows and 13 variables:
the kanji
the Unicode codepoint
the number of strokes
one of four classes: "kyouiku", "jouyou", "jinmeiyou" or "hyougai"
a number from 1-11, basically a finer version of class; same as in KANJIDIC2, except that we assigned an 11 to all hyougaiji (rather than an NA value)
at what level the kanji appears in the Nihon Kanji Nouryoku Kentei (Kanken)
at what level the kanji appears in the Japanese Language Proficiency Test (Nihongo Nouryoku Shiken)
at what level the kanji is learned on the kanji learning website Wanikani
the frequency rank (1 = most frequent) "based on several averages (Wikipedia, novels, newspapers, ...)"
the frequency rank (1 = most frequent) based on newspaper data (2501 most frequent kanji over four years in the Mainichi Shimbun)
a single ON reading in katakana
a single kun reading in hiragana
a single English meaning of the kanji
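As a rough sketch of how the frequency rank might be used, the helper below returns the n most frequent kanji from a table shaped like kbase. The function name most_frequent is hypothetical and not part of the package; the column name frank is the one mentioned in the source notes. The guarded last line only runs if kbase is available in the session.

```r
# Hypothetical helper (not part of the package): return the n rows with the
# best (smallest) frequency rank. Assumes a column named `frank`.
most_frequent <- function(kb, n = 10) {
  ranked <- kb[!is.na(kb$frank), ]          # drop kanji without a rank
  head(ranked[order(ranked$frank), ], n)    # rank 1 = most frequent
}

# Only runs when the data set has been made available by its package.
if (exists("kbase")) print(most_frequent(kbase, 5))
```

Dropping the NA ranks first keeps unranked kanji out of the result even when n exceeds the number of ranked kanji.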
kmorph is a tibble with 13,108 rows and 15 variables:
the kanji
the number of strokes
the traditional (Kangxi) radical used for indexing kanji (one of 214)
the variant of the radical if it is different, otherwise NA
the Nelson radical if it differs from the traditional one, otherwise NA
ideographic description character (plus sometimes a number or a letter) describing the shape of the kanji
visible components of the kanji; originally from KRADFILE
the kanji's SKIP code
a single English meaning of the kanji (same as in kbase)
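A SKIP code is conventionally written as three hyphen-separated numbers, where the first (1-4) names the overall pattern of the kanji. The sketch below splits one such code into its parts. It assumes the code is stored as a string such as "1-4-4"; how kmorph actually stores it is not stated here, and the function name parse_skip is hypothetical.

```r
# Hypothetical helper: split a single SKIP code string into its three parts.
# Assumption: the code is a string of the form "p-a-b" with p in 1..4.
parse_skip <- function(code) {
  parts <- as.integer(strsplit(code, "-", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts), parts[1] %in% 1:4)
  patterns <- c("left-right", "up-down", "enclosure", "solid")
  list(pattern = patterns[parts[1]],  # overall shape of the kanji
       n1 = parts[2],                 # second number of the code
       n2 = parts[3])                 # third number of the code
}

parse_skip("1-4-4")
```

The meaning of the second and third numbers depends on the pattern, so they are returned under neutral names rather than interpreted.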
The single ON and kun readings and the single meaning are for easy identification of the more difficult kanji. Each is the first entry in the KANJIDIC2 file, which may not always be the most important one. For full readings/meanings, use the function lookup or consult a dictionary.
Most of the data is directly from the KANJIDIC2 file.
https://www.edrdg.org/wiki/index.php/KANJIDIC_Project
Variables jlpt, frank, idc and components were taken from the Kanjium database.
https://github.com/mifunetoshiro/kanjium
Variable components is originally from RADKFILE/KRADFILE.
https://www.edrdg.org/
The use of this data is covered in each case by a Creative Commons BY-SA 4.0 License. See the package's LICENSE file for details and copyright holders.
Variable "class" is derived from "grade".
Variable "kanken" was compiled based on the Wikipedia description of the test levels (as of September 2022).
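The derivation of class from grade can be pictured roughly as follows. This is a sketch, not the package's own code: it assumes the usual KANJIDIC2 grade convention (1-6 for kyouiku kanji, 8 for the remaining jouyou kanji, 9 and 10 for jinmeiyou kanji) together with the value 11 for hyougaiji described above. The function name grade_to_class is hypothetical.

```r
# Hypothetical mapping from grade to class.
# Assumption: KANJIDIC2 grade convention, plus 11 = hyougai as described above.
grade_to_class <- function(grade) {
  out <- rep(NA_character_, length(grade))
  out[grade %in% 1:6]  <- "kyouiku"    # taught in elementary school
  out[grade %in% 8]    <- "jouyou"     # remaining jouyou kanji
  out[grade %in% 9:10] <- "jinmeiyou"  # kanji for use in names
  out[grade %in% 11]   <- "hyougai"    # everything else
  out
}

grade_to_class(c(1, 6, 8, 9, 10, 11))
```

Using %in% rather than == means unexpected or missing grades simply stay NA instead of causing an error.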