pos: part-of-speech tagger
In RcppMeCab: 'rcpp' Wrapper for 'mecab' Library

pos	R Documentation

part-of-speech tagger

Description

pos returns the part-of-speech (POS) tagged morphemes of each sentence.

Usage

pos(
  sentence,
  join = TRUE,
  format = c("list", "data.frame"),
  lang = NULL,
  sys_dic = "",
  user_dic = ""
)

Arguments

`sentence`	A character vector of any length. For analyzing multiple sentences, put them in one character vector.
`join`	A logical value that controls the output format. The default value is TRUE. If FALSE, the function returns morphemes only and puts the tags in an attribute. If `format = "data.frame"`, this argument is ignored.
`format`	The data type of the result. The default value is "list". Set this to "data.frame" to get the result as a data frame.
`lang`	Optional language code (`"ja"`, `"ko"`, or `"zh"`) to select a dictionary installed via `download_dic`. When specified, this overrides `sys_dic`.
`sys_dic`	The location of the system MeCab dictionary. The default value is "".
`user_dic`	The location of a user-specific MeCab dictionary. The default value is "".

Details

This is the basic function for the MeCab part-of-speech tagger. It accepts a character vector of any length and runs the loop over its elements in C++ for faster processing.

You can supply a user dictionary through user_dic. It must be compiled with mecab-dict-index. Instructions for compiling a user dictionary are available at https://github.com/junhewk/RcppMeCab.

You can also set a system dictionary through sys_dic, which is especially useful if you use multiple dictionaries (for example, both the IPA and Juman dictionaries for Japanese). Using options(mecabSysDic=), you can set your preferred system dictionary for the R session.
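
As a minimal sketch of the session-level setting, the following uses only base R's options() and getOption(). The dictionary path is a hypothetical placeholder, and the commented-out pos() call assumes that MeCab and that dictionary are actually installed.

```r
# Hypothetical dictionary location; replace with a real compiled MeCab dictionary.
dic_path <- "/path/to/custom/dic"

# Store the preferred system dictionary for the current R session,
# keeping the previous value so it can be restored later.
old <- options(mecabSysDic = dic_path)

# Read the option back to confirm what the session now holds.
current <- getOption("mecabSysDic")
print(current)

# With the option set, sys_dic can be left at its default (requires MeCab):
# pos(sentence)

# Restore the previous setting when done.
options(old)
```

Saving the return value of options() and restoring it afterwards keeps the change from leaking into unrelated code in the same session.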

If you want morphemes only, use join = FALSE to put the tag names in an attribute. By default, the function returns a list of character vectors with (morpheme)/(tag) elements.
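
To illustrate the (morpheme)/(tag) form, here is a base R sketch that splits joined elements back into morphemes and tags. The input strings are mock placeholders, not real MeCab output. Keeping the tags as names is an assumption made for illustration, not a statement about how pos stores them, and the split assumes that tags themselves contain no "/".

```r
# Mock of one sentence's joined output; placeholder strings, not real MeCab results.
joined <- c("morphA/TAG1", "morphB/TAG2", "a/b/TAG3")

# Split at the last "/" so that a morpheme containing "/" stays intact.
morphemes <- sub("/[^/]*$", "", joined)  # drop the final "/tag" part
tags <- sub("^.*/", "", joined)          # keep only what follows the last "/"

# Keep the tags alongside the morphemes as names (illustrative choice).
split_form <- stats::setNames(morphemes, tags)
print(split_form)
```

When the tags are needed separately, calling pos with join = FALSE avoids this post-processing altogether.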

Value

A character vector or a list of POS-tagged morphemes is returned, in joined character vector form by default.

Examples

## Not run: 
sentence <- c("some UTF-8 text")  # replace with your own UTF-8 texts
pos(sentence)
pos(sentence, join = FALSE)
pos(sentence, format = "data.frame")
pos(sentence, lang = "ja")
pos(sentence, lang = "ko")
pos(sentence, sys_dic = "/path/to/custom/dic")
pos(sentence, user_dic = "/path/to/user.dic")

## End(Not run)

RcppMeCab documentation built on March 24, 2026, 9:08 a.m.