lex_density: Calculate lexical density
In paithiov909/audubon: Japanese Text Processing Tools

lex_density

R Documentation

Calculate lexical density

Description

The lexical density is the proportion of content words (lexical items) in documents. This function is a simple helper for calculating the lexical density of given datasets.

Usage

lex_density(vec, contents_words, targets = NULL, negate = c(FALSE, FALSE))

Arguments

`vec`	A character vector.
`contents_words`	A character vector containing values to be counted as content words.
`targets`	An optional character vector by which the denominator of the lexical density is filtered before values are computed.
`negate`	A logical vector of length 2. An element set to `TRUE` negates the corresponding predicate function: the first element for counting content words, the second for matching targets.

Value

A numeric vector.
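
To make the roles of `contents_words`, `targets`, and `negate` concrete, the following is a minimal base R sketch of a computation along these lines. It is not the package source: `lex_density_sketch` and the English tag names are hypothetical, and the handling of `targets = NULL` (dividing by the length of `vec`) is an assumption made for illustration.

```r
# Hypothetical illustration only; not the audubon implementation.
lex_density_sketch <- function(vec, contents_words, targets = NULL,
                               negate = c(FALSE, FALSE)) {
  stopifnot(is.logical(negate), length(negate) == 2L)

  # Numerator: elements counted as content words.
  is_content <- vec %in% contents_words
  if (negate[1]) is_content <- !is_content

  # Assumption: with no targets, every element counts toward the denominator.
  if (is.null(targets)) {
    return(sum(is_content) / length(vec))
  }

  # Denominator: elements matching (or, if negated, not matching) the targets.
  is_target <- vec %in% targets
  if (negate[2]) is_target <- !is_target

  sum(is_content) / sum(is_target)
}

# Made-up part-of-speech tags for a single document.
pos <- c("noun", "particle", "verb", "noun", "auxiliary", "adjective")

# Nouns over all tokens.
all_ratio <- lex_density_sketch(pos, "noun")

# Nouns over everything that is NOT a particle or auxiliary.
noun_ratio <- lex_density_sketch(
  pos, "noun", c("particle", "auxiliary"),
  negate = c(FALSE, TRUE)
)

# Adjectives over verbs.
adj_verb_ratio <- lex_density_sketch(pos, "adjective", "verb")
```

Under these assumptions, `negate = c(FALSE, TRUE)` leaves the numerator untouched and inverts only the denominator filter, which is the pattern used for `noun_ratio` in the Examples section.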

Examples

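# `hiroba` and `prettify()` are provided by audubon.
# POS tags are written as Unicode escapes of the Japanese labels.
head(hiroba) |>
  prettify(col_select = "POS1") |>
  dplyr::group_by(doc_id) |>
  dplyr::summarise(
    # Nouns over all tokens that are neither particles nor auxiliary verbs
    noun_ratio = lex_density(POS1,
      "\u540d\u8a5e", # noun
      c("\u52a9\u8a5e", "\u52a9\u52d5\u8a5e"), # particle, auxiliary verb
      negate = c(FALSE, TRUE)
    ),
    # Modifiers (adjectives, adverbs, adnominals) relative to verbs
    mvr = lex_density(
      POS1,
      c("\u5f62\u5bb9\u8a5e", "\u526f\u8a5e", "\u9023\u4f53\u8a5e"),
      "\u52d5\u8a5e" # verb
    ),
    # Verbs relative to nouns
    vnr = lex_density(POS1, "\u52d5\u8a5e", "\u540d\u8a5e")
  )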

paithiov909/audubon documentation built on June 2, 2025, 1:15 a.m.