nlp_melt_tokens: Tokenize Data Frame by Specified Column(s)
In textpress: A Lightweight and Versatile NLP Toolkit

Description

This function collects the tokens stored in a specified column of a data frame and groups them by one or more specified columns, returning one vector of tokens per group.

Usage

nlp_melt_tokens(
  df,
  melt_col = "token",
  parent_cols = c("doc_id", "sentence_id")
)

Arguments

`df`	A data frame containing the tokens to be grouped, typically one token per row.
`melt_col`	The name of the column in 'df' that contains the tokens.
`parent_cols`	A character vector indicating the column(s) by which to group the data.
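The grouping these arguments control can be approximated in base R. The following is a minimal sketch, not the textpress implementation: the helper name `melt_tokens_sketch` is hypothetical, and it assumes groups are keyed by pasting the `parent_cols` values together with "." and ordered by first appearance. The real function may name or order its output differently.

```r
# Hypothetical base-R sketch of the grouping; NOT textpress's own code.
melt_tokens_sketch <- function(df, melt_col = "token",
                               parent_cols = c("doc_id", "sentence_id")) {
  # Assumption: build one grouping key per row by pasting the parent columns with "."
  key <- do.call(paste, c(unname(as.list(df[parent_cols])), sep = "."))
  # split() returns a named list of token vectors; using unique(key) as the
  # factor levels keeps groups in order of first appearance
  split(df[[melt_col]], factor(key, levels = unique(key)))
}
```

With the example data frame further down, this sketch returns a list of two character vectors, named "1.1" and "1.2", one per sentence.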

Value

A list of vectors, each containing the tokens of one group defined by the 'parent_cols' parameter.
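A list of this shape can be consumed with standard base-R tools. The sketch below builds such a list by hand so it runs without the package. The group names "1.1" and "1.2" are illustrative assumptions, not necessarily the names nlp_melt_tokens assigns. It counts the tokens in each group and rejoins each group's tokens into a single string.

```r
# Hand-built list with the shape described under Value: one character vector
# of tokens per group. The names here are illustrative only.
tokens <- list(
  "1.1" = c("Hello", "world", "."),
  "1.2" = c("This", "is", "an", "example", ".")
)

# Number of tokens in each group (names are carried over from the list)
n_tokens <- lengths(tokens)

# Rejoin each group's tokens into one space-separated string
sentences <- vapply(tokens, paste, character(1), collapse = " ")
```

Here `n_tokens` is c(3, 5) and `sentences` holds "Hello world ." and "This is an example .".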

Examples

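library(textpress)

# A token-level data frame: one token per row, two sentences in one document
dtm <- data.frame(doc_id = as.character(c(1, 1, 1, 1, 1, 1, 1, 1)),
                  sentence_id = as.character(c(1, 1, 1, 2, 2, 2, 2, 2)),
                  token = c("Hello", "world", ".", "This", "is", "an", "example", "."))

# Collect the tokens into one vector per (doc_id, sentence_id) group
tokens <- nlp_melt_tokens(dtm, melt_col = "token", parent_cols = c("doc_id", "sentence_id"))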

textpress documentation built on Oct. 14, 2024, 5:08 p.m.