tokenize_to_df: Create a data.frame of tokens

View source: R/tokenize.R


Create a data.frame of tokens

Description

Create a data.frame of tokens

Usage

tokenize_to_df(
  x,
  text_field = "text",
  docid_field = "doc_id",
  into = dict_features(),
  col_select = seq_along(into),
  instance = rebuild_tokenizer(),
  ...
)

Arguments

x

A data.frame-like object or a character vector to be tokenized.

text_field

Name of the column that contains the texts to be tokenized.

docid_field

Name of the column that contains the document identifiers.

into

Column names of the token features, as returned by dict_features() by default.

col_select

Character or integer vector of the columns to keep in the return value. When NULL, the features are returned as-is in a single comma-separated column.

instance

A binding to an instance of <sudachipy.tokenizer.Tokenizer>. If you already have a tokenizer instance, you can improve performance by supplying it here instead of rebuilding one on each call.

...

Currently not used.
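The instance argument is the main performance lever: rebuild_tokenizer() is invoked as the default on every call, so building the tokenizer once and passing it in avoids repeated setup. A minimal sketch, assuming sudachir and its SudachiPy backend are installed and configured:

```r
library(sudachir)

# Build the tokenizer once; rebuilding it on every call
# is the main per-call overhead.
tok <- rebuild_tokenizer()

# Reuse the same instance across multiple calls.
df1 <- tokenize_to_df("First document.", instance = tok)
df2 <- tokenize_to_df("Second document.", instance = tok)
```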

Value

A tibble.

Examples

## Not run: 
tokenize_to_df(
  "Tokyo, Japan",
  into = dict_features("en"),
  col_select = c("pos1", "pos2")
)

## End(Not run)
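Because x may also be a data.frame-like object, text_field and docid_field select which columns supply the texts and their identifiers. A hedged sketch along the same lines as the example above (the column names shown are the documented defaults):

```r
library(sudachir)

docs <- data.frame(
  doc_id = c("d1", "d2"),
  text = c("Tokyo, Japan", "Osaka, Japan")
)

# text_field and docid_field default to "text" and "doc_id";
# they are spelled out here only for clarity.
tokenize_to_df(
  docs,
  text_field = "text",
  docid_field = "doc_id",
  into = dict_features("en"),
  col_select = c("pos1", "pos2")
)
```

The result is a tibble with one row per token, keeping only the selected feature columns.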

uribo/sudachir documentation built on Feb. 7, 2023, 11:09 a.m.