form: Create a list of tokens

View source: R/as_tokens.R

form R Documentation

Create a list of tokens

Description

This function is shorthand for tokenize_to_df() |> as_tokens().

Usage

form(
  x,
  text_field = "text",
  docid_field = "doc_id",
  instance = rebuild_tokenizer(),
  ...
)

Arguments

x

A data.frame-like object or a character vector to be tokenized.

text_field

Name of the column that contains the texts to be tokenized.

docid_field

Name of the column that contains the identifiers of the texts.

instance

A binding to an instance of <sudachipy.tokenizer.Tokenizer>. If you already have a tokenizer instance, passing it here instead of relying on the default rebuild_tokenizer() call can improve performance.

...

Passed to as_tokens().

Value

A named list of character vectors.
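
As a rough illustration of this return shape, the sketch below builds a named list of character vectors from a small token table using only base R. The table is hand-written rather than real tokenizer output, and the column names doc_id and token are assumptions made for the example, not the documented columns of tokenize_to_df().

```r
# Hypothetical token table written by hand for illustration.
# The column names "doc_id" and "token" are assumptions, and the rows
# are not real output from the tokenizer.
tokens_df <- data.frame(
  doc_id = c("1", "1", "1", "2", "2"),
  token = c("Tokyo", ",", "Japan", "Osaka", "Japan"),
  stringsAsFactors = FALSE
)

# Group tokens by document identifier. The result is a named list:
# one character vector of tokens per document, named by its identifier.
token_list <- split(tokens_df$token, tokens_df$doc_id)

str(token_list)
```

Each element can then be accessed by document identifier, for example token_list[["1"]].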

Examples

## Not run: 
form(
  "Tokyo, Japan",
  type = "surface"
)

## End(Not run)

uribo/sudachir documentation built on Feb. 7, 2023, 11:09 a.m.