convert_tokens: Tokenize words from text or a pdf file.

View source: R/convert_tokens.r

convert_tokens    R Documentation

Tokenize words from text or a pdf file.

Description

Tokenize the words in text supplied directly or read from a pdf file.

Usage

convert_tokens(
  x,
  path = FALSE,
  split_pdf = FALSE,
  remove_hyphen = TRUE,
  token_function = NULL
)

Arguments

x

The text of the pdf file. This can be supplied directly as a character vector, or a file path can be given instead, in which case the pdftools package is used to read the pdf. To read from a file path, the path argument must be set to TRUE.

path

TRUE/FALSE indicating whether x is a file path to a pdf. If TRUE, the pdftools package is used to convert the pdf to text. Default is FALSE.
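
For example, x can be supplied as text that has already been extracted, or as a file path with path = TRUE. A minimal sketch of both modes, assuming the pdftools package is installed:

  file <- system.file('pdf', '1610.00147.pdf', package = 'pdfsearch')
  # read the pdf yourself, then tokenize the text directly
  txt <- pdftools::pdf_text(file)
  convert_tokens(txt)
  # or hand convert_tokens the file path and let it do the reading
  convert_tokens(file, path = TRUE)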

split_pdf

TRUE/FALSE indicating whether to split the pdf text using white space. This is most useful for multicolumn pdf files. When TRUE, the function attempts to recreate the column layout of the text as a single column, starting with the left column and proceeding to the right (see the sketch below).

remove_hyphen

TRUE/FALSE indicating whether words hyphenated across line breaks should be re-joined onto a single line. Default is TRUE.
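
Both layout options can be set in a single call. A minimal sketch, assuming 'file' is a path to a multicolumn pdf:

  # split columns into reading order, keep hyphenated line-break words as-is
  convert_tokens(file, path = TRUE, split_pdf = TRUE, remove_hyphen = FALSE)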

token_function

A tokenization function from the tokenizers package. Default is the tokenize_words function.
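
Other tokenizers from the tokenizers package with the same interface should also work. A sketch swapping in sentence tokenization instead of the default word tokenization:

  convert_tokens(file, path = TRUE,
                 token_function = tokenizers::tokenize_sentences)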

Value

A list of character vectors containing the tokens. See the documentation of the tokenizers package for more detail.
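
The result can be inspected with standard list tools. A short sketch, continuing the example above:

  tokens <- convert_tokens(file, path = TRUE)
  str(tokens, max.level = 1)  # overall list structure
  head(tokens[[1]])           # first few tokens of the first element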

Examples

 file <- system.file('pdf', '1610.00147.pdf', package = 'pdfsearch')
 convert_tokens(file, path = TRUE) 
