build_token_df: Build token dataframe
In daiR: Interface with Google Cloud Document AI API

Description

Builds a token dataframe from the text OCRed by Document AI (DAI) in a synchronous or asynchronous request. Rows are tokens, in the order DAI proposes to read them. Columns are location variables such as page coordinates and block bounding box numbers.

Usage

build_token_df(object, type = "sync")

Arguments

`object`	either an HTTP response object from `dai_sync()` or the path to a JSON file from `dai_async()`.
`type`	one of "sync" or "async", depending on the function used to process the original document. Defaults to "sync".
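
Because `type` has to match the kind of `object` supplied, a small wrapper can pick it automatically. The following is only a sketch: `infer_dai_type()` is a hypothetical helper that is not part of daiR, and it assumes that any single character string is a path to a JSON file from `dai_async()` and that anything else is a response object from `dai_sync()`.

```r
# Hypothetical helper, not part of daiR: guess the `type` argument
# from the kind of `object` that will be passed to build_token_df().
infer_dai_type <- function(object) {
  # Assumption: a single character string is a path to a JSON file
  # produced by dai_async(); any other object is a dai_sync() response.
  if (is.character(object) && length(object) == 1) "async" else "sync"
}

# A path gives "async"; a list stands in here for a response object.
type_for_path <- infer_dai_type("pdf_output.json")
type_for_resp <- infer_dai_type(list(status_code = 200))
```

The result could then be passed on as `build_token_df(object, type = infer_dai_type(object))`.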

Details

The columns are: token, start index, end index, confidence, left boundary, right boundary, top boundary, bottom boundary, page number, and block number. Start and end indices refer to character positions in the string containing the full text.
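
To illustrate how these columns might be used, here is a minimal sketch on a hand-built mock data frame, so it runs without calling the API. The column names (`start_ind`, `end_ind`, `conf`, and so on) and the 1-based, inclusive index convention are assumptions made for illustration; check the names and indexing of a real `build_token_df()` result before relying on them.

```r
# Mock data standing in for a real result; all names and values here
# are illustrative assumptions, not output from build_token_df().
full_text <- "Hello world again"

token_df <- data.frame(
  token     = c("Hello ", "world ", "again"),
  start_ind = c(1, 7, 13),   # assumed 1-based, inclusive
  end_ind   = c(6, 12, 17),
  conf      = c(0.99, 0.95, 0.42),
  page      = c(1, 1, 2),
  block     = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Recover each token from the full text via its start and end indices.
recovered <- substring(full_text, token_df$start_ind, token_df$end_ind)

# Flag tokens whose OCR confidence falls below a chosen threshold.
low_conf <- token_df[token_df$conf < 0.5, ]

# Rebuild the text of each page by pasting tokens in reading (row) order.
page_text <- vapply(
  split(token_df$token, token_df$page),
  paste, character(1), collapse = ""
)
```

Here `recovered` matches the `token` column, `low_conf` holds the single low-confidence token, and `page_text` is a character vector named by page number.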

Value

A token data frame with one row per token.

Examples

## Not run: 
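# Synchronous processing: pass the HTTP response object directly
resp <- dai_sync("file.pdf")
token_df <- build_token_df(resp)

# Asynchronous processing: pass the path to the JSON output file
token_df <- build_token_df("pdf_output.json", type = "async")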

## End(Not run)

daiR documentation built on Nov. 18, 2025, 5:06 p.m.