dai_sync: OCR document synchronously
In daiR: Interface with Google Cloud Document AI API

Description

Sends a single document to the Google Cloud Services (GCS) Document AI v1 API for synchronous (immediate) processing. Returns an HTTP response object containing the OCRed text and additional data.

Usage

dai_sync(
  file,
  proj_id = get_project_id(),
  proc_id = Sys.getenv("DAI_PROCESSOR_ID"),
  proc_v = NA,
  skip_rev = "true",
  loc = "eu",
  token = dai_token()
)

Arguments

`file`	path to a single PDF or image file; see Details for the supported formats and the PDF page limit.
`proj_id`	a GCS project id.
`proc_id`	a Document AI processor id.
`proc_v`	one of 1) a processor version name, 2) "stable" for the latest processor from the stable channel, or 3) "rc" for the latest processor from the release candidate channel.
`skip_rev`	whether to skip human review; "true" or "false".
`loc`	a two-letter region code; "eu" or "us".
`token`	an authentication token generated by `dai_auth()` or another auth function.
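
The Usage defaults show that `proc_id` is read from the `DAI_PROCESSOR_ID` environment variable. The following is a minimal sketch of how that default could be set for a single session and then overridden alongside the other arguments. The processor id and project id are placeholders invented for illustration, and the call itself is wrapped in `if (FALSE)` because it needs valid credentials.

```r
# Placeholder value, not a real processor id (assumption for illustration).
Sys.setenv(DAI_PROCESSOR_ID = "abc123def456")

# This is the value the default `proc_id = Sys.getenv("DAI_PROCESSOR_ID")`
# would resolve to in this session.
default_proc_id <- Sys.getenv("DAI_PROCESSOR_ID")

if (FALSE) {
  # Not run: requires a GCS access token and a real processor.
  library(daiR)
  response <- dai_sync(
    "doc_page.pdf",
    proj_id = "my-project-id",  # hypothetical project id
    proc_id = default_proc_id,
    proc_v = "stable",          # latest processor from the stable channel
    skip_rev = "true",          # skip human review
    loc = "us"                  # "eu" or "us"
  )
}
```

For a permanent setting, the package vignettes describe configuring the .Renviron file instead.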

Details

Requires a GCS access token and some configuration of the .Renviron file; see package vignettes for details. Input files can be in .pdf, .bmp, .gif, .jpeg, .jpg, .png, or .tiff format. PDF files can be up to five pages long. Extract the text from the response object with text_from_dai_response(). Inspect the entire response object with httr::content().
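
The workflow described above (check the input format, send the file, then extract the text) might be sketched as follows. The helper `is_supported_input()` is hypothetical and not part of daiR; it only compares the file extension against the formats listed in this section. The API calls are wrapped in `if (FALSE)` because they need valid credentials.

```r
# Extensions taken from the list of supported formats in this section.
supported_ext <- c("pdf", "bmp", "gif", "jpeg", "jpg", "png", "tiff")

# Hypothetical helper (not part of daiR): TRUE if the file extension,
# compared case-insensitively, is in the supported list.
is_supported_input <- function(path) {
  tolower(tools::file_ext(path)) %in% supported_ext
}

if (FALSE) {
  # Not run: requires a GCS access token and a configured processor.
  library(daiR)
  if (is_supported_input("doc_page.pdf")) {
    response <- dai_sync("doc_page.pdf")
    text <- text_from_dai_response(response)  # OCRed text only
    parsed <- httr::content(response)         # entire parsed response
  }
}
```

This check looks only at the file name; it does not verify the file contents or count the pages of a PDF.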

Value

an HTTP response object.

Examples

## Not run: 
response <- dai_sync("doc_page.pdf")

response <- dai_sync("doc_page.pdf",
  proc_v = "pretrained-ocr-v1.1-2022-09-12"
)

## End(Not run)

daiR documentation built on Nov. 18, 2025, 5:06 p.m.