Last updated 10 February 2024
Follow the instructions here for the GUI method or here for the command line method. See also the GCS concept cheatsheet for an overview of recommended environment variables.
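Before running the examples, it can help to confirm that the relevant environment variables are visible to your R session. The sketch below is a minimal check, not part of daiR. The variable names `GCS_AUTH_FILE`, `GCS_DEFAULT_BUCKET` and `DAI_PROCESSOR_ID` are assumptions based on the usual googleCloudStorageR and daiR conventions; treat the cheatsheet as the authoritative list. The helper `missing_env_vars()` is hypothetical.

```r
# Names of the environment variables to check. These are assumed from the
# usual googleCloudStorageR/daiR conventions; adjust to match the cheatsheet.
required_vars <- c("GCS_AUTH_FILE", "GCS_DEFAULT_BUCKET", "DAI_PROCESSOR_ID")

# Hypothetical helper: return the names of variables that are unset or empty.
missing_env_vars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing <- missing_env_vars(required_vars)
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
} else {
  message("All expected environment variables are set.")
}
```

Variables placed in `.Renviron` are only picked up when R starts, so restart the session after editing that file.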
Pass a single-page pdf or image file to Document AI and get the output immediately:
library(daiR)
## Not run:
myfile <- "sample.pdf"
text <- get_text(dai_sync(myfile))
Requires configuration of googleCloudStorageR. Send larger batches for offline processing in three steps:
## Not run:
# Upload the files to your bucket
library(googleCloudStorageR)
library(purrr)
my_pdfs <- c("sample1.pdf", "sample2.pdf")
map(my_pdfs, ~ gcs_upload(.x, name = basename(.x)))
## Not run:
# Ask Document AI to process the uploaded files
resp <- dai_async(my_pdfs)
dai_status(resp) # to check the progress
The output will be delivered to the same bucket as JSON files.
## Not run:
# Get a dataframe with the bucket contents
contents <- gcs_list_objects()
# Get the names of the JSON output files
jsons <- grep("\\.json$", contents$name, value = TRUE)
# Download them
map(jsons, ~ gcs_get_object(.x, saveToDisk = basename(.x)))
# Extract the text from the JSON files and save it as .txt files
local_jsons <- basename(jsons)
map(local_jsons, ~ get_text(.x, type = "async", save_to_file = TRUE))
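To see what an anchored pattern such as `\\.json$` selects from a bucket listing, here is a small sketch on made-up data. The object names, including the folder prefix, are invented for illustration and are not real Document AI output paths.

```r
# Mock bucket listing; every name here is invented for illustration.
object_names <- c(
  "sample1.pdf",
  "12345/0/sample1-0.json",
  "notes.json.bak",
  "12345/1/sample2-0.json"
)

# Keep only names that end in ".json" (escaped dot, anchored at the end).
jsons <- grep("\\.json$", object_names, value = TRUE)

# Strip any folder prefix to get the local file names used after download.
local_jsons <- basename(jsons)
print(local_jsons)
```

Anchoring the pattern keeps files such as `notes.json.bak` out of the download list.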
Assuming your pdfs were named sample1.pdf and sample2.pdf, there will now be two files named sample1-0.txt and sample2-0.txt in your working directory.
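As a quick sanity check, you can derive the expected .txt names from the input names and see which ones exist. This is a minimal sketch, assuming each document produced a single output file with the `-0` suffix described above. The helper `expected_txt()` is hypothetical and not part of daiR.

```r
# Hypothetical helper: predict the .txt names for a set of input PDFs,
# assuming one output file per document (the "-0" suffix).
expected_txt <- function(pdfs) {
  paste0(tools::file_path_sans_ext(basename(pdfs)), "-0.txt")
}

my_pdfs <- c("sample1.pdf", "sample2.pdf")
txt_files <- expected_txt(my_pdfs)

# Report which of the expected files are present in the working directory.
print(setNames(file.exists(txt_files), txt_files))
```

A `FALSE` entry may simply mean the document was split into several output files, in which case the suffix assumption does not hold.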