read_tw_csi: Read-In and Clean the Taiwan Chemical Substance Inventory

Description

This function reads in and automatically cleans the Taiwan Chemical Substance Inventory.

Usage

read_tw_csi(
  path,
  pages = NULL,
  dpi = 600L,
  radius = 2L,
  threshold = 77L,
  whitelist = "- 0123456789CENP"
)

Arguments

path

(Character) The path to the PDF file.

pages

(Integer) Range of pages to read in. See pdf_convert for details. Defaults to NULL, i.e., all pages.

dpi

(Integer) Resolution, in dots per inch, of the PNG images rendered from the PDF. See pdf_convert for details. Defaults to 600L.

radius

(Integer) Noise reduction radius. See image_reducenoise for details. Defaults to 2L.

threshold

(Integer) Black-and-white conversion threshold. See image_threshold for details. Defaults to 77L.

whitelist

(Character) Restricted set of characters that the text recognition may return, optimized to recognize CAS Registry Numbers. See tesseract for details. Defaults to "- 0123456789CENP".
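
To illustrate what the whitelist permits, the following base R sketch checks whether a string uses only whitelisted characters and whether it is a well-formed CAS Registry Number with a valid check digit. Both helpers, only_whitelisted and is_valid_cas, are hypothetical names introduced here for illustration and are not part of the package; the check-digit rule is the standard CAS one (digits weighted by position from the right, modulo 10).

```r
# Characters permitted by the default `whitelist` argument.
whitelist <- "- 0123456789CENP"

# Hypothetical helper: TRUE if every character of each string in `x`
# appears in `whitelist`.
only_whitelisted <- function(x, whitelist) {
  allowed <- strsplit(whitelist, "")[[1]]
  vapply(
    strsplit(x, ""),
    function(chars) all(chars %in% allowed),
    logical(1)
  )
}

# Hypothetical helper: TRUE if a string has the CAS Registry Number layout
# (2 to 7 digits, 2 digits, 1 check digit) and the check digit matches.
is_valid_cas <- function(x) {
  vapply(x, function(s) {
    if (!grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", s)) return(FALSE)
    digits <- as.integer(strsplit(gsub("-", "", s, fixed = TRUE), "")[[1]])
    check <- digits[length(digits)]
    # Weight the remaining digits 1, 2, 3, ... starting from the right.
    body <- rev(digits[-length(digits)])
    sum(body * seq_along(body)) %% 10L == check
  }, logical(1), USE.NAMES = FALSE)
}

only_whitelisted(c("7732-18-5", "H2O"), whitelist)
is_valid_cas(c("7732-18-5", "7732-18-4"))
```

A check of this kind could be applied to recognized text as a sanity filter, since optical character recognition can still confuse digits even with a restricted character set.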

Details

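Judging by the arguments, the function renders the selected PDF pages as PNG images (pdf_convert), reduces image noise (image_reducenoise), converts the images to black and white (image_threshold), and extracts the text with tesseract, restricted to the whitelist characters. The recognized text is then cleaned and returned as a data frame.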
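
The functions referenced in the Arguments section suggest an image-based text recognition workflow. The sketch below shows how such a workflow might look with the pdftools, magick, and tesseract packages. It is not the package's implementation: the helper name ocr_pdf_sketch, the order of the steps, the use of the threshold as a percentage, and the two-pass thresholding are all assumptions made for illustration.

```r
# Hypothetical helper, not part of the package: render PDF pages, clean the
# images, and run OCR with a restricted character set.
# Needs the pdftools, magick, and tesseract packages when it is called.
ocr_pdf_sketch <- function(path,
                           pages = NULL,
                           dpi = 600L,
                           radius = 2L,
                           threshold = 77L,
                           whitelist = "- 0123456789CENP") {
  # Default to all pages of the document.
  if (is.null(pages)) {
    pages <- seq_len(pdftools::pdf_info(path)$pages)
  }

  # Render each page to a PNG file in the session's temporary directory.
  png_files <- file.path(tempdir(), sprintf("page_%04d.png", pages))
  pdftools::pdf_convert(
    pdf = path, format = "png", pages = pages,
    filenames = png_files, dpi = dpi, verbose = FALSE
  )

  # OCR engine limited to the whitelisted characters.
  engine <- tesseract::tesseract(
    language = "eng",
    options = list(tessedit_char_whitelist = whitelist)
  )

  # Assumption: the integer threshold is interpreted as a percentage.
  pct <- paste0(threshold, "%")

  vapply(png_files, function(png) {
    img <- magick::image_read(png)
    img <- magick::image_reducenoise(img, radius = radius)
    # Push bright pixels to white and dark pixels to black.
    img <- magick::image_threshold(img, type = "white", threshold = pct)
    img <- magick::image_threshold(img, type = "black", threshold = pct)
    magick::image_write(img, path = png, format = "png")
    tesseract::ocr(png, engine = engine)
  }, character(1), USE.NAMES = FALSE)
}
```

Calling ocr_pdf_sketch("Eg01.pdf", pages = 1L) would return the recognized text of the first page as a single string; turning that text into a cleaned data frame is the part that read_tw_csi handles and that this sketch leaves out.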

Value

Returns a data frame.

Note

Tested with the 1 January and 8 September 2019 versions.

Author(s)

Raoul Wolf (https://github.com/RaoulWolf/)

Examples

## Not run: 
download.file(
  url = paste(
    "https://gazette2.nat.gov.tw/EG_FileManager/eguploadpub/eg021170/ch08",
    "type3/gov82/num29/images/Eg01.pdf",
    sep = "/"
  ),
  destfile = "Eg01.pdf"
)

path <- "Eg01.pdf"

tcsi <- read_tw_csi(path)

## End(Not run)

RaoulWolf/cleanventory documentation built on Sept. 15, 2022, 4:25 a.m.