read_tw_csi | R Documentation |
Reads in and automatically cleans the Taiwan Chemical Substance Inventory from its PDF release.
read_tw_csi( path, pages = NULL, dpi = 600L, radius = 2L, threshold = 77L, whitelist = "- 0123456789CENP" )
path | (Character) The path to the PDF file. |
pages | (Integer) Range of pages to read in. See pdf_convert for details. Defaults to NULL. |
dpi | (Integer) Resolution, in dots per inch, of the PNG image rendered from the PDF. See pdf_convert for details. Defaults to 600L. |
radius | (Integer) Noise reduction radius. See image_reducenoise for details. Defaults to 2L. |
threshold | (Integer) Black-and-white conversion threshold. See image_threshold for details. Defaults to 77L. |
whitelist | (Character) Restricted set of allowed characters, optimized to recognize CAS Registry Numbers. See tesseract for details. Defaults to "- 0123456789CENP". |
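The arguments point to a render, denoise, threshold, and OCR sequence. The following is a minimal sketch of how such a sequence could look with the pdftools, magick, and tesseract packages. It is not the package's confirmed implementation: the helper name ocr_pdf_page, the order of the steps, and the reading of threshold as a percentage are assumptions made for illustration.

```r
# Hypothetical helper, not part of the package: OCR one page of a PDF.
# Requires the pdftools, magick, and tesseract packages when called.
ocr_pdf_page <- function(path, page = 1L, dpi = 600L, radius = 2L,
                         threshold = 77L,
                         whitelist = "- 0123456789CENP") {
  # Render the requested page to a temporary PNG file.
  png_file <- pdftools::pdf_convert(
    pdf = path, format = "png", pages = page,
    filenames = tempfile(fileext = ".png"),
    dpi = dpi, verbose = FALSE
  )
  on.exit(unlink(png_file), add = TRUE)

  img <- magick::image_read(png_file)

  # Suppress scanning noise before binarizing.
  img <- magick::image_reducenoise(img, radius = radius)

  # Assumption: the integer threshold is interpreted as a percentage.
  pct <- paste0(threshold, "%")
  img <- magick::image_threshold(img, type = "black", threshold = pct)
  img <- magick::image_threshold(img, type = "white", threshold = pct)

  # Restrict recognition to the whitelisted characters.
  engine <- tesseract::tesseract(
    language = "eng",
    options = list(tessedit_char_whitelist = whitelist)
  )
  tesseract::ocr(img, engine = engine)
}
```

Calling ocr_pdf_page("Eg01.pdf") would return the recognized text of the first page as a character string. Turning that text into a data frame is left out here because the source does not describe the parsing step.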
Returns a data frame.
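Because the inventory is read via OCR, individual digits may occasionally be misrecognized, so it can be worth checking the CAS Registry Numbers in the result. The sketch below uses the standard CAS check-digit rule. The helper name is_valid_cas is hypothetical and not part of the package, and it works on a plain character vector because the column names of the returned data frame are not documented here.

```r
# Hypothetical helper, not part of the package: TRUE for each element of `x`
# that is a well-formed CAS Registry Number with a correct check digit.
is_valid_cas <- function(x) {
  vapply(x, function(cas) {
    # Format: 2 to 7 digits, 2 digits, 1 check digit, separated by hyphens.
    if (is.na(cas) || !grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", cas)) {
      return(FALSE)
    }
    digits <- as.integer(strsplit(gsub("-", "", cas), "")[[1]])
    check <- digits[length(digits)]
    body <- rev(digits[-length(digits)])
    # Weight the remaining digits 1, 2, 3, ... from right to left;
    # the weighted sum modulo 10 must equal the check digit.
    sum(body * seq_along(body)) %% 10L == check
  }, logical(1), USE.NAMES = FALSE)
}

# Water (valid), a corrupted variant (invalid), formaldehyde (valid).
is_valid_cas(c("7732-18-5", "7732-18-4", "50-00-0"))
# TRUE FALSE TRUE
```

A failed check only flags a suspicious entry. It does not say which digit was misread, so flagged rows still need to be compared against the source PDF.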
Tested with the 1 January and 8 September 2019 versions.
Raoul Wolf (https://github.com/RaoulWolf/)
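## Not run:
# Download the inventory PDF from the official gazette.
download.file(
  url = paste(
    "https://gazette2.nat.gov.tw/EG_FileManager/eguploadpub/eg021170/ch08",
    "type3/gov82/num29/images/Eg01.pdf",
    sep = "/"
  ),
  destfile = "Eg01.pdf",
  mode = "wb"  # binary mode, so the PDF is not corrupted on Windows
)

path <- "Eg01.pdf"

# Read in and clean the inventory; returns a data frame.
tcsi <- read_tw_csi(path)
## End(Not run)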