README.md

cleanventory

R-CMD-check Codecov test
coverage Lifecycle:
experimental

A ZeroPM R package

The goal of cleanventory is to provide simple functionality to clean and partially curate data sets of common chemical inventories. The aim is to document every step, from the raw (downloaded) files to the final tables.

cleanventory aims to correctly identify all missing values in data sets, validates CAS Registry Numbers (when present) and additionally offers functionality to transform all special characters into ASCII characters.

The dependencies of cleanventory are kept at as minimal as possible: openxlsx for handling .xlsx files, and the trio of pdftools, magick and tesseract to extract data from (image) .pdf files.

We suggest the following packages/functionalities in addition: bit64::as.integer64() to correctly handle the us_tsca$cas_reg_no and us_cdr$chemical_id_wo_dashes columns (kept as double for compatibility).

As of 2022-08-02, the following inventories are included:

| Chemical Inventory | Function | Compatible Version(s) | URL | |:-------------------|:------------------|:---------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------| | US TSCA | read_us_tsca() | 2021-08 | https://www.epa.gov/tsca-inventory | | EU CLP Annex VI | read_eu_clp() | 9, 10, 13, 14, 15, 17 | https://echa.europa.eu/en/information-on-chemicals/annex-vi-to-clp | | EU ECI | read_eu_eci() | Unknown | https://echa.europa.eu/information-on-chemicals/ec-inventory | | Japan NITE | read_jp_nite() | March 2022 | https://www.nite.go.jp/chem/english/ghs/ghs_download.html | | New Zealand IoC | read_nz_ioc() | December 2021 | https://www.epa.govt.nz/database-search/new-zealand-inventory-of-chemicals-nzioc/ | | South Korea NCIS | read_kr_ncis() | 4 May 2022 | https://ncis.nier.go.kr/en/mttrList.do | | Australia HCIS | read_au_hcis() | Unknown | http://hcis.safeworkaustralia.gov.au/HazardousChemical | | Australia ICI | read_au_ici() | 10 February 2022 | https://www.industrialchemicals.gov.au/search-inventory | | Taiwan CSI | read_tw_csi() | Unknown | https://gazette.nat.gov.tw/egFront/detail.do?metaid=73440&log=detailLoghttps://gazette.nat.gov.tw/egFront/detail.do?metaid=78617&log=detailLog | | Philippine ICCS | read_ph_iccs() | 2017, 2020, 2021 | https://chemical.emb.gov.ph/?page_id=138 | | Japan CSCL | read_jp_cscl() | 31 May 202231 May 20221 April 2022 | https://www.nite.go.jp/en/chem/chrip/chrip_search/sltLst | | Canada DSL | read_ca_dsl() | 14 June 2022 | https://pollution-waste.canada.ca/substances-search/Substance?lang=en | | China IECSC | read_cn_iecsc() | 2013 | https://www.mee.gov.cn/gkml/hbb/bgg/201301/t20130131_245810.htm | | Nordics SPIN | read_xn_spin() | 2000 | http://www.spin2000.net/spinmyphp/ | | US CDR | read_us_cdr() | 20162020 | https://www.epa.gov/chemical-data-reporting | | Malaysia CIMS | read_my_cims() | 2017 | https://cims.dosh.gov.my/ |

Installation

You can install the development version of cleanventory from GitHub with:

# install.packages("devtools")
remotes::install_github("ZeroPM-H2020/cleanventory")

Examples

This is an example which shows you how to get the data set of the (current) EU CLP Annex VI:

library(cleanventory)

tmp <- tempdir()

url <- paste0(
  "https://echa.europa.eu/documents/10162/17218/",
  "annex_vi_clp_table_atp17_en.xlsx/",
  "4dcec79c-f277-ed68-5e1b-d435900dbe34?t=1638888918944"
)

eu_clp_file <- download.file(
  url, 
  destfile = paste(tmp, "annex_vi_clp_table_atp17_en.xlsx", sep = "/"),
  quiet = TRUE,
  mode = ifelse(.Platform$OS.type == "windows", "wb", "w")
)

path <- paste(tmp, "annex_vi_clp_table_atp17_en.xlsx", sep = "/")

eu_clp <- read_eu_clp(path)

invisible(file.remove(path))

head(eu_clp)
#>       index_no international_chemical_identification     ec_no     cas_no
#> 1 001-001-00-9                              hydrogen 215-605-7  1333-74-0
#> 2 001-002-00-4             aluminium lithium hydride 240-877-9 16853-85-3
#> 3 001-003-00-X                        sodium hydride 231-587-3  7646-69-7
#> 4 001-004-00-5                       calcium hydride 232-189-2  7789-78-8
#> 5 003-001-00-4                               lithium 231-102-5  7439-93-2
#> 6 003-002-00-X                        n-hexyllithium 404-950-0 21369-64-2

str(eu_clp)
#> 'data.frame':    4702 obs. of  4 variables:
#>  $ index_no                             : chr  "001-001-00-9" "001-002-00-4" "001-003-00-X" "001-004-00-5" ...
#>  $ international_chemical_identification: chr  "hydrogen" "aluminium lithium hydride" "sodium hydride" "calcium hydride" ...
#>  $ ec_no                                : chr  "215-605-7" "240-877-9" "231-587-3" "232-189-2" ...
#>  $ cas_no                               : chr  "1333-74-0" "16853-85-3" "7646-69-7" "7789-78-8" ...

Acknowledgement

This R package was developed at the Norwegian Geotechnical Institute (NGI) as part of the project ZeroPM: Zero pollution of Persistent, Mobile substances. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101036756.

If you find this package useful and can afford it, please consider making a donation to a humanitarian non-profit organization, such as Sea-Watch. Thank you.



RaoulWolf/cleanventory documentation built on Sept. 15, 2022, 4:25 a.m.