get_rn_url_validity: Check that the Registry Number URL is Valid


View source: R/deprecated/get_rn_url_validity.R

Description

All ChemiDPlus Scraping Functions operate on a Registry Number URL (rn_url). The initial search is logged to the "REGISTRY_NUMBER_LOG" Table. The RN URL is then tested for a 404 Status, and the result is logged to the "RN_URL_VALIDITY" Table. The major sections found at the ChemiDPlus site are "Names and Synonyms", "Classification", "Registry Numbers", and "Links to Resources". These sections are written to their respective tables: "NAMES_AND_SYNONYMS", "CLASSIFICATION", "REGISTRY_NUMBERS", and "LINKS_TO_RESOURCES".

Usage

get_rn_url_validity(
  conn,
  rn_url,
  response,
  schema = "chemidplus",
  sleep_time = 3,
  verbose = TRUE
)

Arguments

conn

Postgres connection object

rn_url

Registry number URL to read that also serves as an Identifier

response

(optional) "xml_document" "xml_node" class object returned by xml2::read_html for the rn_url argument. Providing a response from a single HTML read reduces the chance of encountering an HTTP 503 error when parsing multiple sections from a single URL. If the response argument is missing, the URL is read and the function then pauses for sleep_time seconds.

schema

Schema that the returned data is written to, Default: 'chemidplus'

sleep_time

If the response argument is missing, the number of seconds to pause after reading the URL, Default: 3
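
The interplay between the response and sleep_time arguments can be illustrated with a minimal sketch. This is not the package's implementation: read_once is a hypothetical helper, the example.org URL is a placeholder, and the inline HTML is an offline stand-in for a real page. Only xml2::read_html, xml2::xml_find_all, and xml2::xml_text are real calls.

```r
library(xml2)

# Hypothetical helper (not part of skyscraper): read the URL only when no
# parsed response is supplied, then pause so requests are spaced out.
read_once <- function(rn_url, response, sleep_time = 3) {
  if (missing(response)) {
    response <- xml2::read_html(rn_url)
    Sys.sleep(sleep_time)
  }
  response
}

# Offline stand-in for a page that has already been read once
page <- xml2::read_html(
  "<html><body><h2>Names and Synonyms</h2><h2>Classification</h2></body></html>"
)

# Because a response is passed, no new request is made and no pause occurs
doc <- read_once(rn_url = "https://example.org/placeholder", response = page)

# The same parsed document can then be queried for several sections
headings <- xml2::xml_text(xml2::xml_find_all(doc, "//h2"))
print(headings)
```

Reading the page once and passing the parsed document along keeps the number of requests per URL at one, which is the motivation the response argument describes.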

Value

Each section is parsed by its respective skyscraper function, which stores the scraped results in a table of the same name in the schema. If a connection argument is not provided, the results are returned as a data frame in the R console.

RN URL Validity Table

The RN_URL_VALIDITY Table logs whether an HTTP 404 Error was recorded for an RN URL found in the REGISTRY_NUMBER_LOG Table for QA purposes.
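
As a rough illustration of the kind of record such a table could hold, the sketch below builds one row per RN URL from an HTTP status code. The function names, column names, and example.org URLs are all assumptions made for illustration; the actual columns of RN_URL_VALIDITY are not documented here. The httr calls shown are one possible way to obtain a status code and are not necessarily what the package uses.

```r
# Hypothetical row builder: the column names are illustrative only
validity_row <- function(rn_url, status_code) {
  data.frame(
    checked_at = Sys.time(),
    rn_url = rn_url,
    is_404 = identical(as.integer(status_code), 404L),
    stringsAsFactors = FALSE
  )
}

# Hypothetical status check using httr; defined but not called in this sketch
# so that the example runs without network access.
check_status <- function(rn_url) {
  httr::status_code(httr::GET(rn_url))
}

# Build a small log from status codes supplied by hand
rows <- rbind(
  validity_row("https://example.org/placeholder-a", 200L),
  validity_row("https://example.org/placeholder-b", 404L)
)
print(rows$is_404)
```

Rows built this way could then be appended to a table through the database connection, with the 404 flag available for later QA queries.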

See Also

lsSchema, createSchema, lsTables, query, buildQuery, appendTable, writeTable, tibble

Other chemidplus scraping: get_classification(), get_links_to_resources(), get_names_and_synonyms(), get_registry_numbers(), log_registry_number()


meerapatelmd/skyscraper documentation built on Dec. 27, 2020, 7:46 a.m.