get_ncit: Scrape the NCI Thesaurus

Description

All NCIt codes that have not yet been scraped, or whose last scrape has expired under expiration_days, are scraped from the NCI Thesaurus at the "https://ncithesaurus.nci.nih.gov/ncitbrowser/pages/concept_details.jsf?dictionary=NCI_Thesaurus&code=%s&ns=ncit&type=synonym&key=null&b=1&n=0&vse=null#" path, where %s is replaced by the NCIt code.
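
The %s in the path above is a format placeholder for a single NCIt code. As a minimal sketch of how such a URL could be filled in with base R: build_ncit_url() is a hypothetical helper and the two codes are placeholders, neither is part of skyscraper.

```r
# URL template copied from the description; %s marks where the NCIt code goes
ncit_url_template <- paste0(
  "https://ncithesaurus.nci.nih.gov/ncitbrowser/pages/concept_details.jsf",
  "?dictionary=NCI_Thesaurus&code=%s&ns=ncit&type=synonym&key=null&b=1&n=0&vse=null"
)

# Hypothetical helper (not part of skyscraper): one URL per NCIt code
build_ncit_url <- function(ncit_code) {
  sprintf(ncit_url_template, ncit_code)
}

# Placeholder codes, used only to show the substitution
urls <- build_ncit_url(c("C1000", "C2000"))
print(urls)
```

sprintf() is vectorised, so a character vector of codes yields one URL per code.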

Usage

get_ncit(
  conn,
  sleep_time = 5,
  expiration_days = 100,
  verbose = TRUE,
  render_sql = TRUE
)
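
The expiration_days argument is not documented further on this page. One plausible reading, offered only as an assumption, is that a code is due for scraping when it has never been scraped or when its last scrape is more than expiration_days old. The sketch below illustrates that reading in base R; the scrape_log data frame, its column names, and codes_due() are all hypothetical and do not come from skyscraper.

```r
# Hypothetical log of earlier scrapes; column names are illustrative only
scrape_log <- data.frame(
  ncit_code = c("C0001", "C0002", "C0003"),
  last_scraped = as.Date(c(NA, "2020-01-01", "2020-12-01")),
  stringsAsFactors = FALSE
)

# Assumed rule: due if never scraped (NA) or last scraped more than
# `expiration_days` days before `today`
codes_due <- function(log, expiration_days = 100, today = Sys.Date()) {
  age_days <- as.numeric(today - log$last_scraped)
  due <- is.na(log$last_scraped) | age_days > expiration_days
  log$ncit_code[due]
}

# A fixed reference date keeps the example reproducible
due <- codes_due(scrape_log, expiration_days = 100, today = as.Date("2020-12-27"))
print(due)
```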

Arguments

conn

Postgres connection object.

sleep_time

Time in seconds for the system to sleep before each scrape with read_html.

verbose

When reading from a slow connection, this prints some output on every iteration so you know it's working.
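
To illustrate how sleep_time and verbose might interact in a scraping loop, here is a minimal sketch rather than the package's actual implementation. scrape_politely() and its fetch argument are hypothetical; fetch defaults to xml2::read_html, and the demonstration swaps in a stub so that no network request is made.

```r
# Hypothetical loop: pause before each request and optionally report progress
scrape_politely <- function(urls,
                            sleep_time = 5,
                            verbose = TRUE,
                            fetch = xml2::read_html) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    Sys.sleep(sleep_time)  # wait before each scrape
    if (verbose) {
      message(sprintf("[%d/%d] %s", i, length(urls), urls[i]))
    }
    results[[i]] <- fetch(urls[i])
  }
  results
}

# Offline demonstration: toupper() stands in for a real page reader
pages <- scrape_politely(
  c("url-a", "url-b"),
  sleep_time = 0,
  verbose = FALSE,
  fetch = toupper
)
```

Sleeping before each request, as the sleep_time description states, spaces out calls to the server; the per-iteration message is what makes a long run on a slow connection observable.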

Drug Detail Links

The links to Drug Pages are scraped from every page of the Data Dictionary URL, up to the maximum page number, and are saved to a Drug Link Table in the cancergov schema. Each URL in the Drug Link Table is then scraped for any HTML tables of synonyms, and the results are written to a Drug Link Synonym Table. The links to active clinical trials and NCIt mappings are also derived and stored in their respective tables.
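
The table-extraction step can be sketched with the rvest functions html_nodes and html_table, which are listed under See Also. The HTML below is made up for illustration and does not reflect the markup of any real drug page; the sketch covers only the parsing of a synonym table, not the database writes.

```r
library(rvest)

# Made-up HTML standing in for a drug page; real pages will differ
page <- read_html('
<html><body>
<table>
  <tr><th>Type</th><th>Name</th></tr>
  <tr><td>Synonym</td><td>Example Drug A</td></tr>
  <tr><td>Code name</td><td>EX-001</td></tr>
</table>
</body></html>')

# Select every <table> node and parse each one into a data frame
tables <- html_table(html_nodes(page, "table"))
synonyms <- as.data.frame(tables[[1]])
print(synonyms)
```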

See Also

query, appendTable; render; typewrite_progress, typewrite; html_nodes, html_table; keep; bind, mutate, select_all; format_colnames


meerapatelmd/skyscraper documentation built on Dec. 27, 2020, 7:46 a.m.