knitr::opts_chunk$set( eval = FALSE)
Remember to respect the registers' terms and conditions (see
ctrOpenSearchPagesInBrowser(copyright = TRUE)). Please cite this package in any publication as follows: Ralf Herold (2020). ctrdata: Retrieve and Analyze Clinical Trials in Public Registers. R package version 1.4, https://cran.r-project.org/package=ctrdata
These functions open the browser, where the user can start searching for trials of interest.
# Please review and respect register copyrights: ctrOpenSearchPagesInBrowser( copyright = TRUE ) # Open browser with example search: ctrOpenSearchPagesInBrowser( url = "cancer&age=under-18", register = "EUCTR" )
Refine the search until the trials of interest are listed in the browser. The total number of trials that can be retrieved with package
ctrdata is intentionally limited to queries with at most 10000 result records.
Use functions or keyboard shortcuts according to the operating system.
The next steps are executed in the R environment:
q <- "https://www.clinicaltrialsregister.eu/ctr-search/search?query=cancer&age=under-18&status=completed&phase=phase-one"
q <- ctrGetQueryUrl() # Found search query from EUCTR. q # query-term query-register # 1 query=cancer&age=under-18&status=completed&phase=phase-one EUCTR # To check, a browser with this query # is opened with this command ctrOpenSearchPagesInBrowser( url = q )
# Connect to a database and chose a table / collection db <- nodbi::src_sqlite( dbname = "sqlite_file.sql", collection = "test" ) # Count number of trial records ctrLoadQueryIntoDb( queryterm = q, only.count = TRUE, con = db )$n #  63 # Retrieve records, download into database ctrLoadQueryIntoDb( queryterm = q, con = db ) # * Found search query from EUCTR: query=cancer&age=under-18&status=completed&phase=phase-one # (1/3) Checking trials in EUCTR: # Retrieved overview, multiple records of 64 trial(s) from 4 page(s) to be downloaded. # Checking helper binaries: done. # Downloading trials (4 pages in parallel)... # Note: register server cannot compress data, transfer takes longer, about 0.4s per trial # Pages: 4 done, 0 ongoing # (2/3) Converting to JSON, 241 records converted # (3/3) Importing JSON records into database... # = Imported or updated 241 records on 64 trial(s). # * Updated history in meta-info of "test" # Show which queries have been downloaded into database dbQueryHistory(con = db) # query-timestamp query-register query-records # 1 2021-05-08 10:56:50 EUCTR 241 # query-term # 1 query=cancer&age=under-18&status=completed&phase=phase-one
With file-base SQLite, it takes about 5 minutes for 1000 records.
Speed is higher when using MongoDB (or memory-based SQLite).
ctrLoadQueryIntoDb( querytoupdate = "last", con = db )
Instead of "last", an integer number can be specified for
querytoupdate that corresponds to the number when using
Depending on the register, an update (differential update) is possible or the original query is executed fully again.
For EUCTR, result-related trial information has to be requested to be retrieved, because it will take longer to download and store. For CTGOV, any results are always included in the retrieval.
ctrLoadQueryIntoDb( queryterm = q, euctrresults = TRUE, con = db ) # [...] # = Imported or updated 241 records on 64 trial(s). # * Retrieving results if available from EUCTR for 64 trials: # (1/4) Downloading results (max. 10 trials in parallel): # 64 downloaded, extracting x x x x x x x . x x PDF . . . x x . . x x x . PDF . . . . . . PDF . PDF . x PDF . . . x . x . x . . . . x . PDF . . . . . . . . . . . . x x x . . . . . # (2/3) Converting to JSON, 42 records converted # (3/4) Importing JSON into database... # (4/4) Results history: not retrieved (euctrresultshistory = FALSE). # = Imported or updated results for 42 trials. # * Updated history in meta-info of "test"
The download or presence of results is not recorded in
dbQueryHistory() because the availability of results increases over time.
The same database and table / collection can be used to store (and analyse) trial information from different registers. At the moment,
ctrdata only supports the two registers https://ClinicalTrials.Gov/ and https://ClinicalTrialsRegister.EU/. Example:
ctrLoadQueryIntoDb( queryterm = "https://clinicaltrials.gov/ct2/results?cond=neuroblastoma&recrs=e&age=0&intr=Drug", con = db ) # * Found search query from CTGOV: cond=neuroblastoma&recrs=e&age=0&intr=Drug # (1/3) Checking trials in CTGOV: # Retrieved overview, records of 193 trial(s) are to be downloaded. # Checking helper binaries: done. # Downloading: 1.2 MB # (2/3) Converting to JSON, 193 records converted # (3/3) Importing JSON records into database... # = Imported or updated 193 trial(s). # * Updated history in meta-info of "test"
When downloading trial information, the user can specify an annotation to all records that are downloaded. By default, annotations are accumulated if trial records are loaded again or updated; alternatively, annotations can be replaced.
Annotations are useful for analyses, for example to specially identify subsets of records in the database.
ctrLoadQueryIntoDb( queryterm = "https://clinicaltrials.gov/ct2/results?cond=neuroblastoma&recrs=e&age=0&intr=Drug&cntry=DE", annotation.text = "site_DE ", annotation.mode = "append", con = db ) # [...] # = Annotated retrieved records # = Imported or updated 11 trial(s). # * Updated history in meta-info of "test"
Not all registers automatically expand search terms to include alternative terms, such as codes and other names of active substances. To obtain a character vector of synonyms for any active substance name, use:
ctrFindActiveSubstanceSynonyms( activesubstance = "imatinib" ) #  "imatinib" "gleevec" "sti 571" "glivec" "CGP 57148" "st1571"
These names can then be used in queries in any register.
# cleanup unlink("sqlite_file.sql")
This example works with a free service here. Note that the user name and password need to be encoded. The format of the connection string is documented at https://docs.mongodb.com/manual/reference/connection-string/.
# Specify base uri for remote mongodb server, # as part of the encoded connection string db <- nodbi::src_mongo( # Note: this provides a read-only access url = "mongodb+srv://DWbJ7Wh:bdTHh5cS@cluster0-b9wpw.mongodb.net", db = "dbperm", collection = "dbperm") # Since the above access is read-only, # just obtain fields of interest: dbGetFieldsIntoDf( fields = c("a2_eudract_number", "e71_human_pharmacology_phase_i"), con = db) # _id a2_eudract_number e71_human_pharmacology_phase_i # 1 2010-024264-18-3RD 2010-024264-18 TRUE # 2 2010-024264-18-AT 2010-024264-18 TRUE # 3 2010-024264-18-DE 2010-024264-18 TRUE # 4 2010-024264-18-GB 2010-024264-18 TRUE # 5 2010-024264-18-IT 2010-024264-18 TRUE # 6 2010-024264-18-NL 2010-024264-18 TRUE
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.