README.md

ctregistries

In Development

The goal of ctregistries is to facilitate the detection and analysis of clinical trial registration numbers. ctregistries is primarily a data package of regular expressions (regexes), and it also provides some R functions for applying them.

Regular expressions were developed for trial registration numbers (TRNs) from the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) Primary Registries (https://www.who.int/ictrp/network/primary/en/) and the MEDLINE Databank Sources (https://www.nlm.nih.gov/bsd/medline_databank_source.html).

Additional, non-trial databanks indexed by MEDLINE (e.g., figshare) are also included, without regexes.

Installation

You can install the development version of ctregistries from GitHub:

# install.packages("devtools")
devtools::install_github("maia-sh/ctregistries")

Dataset

ctregistries provides the registries dataframe, which holds a regular expression for each registry. registries is a subset of the larger databanks dataframe, which additionally includes non-trial databanks indexed by MEDLINE (e.g., figshare) without regexes. registries is created by filtering databanks for databank_type == "registry".

library(ctregistries)

# Nested call avoids relying on the %>% pipe being attached
knitr::kable(head(registries))

| registry | databank_full_name | medline_start_date | databank_type | trn_regex | medline_si | who_ictrp_primary_registry | registry_website |
| :----------------- | :------------------------------------------------------- | -------------------: | :------------- | :----------------- | :---------- | :---------------------------- | :----------------------------------------------------------- |
| ANZCTR | Australian New Zealand Clinical Trials Registry | 2014 | registry | (?i)(ACTRN\|ANZCTR) | TRUE | TRUE | https://www.anzctr.org.au/ |
| ChiCTR | Chinese Clinical Trials Registry | 2014 | registry | (?i)ChiCTR(\|) | TRUE | TRUE | http://www.chictr.org.cn/ |
| CRiS | Clinical Research Information Service, Republic of Korea | 2014 | registry | (?i)KCT | TRUE | TRUE | http://cris.nih.go.kr/cris/en/use_guide/cris_introduce.jsp |
| ClinicalTrials.gov | ClinicalTrials.gov Database (NIH/NLM) | 2005 | registry | (?i)NCT | TRUE | FALSE | https://clinicaltrials.gov/ |
| CTRI | Clinical Trials Registry - India | 2014 | registry | (?i)CTRI/// | TRUE | TRUE | http://ctri.nic.in/ |
| DRKS | German Clinical Trials Register | 2014 | registry | (?i)DRKS | TRUE | TRUE | http://www.germanctr.de/ |


databanks$databank
#>  [1] "ANZCTR"             "ChiCTR"             "CRiS"              
#>  [4] "ClinicalTrials.gov" "CTRI"               "DRKS"              
#>  [7] "EudraCT"            "IRCT"               "ISRCTN"            
#> [10] "JapicCTI"           "JMACCT"             "JPRN"              
#> [13] "jRCT"               "LBCTR"              "NTR"               
#> [16] "PACTR"              "ReBec"              "REPEC"             
#> [19] "RPCEC"              "SLCTR"              "TCTR"              
#> [22] "UMIN-CTR"           "BioProject"         "dbGaP"             
#> [25] "dbSNP"              "dbVar"              "Dryad"             
#> [28] "figshare"           "GDB"                "GENBANK"           
#> [31] "GEO"                "OMIM"               "PDB"               
#> [34] "PIR"                "PubChem-BioAssay"   "PubChem-Compound"  
#> [37] "PubChem-Substance"  "RefSeq"             "SRA"               
#> [40] "SWISSPROT"          "UniMES"             "UniParc"           
#> [43] "UniProtKB"          "UniRef"
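
The relationship between databanks and registries described in this section can be sketched in base R. This is a minimal illustration on a toy data frame, not the package's own build script: `toy_databanks`, `toy_registries`, and the `"other"` value for non-registry rows are hypothetical placeholders, and only the `databank` and `databank_type` column names are taken from the examples above.

```r
# Toy stand-in for the package's `databanks` data frame.
# The rows and the "other" type label are illustrative assumptions.
toy_databanks <- data.frame(
  databank = c("ClinicalTrials.gov", "DRKS", "figshare", "Dryad"),
  databank_type = c("registry", "registry", "other", "other"),
  stringsAsFactors = FALSE
)

# Keep only the rows flagged as trial registries
toy_registries <- toy_databanks[toy_databanks$databank_type == "registry", ]

toy_registries$databank
```

With the package loaded, the same subsetting expression applied to `databanks` would presumably give the rows of `registries`, assuming `databank_type` has no missing values.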

Functions

ctregistries provides functions that apply the registries dataset to detect trial registration numbers and their registries in both vectors and dataframes.

library(ctregistries)

# Check whether there is a TRN
has_trn(c("NCT00312962", "hello", "euctr2020-001808-42", NA))
#> [1]  TRUE FALSE  TRUE    NA

# Extract the TRNs
which_trn("NCT00312962 and euctr2020-001808-42")
#> [1] "NCT00312962"    "2020-001808-42"
which_trns(c("NCT00312962", "hello", "euctr2020-001808-42", NA))
#> [1] "NCT00312962"    NA               "2020-001808-42" NA

# Identify the registry
which_registry("NCT00312962 and euctr2020-001808-42")
#> [1] "ClinicalTrials.gov" "EudraCT"
which_registries(c("NCT00312962", "hello", "euctr2020-001808-42", NA))
#> [1] "ClinicalTrials.gov" NA                   "EudraCT"           
#> [4] NA

# Add the TRN and registry to a dataframe
mutate_trn_registry(sample_trn_df, text)
#> # A tibble: 7 x 5
#>      id text                               registry_guess   registry    trn     
#>   <dbl> <chr>                              <chr>            <chr>       <chr>   
#> 1     1 NCT00312962                        clinicaltrials.… ClinicalTr… NCT0031…
#> 2     2 hello                              <NA>             <NA>        <NA>    
#> 3     3 <NA>                               ChiCTR           <NA>        <NA>    
#> 4     4 euctr2020-001808-42                EudraCT          EudraCT     2020-00…
#> 5     5 German Clinical Trial Registry Id… DRKS             DRKS        DRKS000…
#> 6     6 ClinicalTrials.gov number, NCT002… ISRCTN           ClinicalTr… NCT0026…
#> 7     6 ClinicalTrials.gov number, NCT002… ISRCTN           ISRCTN      ISRCTN7…
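
To give a rough idea of what regex-based detection and extraction involve, here is a base R sketch that does not use ctregistries at all. The pattern `nct_regex` is an assumption for illustration ("NCT" followed by eight digits, matched case-insensitively) and is not necessarily the regex stored in `registries`; the variable names are likewise hypothetical.

```r
# Assumed pattern: "NCT" followed by eight digits, case-insensitive.
# Illustrative only; the package's own regex may differ.
nct_regex <- "(?i)NCT\\d{8}"

x <- c("NCT00312962", "hello", "see nct00312962 for details", NA)

# Detection: grepl() returns FALSE for NA input, so restore NA explicitly
detected <- grepl(nct_regex, x, perl = TRUE)
detected[is.na(x)] <- NA
detected
#> [1]  TRUE FALSE  TRUE    NA

# Extraction: first match per element, NA where nothing matched
extracted <- rep(NA_character_, length(x))
hits <- which(detected)
extracted[hits] <- regmatches(x[hits], regexpr(nct_regex, x[hits], perl = TRUE))
extracted
```

`perl = TRUE` is used so that the inline `(?i)` flag and `\d` are interpreted by PCRE. Unlike the package output shown above, this sketch returns the match exactly as it appears in the text, without any cleaning or normalization.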

To Do



maia-sh/ctregistries documentation built on July 20, 2024, 8:08 p.m.