Clinicaltrials.gov ClinicalTrials.gov is a registry and results database of publicly and privately supported clinical studies of human participants conducted around the world. Users can search for information about and results from those trials. This package provides a set of functions to interact with the search and download features. Results are downloaded to temporary directories and returned as R objects.
The package is available on CRAN and can be installed as usual. To install the latest version from github, use
devtools::install_github()
, as follows:
install.packages("devtools") library(devtools) install_github("sachsmc/rclinicaltrials")
The main function is clinicaltrials_search()
. Here's an example of its use:
library(rclinicaltrials) library(ggplot2) library(dplyr) z <- clinicaltrials_search(query = 'lime+disease') str(z)
This gives you basic information about the trials. Before searching or downloading, you can determine how many results will be returned using the clinicaltrials_count()
function:
clinicaltrials_count(query = "myeloma") clinicaltrials_count(query = "29485tksrw@")
The query can be a single string which will be passed to the "search terms" field on clinicaltrials.gov. Terms can be combined using the logical operators AND, OR, and NOT. Advanced searches can be performed by passing a vector of key=value pairs as strings. For example, to search for cancer interventional studies,
clinicaltrials_count(query = c("type=Intr", "cond=cancer"))
The possible advance search terms are included in the advanced_search_terms
data frame which comes with the package. The data frame has the keys, description, and a link to the help webpage which will explain the possible values of the search terms. To open the help page for cond
, for instance, run browseURL(advanced_search_terms["cond", "help"])
.
head(advanced_search_terms)
To download detailed study information, including results, use clinicaltrials_download()
:
y <- clinicaltrials_download(query = 'myeloma', count = 10, include_results = TRUE) str(y)
This returns a list of dataframes that have a common key variable: nct_id
. Optionally, you can get the long text fields and/or study results (if available). Study results are also returned as a list of dataframes, contained within the list.
The data come from a relational database with lots of text fields, so it may take some effort to get the data into a flat format for analysis. For that reason, results come back from the clinicaltrials_download
function as a list of dataframes. Each dataframe has a common key variable: nct_id
. To merge dataframes, use this key. Otherwise, you can analyze the dataframes separately. They are organized into study information, locations, outcomes, interventions, results, and textblocks. Results, where available, is itself a list with three dataframes: participant flow, baseline data, and outcome data.
Results tables are stored in long format, so there are often multiple rows per study, each corresponding to a different group or outcome. Let's look at an example, the cumulative enrollment of men and women in phase III, melanoma, interventional studies over time. We can also pass the query as a list of named items.
melanom <- clinicaltrials_search(query = c("cond=melanoma", "phase=2", "type=Intr", "rslt=With"), count = 1e6) nrow(melanom) table(melanom$status.text) melanom2 <- clinicaltrials_search(query = list(cond = "melanoma", phase = "2", type = "Intr", rslt = "With"), count = 1e6) nrow(melanom)
Now to download the data and summarize it:
melanom_information <- clinicaltrials_download(query = c("cond=melanoma", "phase=2", "type=Intr", "rslt=With"), count = 1e6, include_results = TRUE)
load("melanom_info.RData")
summary(melanom_information$study_results$baseline_data) gend_data <- subset(melanom_information$study_results$baseline_data, title == "Gender" & arm != "Total") gender_counts <- gend_data %>% group_by(nct_id, subtitle) %>% do( data.frame( count = sum(as.numeric(paste(.$value)), na.rm = TRUE) )) dates <- melanom_information$study_information$study_info[, c("nct_id", "start_date")] dates$year <- sapply(strsplit(paste(dates$start_date), " "), function(d) as.numeric(d[2])) counts <- merge(gender_counts, dates, by = "nct_id") cts <- counts %>% group_by(year, subtitle) %>% summarize(count = sum(count)) colnames(cts)[2] <- "Gender" ggplot(cts, aes(x = year, y = cumsum(count), color = Gender)) + geom_line() + geom_point() + labs(title = "Cumulative enrollment into Phase III, \n interventional trials in Melanoma, by gender") + scale_y_continuous("Cumulative Enrollment")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.