README.md
In cukarthik/CancerTreatmentCharacterization: What the package does (short line)

Large-Scale Data Analysis to Characterize Variations in Cancer Treatments Across the United States.

Study Status: Started

Analytics use case(s): Characterization
Study type: Clinical Application
Tags: oncology
Study lead: Thomas Falconer and Karthik Natarajan
Study lead forums tag: thomasfalconer
Study start date: October 21, 2021
Study end date: -
Protocol: -
Publications: -
Results explorer: -

Requirements

A database in Common Data Model version 5 on one of these platforms: SQL Server, Oracle, PostgreSQL, IBM Netezza, Apache Impala, Amazon RedShift, Google BigQuery, or Microsoft APS.
R version 4.0 or newer
On Windows: RTools
Java
25 GB of free disk space
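You can check some of the requirements above from R before you start. The sketch below is a hypothetical helper (`checkPrerequisites` is not part of the package). It only tests what base R can detect, so you still need to check free disk space and the CDM database yourself. It also assumes, without certainty, that RTools is usable on Windows whenever a `make` executable is on the `PATH`.

```r
# Hypothetical helper (not part of CancerTreatmentCharacterization):
# checks the requirements that base R can detect on its own.
checkPrerequisites <- function() {
  isWindows <- .Platform$OS.type == "windows"
  c(
    # R version 4.0 or newer
    rVersion = getRversion() >= "4.0.0",
    # A `java` executable is on the PATH (rJava may still need configuring)
    java = unname(nzchar(Sys.which("java"))),
    # Assumption: on Windows, RTools is usable if `make` is on the PATH
    rtools = if (isWindows) unname(nzchar(Sys.which("make"))) else TRUE
  )
}

checkPrerequisites()
```

Any `FALSE` entry points at a requirement that still needs attention.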

See these instructions on how to set up the R environment on Windows.

If you have access to a claims data set, please also run this study on it, as described in the "Run Study on Claims Data" section below.

Run Study

In R, use the following code to install the dependencies:

```r
# Install the OHDSI packages from GitHub
install.packages("devtools")
library(devtools)
install_github("ohdsi/SqlRender")
install_github("ohdsi/DatabaseConnector")
install_github("ohdsi/OhdsiSharing")
install_github("ohdsi/FeatureExtraction")
install_github("ohdsi/CohortMethod")

# Install the CRAN packages
install.packages("ggplot2")
install.packages("ggrepel")
install.packages("dplyr")
install.packages("readr")
install.packages("sqldf")
install.packages("tidyr")
install.packages("rmarkdown")
install.packages("forcats")

# Load the packages
library("SqlRender")
library("DatabaseConnector")
library("OhdsiSharing")
library("FeatureExtraction")
library("CohortMethod")
library("ggplot2")
library("ggrepel")
library("dplyr")
library("readr")
library("sqldf")
library("tidyr")
```
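If you rerun the setup later, you may prefer to install only the CRAN packages that are still missing. Below is a minimal base-R sketch. `missingPackages` and `installMissing` are hypothetical helpers, and the GitHub packages above are still installed with `install_github`.

```r
# CRAN dependencies listed above
cranPackages <- c("ggplot2", "ggrepel", "dplyr", "readr",
                  "sqldf", "tidyr", "rmarkdown", "forcats")

# Hypothetical helper: return the packages in `pkgs` that are not installed
missingPackages <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# Hypothetical helper: install only what is missing, return what was attempted
installMissing <- function(pkgs) {
  toInstall <- missingPackages(pkgs)
  if (length(toInstall) > 0) {
    install.packages(toInstall)
  }
  invisible(toInstall)
}

# Show which CRAN dependencies are not installed yet
missingPackages(cranPackages)
```

Calling `installMissing(cranPackages)` then installs whatever was reported as missing.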

If you experience problems on Windows where rJava can't find Java, one solution may be to add `"--no-multiarch"` to each `install_github` call. For example, either of these calls skips the i386 (32-bit) architecture:

```r
install_github("ohdsi/SqlRender", args = "--no-multiarch")
install_github("ohdsi/SqlRender", INSTALL_opts=c("--no-multiarch"))
```

Or, to apply the option to all installs, you can try:

```r
options(devtools.install.args = "--no-multiarch")
```

Alternatively, ensure that you have installed both 32-bit and 64-bit JDK versions, as mentioned in the [video tutorial](https://youtu.be/K9_0s2Rchbo).
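After installing, you can use the sketch below to confirm that rJava starts a Java virtual machine and to see which Java version and R architecture are in use. It calls rJava's `.jinit()` and `.jcall()`. The `javaInfo` wrapper is hypothetical and returns `NA` for the Java version if rJava is missing or cannot start Java.

```r
# Hypothetical helper: report the R architecture and the Java version seen by rJava
javaInfo <- function() {
  info <- c(rArch = R.version$arch, javaVersion = NA_character_)
  if (!requireNamespace("rJava", quietly = TRUE)) {
    return(info)  # rJava is not installed or failed to load
  }
  started <- tryCatch({
    rJava::.jinit()
    TRUE
  }, error = function(e) FALSE)
  if (started) {
    # Ask the JVM for its java.version system property
    info[["javaVersion"]] <- rJava::.jcall("java/lang/System", "S",
                                           "getProperty", "java.version")
  }
  info
}

javaInfo()
```

An `rArch` of `i386` means you are running 32-bit R, which needs a 32-bit JDK.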

In R, use the following devtools command to install the CancerTreatmentCharacterization package:

```r
# Install the network study package
devtools::install_github("https://github.com/cukarthik/nci-characterization")
```

Alternatively, you can download the repository and build it locally in RStudio (menu bar: "Build" -> "Install and Restart").

Once installed, you can execute the study by modifying and using the code below. For your convenience, this code is also provided under extras/CodeToRun.R:

```r
library(CancerTreatmentCharacterization)

path <- 's:/CancerTreatmentCharacterization'

# Optional: specify where the temporary files will be created:
options(andromedaTempFolder = file.path(path, "andromedaTemp"))

# Maximum number of cores to be used:
maxCores <- parallel::detectCores()

# Minimum cell count when exporting data:
minCellCount <- 10

# The folder where the study intermediate and result files will be written:
outputFolder <- "c:/CancerTreatmentCharacterization"

# Details for connecting to the server:
# See ?DatabaseConnector::createConnectionDetails for help
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "postgresql",
                                                                server = "some.server.com/ohdsi",
                                                                user = "",
                                                                password = "")

# The name of the database schema where the CDM data can be found:
cdmDatabaseSchema <- "cdm_synpuf"
vocabularyDatabaseSchema <- "cdm_synpuf"  # Schema where your CDM vocabulary is located

# The name of the database schema and table where the study-specific cohorts will be instantiated:
cohortDatabaseSchema <- "scratch.dbo"   # You must have rights to create tables in this schema
resultsDatabaseSchema <- "scratch.dbo"  # You must have rights to create tables in this schema
cohortTable <- "cancer_cohorts"         # Table where the person_id values for the cohorts are stored

# Some meta-information that will be used by the export function:
databaseId <- ""           # SiteName
databaseName <- ""         # SiteName_DatabaseName
databaseDescription <- ""  # Description of the site's database

# For Oracle: define a schema that can be used to emulate temp tables:
oracleTempSchema <- NULL

execute(connectionDetails,
        cdmDatabaseSchema,
        cohortDatabaseSchema = cohortDatabaseSchema,
        cohortTable = cohortTable,
        oracleTempSchema = oracleTempSchema,
        outputFolder,
        databaseId = databaseId,
        databaseName = databaseName,
        databaseDescription = databaseDescription,
        reloadData = TRUE,                # Reloads the csv data files into the resultsDatabaseSchema.
                                          # Note: set this to TRUE the first time you run the package.
        createCohorts = TRUE,             # Creates the cohorts. Can be set to FALSE once the cohorts exist.
        runAnalyses = TRUE,               # Runs the analysis. NOTE: the flags below enable or disable parts of it.
        buildDataSet = TRUE,              # Builds the data sets used for the analysis
        runOhdsiCharacterization = TRUE,  # Runs the OHDSI characterization package on the cohorts to get a Table 1
        runTreatmentAnalysis = TRUE,      # The main analysis, which characterizes treatment variation
        runDiagnostics = FALSE,           # Runs OHDSI's CohortDiagnostics on the cohorts created
        runADIAnalysis = FALSE,           # Runs the ADI analysis. NOTE: only set this to TRUE if your database has geocoded data
        packageResults = FALSE,
        renderMarkdown = TRUE,            # Runs the treatment analysis within an R Markdown script for each cancer and outputs
                                          # the HTML version of the executed file. If FALSE, a regular R script is run instead.
        maxCores = maxCores,
        minCellCount = minCellCount)
```
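Before launching the full `execute()` run, you may want to confirm that `connectionDetails` and `cdmDatabaseSchema` are correct. The sketch below uses DatabaseConnector's `connect()`, `renderTranslateQuerySql()`, and `disconnect()`. The `checkCdmAccess` wrapper and the choice of counting rows in the CDM `person` table are illustrative assumptions, not part of the study package.

```r
# Hypothetical helper: connect, count rows in the CDM person table, disconnect
checkCdmAccess <- function(connectionDetails, cdmDatabaseSchema) {
  connection <- DatabaseConnector::connect(connectionDetails)
  # Always close the connection, even if the query fails
  on.exit(DatabaseConnector::disconnect(connection), add = TRUE)
  # @cdm_database_schema is a SqlRender parameter; the SQL is translated
  # to the dialect of connectionDetails$dbms before it runs
  result <- DatabaseConnector::renderTranslateQuerySql(
    connection,
    sql = "SELECT COUNT(*) AS person_count FROM @cdm_database_schema.person",
    cdm_database_schema = cdmDatabaseSchema
  )
  result[[1]][1]
}
```

With the settings above, `checkCdmAccess(connectionDetails, cdmDatabaseSchema)` should return a positive count if both the connection and the schema name are right.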
To view the results, open the specified output folder. It contains one folder per cancer (breast, prostate, lung, and multiple myeloma), each with a data folder and a plots folder. The data folder contains the aggregate counts used to generate the plots.
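The `minCellCount` setting in the script above sets the minimum cell count when exporting data. In OHDSI studies this kind of threshold is typically used to suppress small counts so that tiny groups of patients cannot be identified. The sketch below only illustrates that general idea and is not this package's code. Whether the package masks values below or at the threshold, and whether it uses `NA` or another marker, are assumptions.

```r
# Illustration only (not the package's implementation): mask any count
# below minCellCount so that small groups cannot be identified.
suppressSmallCells <- function(counts, minCellCount = 10) {
  small <- !is.na(counts) & counts < minCellCount
  counts[small] <- NA
  counts
}

suppressSmallCells(c(3, 10, 250, 9), minCellCount = 10)
# returns NA 10 250 NA
```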
Please contact both Karthik Natarajan (kn2174 at cumc dot columbia dot edu) and Thomas Falconer (tf2428 at cumc dot columbia dot edu) after executing the study, or if any issues arise. Currently, there is no automated method to submit the results; the plot folders need to be zipped manually. We will set up a meeting to review the results.
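Because the plot folders currently have to be zipped by hand, a small helper can collect them into one archive. The sketch below uses base R's `list.dirs()` and `utils::zip()`. It assumes the layout described above, with a `plots` subfolder inside each cancer folder. `findPlotFolders`, `zipPlotFolders`, and the archive name are hypothetical. Note that `utils::zip()` needs an external zip program, which RTools provides on Windows.

```r
# Hypothetical helper: find each <cancer>/plots folder under outputFolder
findPlotFolders <- function(outputFolder) {
  # Paths relative to outputFolder, e.g. "breast/plots"
  dirs <- list.dirs(outputFolder, full.names = FALSE, recursive = TRUE)
  sort(dirs[basename(dirs) == "plots"])
}

# Hypothetical helper: zip all plots folders into a single archive
zipPlotFolders <- function(outputFolder, zipName = "plots.zip") {
  plotFolders <- findPlotFolders(outputFolder)
  if (length(plotFolders) == 0) {
    stop("No 'plots' folders found under ", outputFolder)
  }
  zipFile <- file.path(normalizePath(outputFolder), zipName)
  # Zip from inside outputFolder so the archive keeps relative paths
  oldWd <- setwd(outputFolder)
  on.exit(setwd(oldWd), add = TRUE)
  utils::zip(zipfile = zipFile, files = plotFolders)
  zipFile
}
```

`zipPlotFolders(outputFolder)` returns the path of the archive to send. Each cancer's plots keep their own folder inside it.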

Development

CancerTreatmentCharacterization is a custom study package that was developed in RStudio.

License

The CancerTreatmentCharacterization package is licensed under the Apache License 2.0.

cukarthik/CancerTreatmentCharacterization documentation built on Dec. 19, 2021, 7:03 p.m.
