knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In this section we will go through a typical analysis workflow using the castle
package:
.csv
-format. The first thing we need to do is load the packages we need for the analysis:
library(castle) library(dplyr)
We start by loading a QuantaSoft dataset in .csv
-format using the included import_QS_files
-function. Here we will use the example data included in the package, where we have both negative samples of different concentrations (training samples) and some test samples available for a KRAS G12D assay. For this assay Ch1 was used for measuring the mutant signal, which is set using the flag Ch1_is_mutation = TRUE
. If Ch2 was used this is simply set to FALSE
. Data is loaded like this:
path_to_my_data <- c("path/to/a/csv_file", "path/to/a/folder/with/csv_files/") my_samples <- import_QS_files(path_to_my_data)
For this example we will load the training samples that are bundled in the castle
-package:
path_to_my_data <- system.file("extdata/example_data/", "background_data_KRAS_G12D.csv", package = "castle") background_samples <- import_QS_files(path_to_my_data, Ch1_is_mutation = TRUE)
The first 5 samples/wells in background_samples
looks like this:
head(background_samples, 5)
Note that besides some metadata (file name, target etc.) for each sample, the dataset contains the columns "WildtypeOnlyDroplets", "MutantOnlyDroplets", "DoubleNegativeDroplets" and "DoublePositiveDroplets". These are the counts of the four different kinds of droplets for each sample based on the two signals in Ch1 and Ch2. The following model training and statistical tests are based solely on these counts.
To train the model we are only interested in the negative samples. Ideally these should cover the range of different concentrations of DNA that the trained model is expected to encounter in future test samples. Before training we start by removing the empty samples ("NTC") and positive controls from the training dataset we loaded above:
# Remove control samples ("NTC" and "Positive Control") clean_background_samples <- background_samples %>% filter( !(Ch1TargetType %in% c("NTC", "Positive Control")), !(Ch2TargetType %in% c("NTC", "Positive Control")) )
The cleaned data is then used for model training using the function train_simple_ddpcr_model
:
trained_model <- train_integrated_ddpcr_model( background_samples = clean_background_samples )
The object trained_model
is simply a list of parameters used in the statistical model in CASTLE.
For testing we will use a variety of samples from two different patients, who have undergone curative surgery for colorectal cancer. We start by loading the data from these patients. To load the data we will again use the function import_QS_files
. But this time, data from some of the samples is distributed across multiple wells. These can be merged together using the flag merge_wells = 'yes'
(see ?import_QS_files
for other methods for merging). Furthermore samples across multiple files can also be merged using the flag merge_files = TRUE
if necessary.
# Path to example data in the package path_to_my_test_samples <- system.file( "extdata/example_data/test_samples", c( "patient_1_plasma_KRAS_G12D.csv", "patient_2_plasma_KRAS_G12D.csv" ), package = "castle" ) # Import test examples test_samples <- import_QS_files( path_to_my_test_samples, merge_wells = "yes" )
To test the samples we use the function test_tumor_sample_integrated
and use the model we trained above:
# Make calls using CASTLE test_res <- test_tumor_sample_integrated( test_samples = test_samples, integrated_model = trained_model )
We will start by looking at some of the results for the tumor and PBL (white blood cells) sample from each patient:
test_res %>% dplyr::filter(Sample %in% c("Tumor", "PBL")) %>% dplyr::select(FileName, Sample, p_val, mutation_detected)
From this, we clearly see that the mutation is present in the tumor from both of the patients and not in PBL. We can therefore use the mutation as a cancer-specific marker, and thus to check if the patient has residual disease after curative intent surgery. As a positive test, we start by looking at the "Plasma_A" sample of both patients, which is a blood sample collected before surgery:
test_res %>% dplyr::filter(Sample %in% c("Plasma_A")) %>% dplyr::select(FileName, Sample, p_val, mutation_detected)
We see that these samples indicate that the mutation, and thus the tumor, is present in both patients, as expected. From the result we can also see the following estimates of properties of the ddPCR analysis and the ctDNA in the blood:
test_res %>% dplyr::filter(Sample %in% c("Plasma_A")) %>% dplyr::select( FileName, Sample, mutant_molecules_per_droplet, wildtype_molecules_per_droplet, total_mutant_molecules, mutant_molecules_per_droplet_CI_lower, mutant_molecules_per_droplet_CI_upper, wildtype_molecules_per_droplet_CI_lower, wildtype_molecules_per_droplet_CI_upper ) test_res %>% dplyr::filter(Sample %in% c("Plasma_A")) %>% dplyr::select( FileName, Sample, allele_frequency, allele_frequency_CI_lower, allele_frequency_CI_upper )
We can then look what happens after the surgery. To do this we look at the blood samples marked as "Plasma_B":
test_res %>% dplyr::filter(Sample %in% c("Plasma_B")) %>% dplyr::select(FileName, Sample, p_val, mutation_detected)
We now see that the test indicates that the mutation is no longer present in any of the patients, which is consistent with the tumor being removed by surgery.
The full list of information available in the analysis results is:
colnames(test_res)
The results can be exported to .csv
format via the function write_csv(x = test_res, file = "your/file/path)
.
Alternatively the functions train_integrated_ddpcr_model
and test_tumor_sample_simple
are used for training and testing respectively. These are based on a faster (and simpler) version of the CASTLE algorithm, but have very similar performance. See [1] for more information.
In the package the function simulate_droplet_counts
for simulating droplet counts from the statistical model that CASTLE is based on is also provided. Here we will simulate a positive and negative sample based on the parameters learned above, and test them using test_tumor_sample_integrated
:
set.seed(42) # DNA concentration (molecules per droplet) l <- 0.5 # WT r_positive <- 0.01 # M r_negative <- 0 # M # Number of droplets n_drops <- 14000 # Positive sample test_sample_positive <- simulate_droplet_counts( l = l, r = r_positive, a = trained_model$a_est, b = trained_model$b_est, c = trained_model$c_est, n_drops = n_drops ) # Negative sample test_sample_negative <- simulate_droplet_counts( l = l, r = r_negative, a = trained_model$a_est, b = trained_model$b_est, c = trained_model$c_est, n_drops = n_drops ) simulated_samples <- bind_rows( data.frame(Sample = "Positive", test_sample_positive), data.frame(Sample = "Negative", test_sample_negative) ) # Test samples sim_test_res <- test_tumor_sample_simple( test_samples = simulated_samples, simple_model = trained_model ) # Print some relevant results sim_test_res %>% select( Sample, p_val, mutation_detected, total_mutant_molecules, total_mutant_molecules_CI_lower, total_mutant_molecules_CI_upper )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.