In simondrue/castle: Fortify your ddPCR analyses for low-frequency variant detection

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Analysing data

In this section we will go through a typical analysis workflow using the castle package:

How to load a set of droplet counts from a QuantaSoft dataset. For this the data needs to be in .csv-format.
Training a model for the CASTLE algorithm. This is done by analyzing the (assay) specific pattern of errors in a set of negative (background) samples.
Analyze some samples of interest for the presence of a mutation, by utilizing the CASTLE algorithm with the model trained above.

The first thing we need to do is load the packages we need for the analysis:

library(castle)
library(dplyr)

Loading QuantaSoft data

We start by loading a QuantaSoft dataset in .csv-format using the included import_QS_files-function. Here we will use the example data included in the package, where we have both negative samples of different concentrations (training samples) and some test samples available for a KRAS G12D assay. For this assay Ch1 was used for measuring the mutant signal, which is set using the flag Ch1_is_mutation = TRUE. If Ch2 was used this is simply set to FALSE. Data is loaded like this:

path_to_my_data <- c("path/to/a/csv_file", "path/to/a/folder/with/csv_files/")
my_samples <- import_QS_files(path_to_my_data)

For this example we will load the training samples that are bundled in the castle-package:

path_to_my_data <- system.file("extdata/example_data/", "background_data_KRAS_G12D.csv", package = "castle")
background_samples <- import_QS_files(path_to_my_data, Ch1_is_mutation = TRUE)

The first 5 samples/wells in background_samples looks like this:

head(background_samples, 5)

Note that besides some metadata (file name, target etc.) for each sample, the dataset contains the columns "WildtypeOnlyDroplets", "MutantOnlyDroplets", "DoubleNegativeDroplets" and "DoublePositiveDroplets". These are the counts of the four different kinds of droplets for each sample based on the two signals in Ch1 and Ch2. The following model training and statistical tests are based solely on these counts.

Training a model for CASTLE

To train the model we are only interested in the negative samples. Ideally these should cover the range of different concentrations of DNA that the trained model is expected to encounter in future test samples. Before training we start by removing the empty samples ("NTC") and positive controls from the training dataset we loaded above:

# Remove control samples ("NTC" and "Positive Control")
clean_background_samples <-
  background_samples %>%
  filter(
    !(Ch1TargetType %in% c("NTC", "Positive Control")),
    !(Ch2TargetType %in% c("NTC", "Positive Control"))
  )

The cleaned data is then used for model training using the function train_simple_ddpcr_model:

trained_model <- train_integrated_ddpcr_model(
  background_samples = clean_background_samples
)

The object trained_model is simply a list of parameters used in the statistical model in CASTLE.

Testing some samples with CASTLE

For testing we will use a variety of samples from two different patients, who have undergone curative surgery for colorectal cancer. We start by loading the data from these patients. To load the data we will again use the function import_QS_files. But this time, data from some of the samples is distributed across multiple wells. These can be merged together using the flag merge_wells = 'yes' (see ?import_QS_files for other methods for merging). Furthermore samples across multiple files can also be merged using the flag merge_files = TRUE if necessary.

# Path to example data in the package
path_to_my_test_samples <- system.file(
  "extdata/example_data/test_samples",
  c(
    "patient_1_plasma_KRAS_G12D.csv",
    "patient_2_plasma_KRAS_G12D.csv"
  ),
  package = "castle"
)

# Import test examples
test_samples <-
  import_QS_files(
    path_to_my_test_samples,
    merge_wells = "yes"
  )

To test the samples we use the function test_tumor_sample_integrated and use the model we trained above:

# Make calls using CASTLE
test_res <- test_tumor_sample_integrated(
  test_samples = test_samples,
  integrated_model = trained_model
)

We will start by looking at some of the results for the tumor and PBL (white blood cells) sample from each patient:

test_res %>%
  dplyr::filter(Sample %in% c("Tumor", "PBL")) %>%
  dplyr::select(FileName, Sample, p_val, mutation_detected)

From this, we clearly see that the mutation is present in the tumor from both of the patients and not in PBL. We can therefore use the mutation as a cancer-specific marker, and thus to check if the patient has residual disease after curative intent surgery. As a positive test, we start by looking at the "Plasma_A" sample of both patients, which is a blood sample collected before surgery:

test_res %>%
  dplyr::filter(Sample %in% c("Plasma_A")) %>%
  dplyr::select(FileName, Sample, p_val, mutation_detected)

We see that these samples indicate that the mutation, and thus the tumor, is present in both patients, as expected. From the result we can also see the following estimates of properties of the ddPCR analysis and the ctDNA in the blood:

test_res %>%
  dplyr::filter(Sample %in% c("Plasma_A")) %>%
  dplyr::select(
    FileName, Sample,
    mutant_molecules_per_droplet,
    wildtype_molecules_per_droplet,
    total_mutant_molecules,
    mutant_molecules_per_droplet_CI_lower, mutant_molecules_per_droplet_CI_upper,
    wildtype_molecules_per_droplet_CI_lower, wildtype_molecules_per_droplet_CI_upper
  )

test_res %>%
  dplyr::filter(Sample %in% c("Plasma_A")) %>%
  dplyr::select(
    FileName, Sample,
    allele_frequency,
    allele_frequency_CI_lower, allele_frequency_CI_upper
  )

We can then look what happens after the surgery. To do this we look at the blood samples marked as "Plasma_B":

test_res %>%
  dplyr::filter(Sample %in% c("Plasma_B")) %>%
  dplyr::select(FileName, Sample, p_val, mutation_detected)

We now see that the test indicates that the mutation is no longer present in any of the patients, which is consistent with the tumor being removed by surgery.

The full list of information available in the analysis results is:

colnames(test_res)

The results can be exported to .csv format via the function write_csv(x = test_res, file = "your/file/path).

Note

Alternatively the functions train_integrated_ddpcr_model and test_tumor_sample_simple are used for training and testing respectively. These are based on a faster (and simpler) version of the CASTLE algorithm, but have very similar performance. See [1] for more information.

Simulating data

In the package the function simulate_droplet_counts for simulating droplet counts from the statistical model that CASTLE is based on is also provided. Here we will simulate a positive and negative sample based on the parameters learned above, and test them using test_tumor_sample_integrated:

set.seed(42)
# DNA concentration (molecules per droplet)
l <- 0.5 # WT
r_positive <- 0.01 # M
r_negative <- 0 # M

# Number of droplets
n_drops <- 14000

# Positive sample
test_sample_positive <-
  simulate_droplet_counts(
    l = l, r = r_positive,
    a = trained_model$a_est, b = trained_model$b_est, c = trained_model$c_est,
    n_drops = n_drops
  )

# Negative sample
test_sample_negative <-
  simulate_droplet_counts(
    l = l, r = r_negative,
    a = trained_model$a_est, b = trained_model$b_est, c = trained_model$c_est,
    n_drops = n_drops
  )


simulated_samples <- bind_rows(
  data.frame(Sample = "Positive", test_sample_positive),
  data.frame(Sample = "Negative", test_sample_negative)
)


# Test samples
sim_test_res <- test_tumor_sample_simple(
  test_samples = simulated_samples,
  simple_model = trained_model
)

# Print some relevant results
sim_test_res %>% select(
  Sample, p_val, mutation_detected,
  total_mutant_molecules, total_mutant_molecules_CI_lower, total_mutant_molecules_CI_upper
)

simondrue/castle documentation built on Jan. 29, 2022, 3:04 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

simondrue/castle
Fortify your ddPCR analyses for low-frequency variant detection

In simondrue/castle: Fortify your ddPCR analyses for low-frequency variant detection

Analysing data

Loading QuantaSoft data

Training a model for CASTLE

Testing some samples with CASTLE

Note

Simulating data

R Package Documentation

Browse R Packages

We want your feedback!

simondrue/castle Fortify your ddPCR analyses for low-frequency variant detection

In simondrue/castle: Fortify your ddPCR analyses for low-frequency variant detection

Analysing data

Loading QuantaSoft data

Training a model for CASTLE

Testing some samples with CASTLE

Note

Simulating data

R Package Documentation

Browse R Packages

We want your feedback!

simondrue/castle
Fortify your ddPCR analyses for low-frequency variant detection