In olapuentesantana/easier: Estimate Systems Immune Response from RNA-seq data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

#library("easier")

Introduction {#introduction}

Identification of biomarkers of immune response in the tumor microenvironment (TME) for prediction of patients’ response to immune checkpoint inhibitors is a major challenge in immuno-oncology. Tumors are complex systems, and understanding immune response in the TME requires holistic strategies.

We introduce Estimate Systems Immune Response (EaSIeR) tool, an approach to derive a high-level representation of anti-tumor immune responses by leveraging widely accessible patients' tumor RNA-sequencing (RNA-seq) data.

EaSIeR approach

RNA-seq data is integrated with different types of biological prior knowledge to extract quantitative descriptors of the TME, including composition of the immune repertoire, and activity of intra- and extra-cellular communications (see table below). By performing this knowledge-guided dimensionality reduction, there is an improvement in the interpretability of the derived features.

library(knitr)
table1 <- "
| Quantitative descriptor  | Descriptor conception  | Prior knowledge |
| -----------------------  | ----------------------------------- | ----------------------------------- |
| Pathway activity | @HOLLAND2020194431; @Schubert2018 | @HOLLAND2020194431; @Schubert2018 |
| Immune cell quantification | @Finotello2019 | @Finotello2019 |
| TF activity | @Garcia-Alonso01082019 | @Garcia-Alonso01082019 |
| Ligand-Receptor pairs | @LAPUENTESANTANA2021100293 | @Ramilowski2015; @Barretina2012 |
| Cell-cell interaction | @LAPUENTESANTANA2021100293 | @Ramilowski2015; @Barretina2012 |
Table: Quantitative descriptors of the TME
"
cat(table1)

Using the data from The Cancer Genome Atlas (TCGA) [@Chang2013], regularized multi-task linear regression was used to identify how the quantitative descriptors can simultaneously predict multiple hallmarks (i.e. published transcriptome-based predictors) of anti-cancer immune response (see table below). Here, the regularization is applied to select features that are relevant for all tasks.

table2 <- "
| Hallmark of the immune response | Original study |
|-------------------------------- | -------------- |
| Cytolytic activity (CYT) | @ROONEY201548 |
| Roh immune score (Roh_IS) | @Roheaah3560 |
| Chemokine signature (chemokines) | @Messina2012 |
| Davoli immune signature (Davoli_IS) | @Davolieaaf8399 |
| IFNy signature (IFNy) | @Ayers2017 |
| Expanded immune signature (Ayers_expIS) | @Ayers2017 |
| T-cell inflamed signature (Tcell_inflamed) | @Ayers2017 |
| Repressed immune resistance (RIR) | @JERBYARNON2018984 |
| Tertiary lymphoid structures signature (TLS) | @Cabrita2020 |
| Immuno-predictive score (IMPRES) | @Auslander2018 |
| Microsatellite instability status | @Fu2019 |
Table: Hallmarks of anti-cancer immune responses
"
cat(table2)

Cancer-specific models were learned and used to identify interpretable cancer-specific systems biomarkers of immune response. These models are available through easierData package and can be accessed using the get_opt_models() function. Model biomarkers have been experimentally validated in the literature and the performance of EaSIeR predictions has been validated using independent datasets from four different cancer types with patients treated with anti-PD1 or anti-PD-L1 therapy.

For more detailed information, please refer to our original work: Lapuente-Santana et al. "Interpretable systems biomarkers predict response to immune-checkpoint inhibitors". Patterns, 2021 doi:10.1016/j.patter.2021.100293.

EaSIeR tool

This vignette describes how to use the easier package for a pleasant onboarding experience around the EaSIeR approach. By providing just the patients' bulk RNA-seq data as input, the EaSIeR tool allows you to:

Predict biomarker-based immune response
Identify system biomarkers of immune response

Getting started {#gettingstarted}

Starting R, this package can be installed as follows:

BiocManager::install("easier")
# to install also the dependencies used in the vignettes and in the examples:
BiocManager::install("easier", dependencies = TRUE)
# to download and install the development version from GitHub, you can use
BiocManager::install("olapuentesantana/easier")

Once installed, the package can be loaded and attached to your current workspace as follows:

library("easier")

In order to use easier in your workflow, bulk-tumor RNA-seq data is required as input (when available, patients' response to immunotherapy can be additionally provided):

RNA_counts, a data.frame containing raw counts values (with HGNC gene symbols as row names and samples identifiers as column names).
RNA_tpm, a data.frame containing TPM values (with HGNC gene symbols as row names and samples identifiers as column names).
real_patient_response, a character vector containing clinical patients' response to immunotherapy (with non-responders labeled as NR and responders as R).

Use case for `easier`: Bladder cancer patients [@Mariathasan2018]

In this section, we illustrate the main features of easier on a publicly available bladder cancer dataset from Mariathasan et al. "TGF-B attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells", published in Nature, 2018 doi:10.1038/nature25501. The processed data is available via IMvigor210CoreBiologies package under the CC-BY license.

We will be using a subset of this data as exemplary dataset, made available via easierData. This includes RNA-seq data (count and TPM expression values), information on tumor mutational burden (TMB) and response to ICB therapy from 192 patients. 25 patients with complete response (CR) were classified as responders (R) and 167 patients with progressive disease (PD) as non-responders.

In the following chunk, we load the additional packages that will be required throughout the vignette.

suppressPackageStartupMessages({
  library("easierData")
  library("SummarizedExperiment")
})

Load data from Mariathasan cohort

For this example we will use gene expression data (counts and tpm values). Other variables include patient best overall response (BOR) to anti-PD-L1 therapy, tumor mutational burden (TMB) and the cancer type the cohort belongs to.

dataset_mariathasan <- get_Mariathasan2018_PDL1_treatment()

# patient response
patient_ICBresponse <- colData(dataset_mariathasan)[["BOR"]]
names(patient_ICBresponse) <- colData(dataset_mariathasan)[["pat_id"]]

# tumor mutational burden
TMB <- colData(dataset_mariathasan)[["TMB"]]
names(TMB) <- colData(dataset_mariathasan)[["pat_id"]]

# cohort cancer type 
cancer_type <- metadata(dataset_mariathasan)[["cancertype"]]

# gene expression data
RNA_counts <- assays(dataset_mariathasan)[["counts"]]
RNA_tpm <- assays(dataset_mariathasan)[["tpm"]]

# Select a subset of patients to reduce vignette building time.
set.seed(1234)
pat_subset <- sample(names(patient_ICBresponse), size = 30)
patient_ICBresponse <- patient_ICBresponse[pat_subset]
TMB <- TMB[pat_subset]
RNA_counts <- RNA_counts[, pat_subset]
RNA_tpm <- RNA_tpm[, pat_subset]

# Some genes are causing issues due to approved symbols matching more than one gene
genes_info <- easier:::reannotate_genes(cur_genes = rownames(RNA_tpm))

## Remove non-approved symbols
non_na <- !is.na(genes_info$new_names)
RNA_tpm <- RNA_tpm[non_na, ]
genes_info <- genes_info[non_na, ]

## Remove entries that are withdrawn
RNA_tpm <- RNA_tpm[-which(genes_info$new_names == "entry withdrawn"), ]
genes_info <- genes_info[-which(genes_info$new_names == "entry withdrawn"), ]

## Identify duplicated new genes
newnames_dup <- unique(genes_info$new_names[duplicated(genes_info$new_names)])
newnames_dup_ind <- do.call(c, lapply(newnames_dup, function(X) which(genes_info$new_names == X)))
newnames_dup <- genes_info$new_names[newnames_dup_ind]

## Retrieve data for duplicated genes
tmp <- RNA_tpm[genes_info$old_names[genes_info$new_names %in% newnames_dup],]

## Remove data for duplicated genes
RNA_tpm <- RNA_tpm[-which(rownames(RNA_tpm) %in% rownames(tmp)),]

## Aggregate data of duplicated genes
dup_genes <- genes_info$new_names[which(genes_info$new_names %in% newnames_dup)]
names(dup_genes) <- rownames(tmp)
if (anyDuplicated(newnames_dup)){
  tmp2 <- stats::aggregate(tmp, by = list(dup_genes), FUN = "mean")
  rownames(tmp2) <- tmp2$Group.1
  tmp2$Group.1 <- NULL
}

# Put data together
RNA_tpm <- rbind(RNA_tpm, tmp2)

Compute hallmarks of immune response

Multiple hallmarks (i.e. published transcriptome-based predictors) of the immune response can also be computed using TPM data from RNA-seq. By default, the following scores of the immune response will be computed: cytolytic activity (CYT) [@ROONEY201548], Roh immune score (Roh_IS) [@Roheaah3560], chemokine signature (chemokines) [@Messina2012], Davoli immune signature (Davoli_IS) [@Davolieaaf8399], IFNy signature (IFNy) [@Ayers2017], expanded immune signature (Ayers_expIS) [@Ayers2017], T-cell inflamed signature (Tcell_inflamed) [@Ayers2017], immune resistance program (RIR: resF_down, resF_up, resF) [@JERBYARNON2018984] and tertiary lymphoid structures signature (TLS) [@Cabrita2020]. This selection can be customized by editing the selected_scores option.

hallmarks_of_immune_response <- c("CYT", "Roh_IS", "chemokines", "Davoli_IS", "IFNy", "Ayers_expIS", "Tcell_inflamed", "RIR", "TLS")
immune_response_scores <- compute_scores_immune_response(RNA_tpm = RNA_tpm, 
                                                         selected_scores = hallmarks_of_immune_response)
head(immune_response_scores)

Compute quantitative descriptors of the TME

We are going to use the bulk RNA-seq data to derive, for each patient, the five quantitative descriptors of the TME described above.

By applying quanTIseq [@Finotello2019] method to TPM data from RNA-seq, the quantification of different cell fractions can be done as follows:

cell_fractions <- compute_cell_fractions(RNA_tpm = RNA_tpm)
head(cell_fractions)

By applying PROGENy [@HOLLAND2020194431; @Schubert2018] method to count data from RNA-seq, the activity of 14 signaling pathways can be inferred as in the chunk below.

pathway_activities <- compute_pathway_activity(RNA_counts = RNA_counts,
                                               remove_sig_genes_immune_response = TRUE)
head(pathway_activities)

The call above infers pathway activity as a linear transformation of gene expression data. Since some pathway signature genes were also used to compute scores of immune response (output variable in EaSIeR model). With remove_sig_genes_immune_response set to TRUE (default), the overlapping genes are removed from the pathway signature genes used to infer pathway activities.

By applying DoRothEA [@Garcia-Alonso01082019] method to TPM data from RNA-seq, the activity of 118 TFs can be inferred as follows:

tf_activities <- compute_TF_activity(RNA_tpm = RNA_tpm)
head(tf_activities[,1:5])

Using derived cancer-specific inter-cellular networks, the quantification of 867 ligand-receptor pairs can be done as in the chunk below.

More detailed information on how these networks were obtained can be found in the experimental procedures section from our original work [@LAPUENTESANTANA2021100293].

lrpair_weights <- compute_LR_pairs(RNA_tpm = RNA_tpm,
                                   cancer_type = "pancan")
head(lrpair_weights[,1:5])

Via cancer_type, a cancer-specific ligand-receptor pairs network can be chosen. With cancer_type set to pancan, a pan-cancer network will be used and this is based on the union of all ligand-receptor pairs present across the 18 cancer-specific networks.

Using the ligand-receptor weights as input, 169 cell-cell interaction scores can be derived as in the chunk below.

ccpair_scores <- compute_CC_pairs(lrpairs = lrpair_weights, 
                                  cancer_type = "pancan")
head(ccpair_scores[,1:5])

Again, cancer_type is set to pancan (default). The same cancer_type network used to quantify ligand-receptor pairs should be designated here.

Obtain patients' predictions of immune response

Now we use the quantitative descriptors computed previously as input features to predict anti-tumor immune responses based on the model parameters defined during training. The output of predict_immune_response returns predictions of patients' immune response for each quantitative descriptor. Because models were built in a cancer-type-specific fashion, the user is required to indicate which cancer-specific model should be used for predicting patients' immune response, which can be done via the cancer_type argument.

The cancer_type provided should match one of the cancer-specific models available. These are the following: bladder urothelial carcinoma (BLCA), brest invasive carcinoma (BRCA), cervical and endocervical cancer (CESC), colorectal adenocarcinoma (CRC), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), non-small-cell lung carcinoma (NSCLC [LUAD + LUSC]), ovarian serous cystadenocarcionma (OV), pancreatic adenocarcionma (PAAD), prostate adenocarcinoma (PRAD), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), uterine corpus endometrial carcinoma (UCEC).

The optimized models can be retrieved from easierData package using the function get_opt_models().

predictions <- predict_immune_response(pathways = pathway_activities,
                                       immunecells = cell_fractions,
                                       tfs = tf_activities,
                                       lrpairs = lrpair_weights,
                                       ccpairs = ccpair_scores,
                                       cancer_type = cancer_type, 
                                       verbose = TRUE)

Once we obtained patients' predicted immune response, two different scenarios should be considered in which:

-patient_response is known and therefore the accuracy of easier predictions can be evaluated.

-patient_response is unknown and no assessments can be carried out.

In this use case, patients' clinical response to PD-L1 therapy is available from Mariathasan cohort, thus we can move forward to assess the performance of the predictions.

Evaluate easier predictions using patients' immunotherapy response

Assessment of patients' response to immunotherapy treatment can be done via assess_immune_response function, which informs about the accuracy of easier predictions on patients' response to ICB therapy.

The option patient_response should be provided with a character string containing patients' clinical response (where non-responders are labeled as NR and responders as R). Importantly, this aspect should be handled by the user.

This function uses patients' TPM values (RNA_tpm) as input for compute_scores_immune_response in order to compare easier predictions with those from published scores of immune response. The user can choose the scores to be computed via select_gold_standard, by default all scores are computed.

If patients' tumor mutational burden (TMB) is available, this can also be provided via TMB_values and used as surrogate of patients' immunotherapy response for comparison.

Since both immune response and TMB are essential for an effective immunotherapy response, we decided to conceptualize this in our predictions by either penalizing or weighting differently our scores in high- and low-TMB patients. If easier_with_TMB is set to weighted_average, a weighted average of both easier and TMB score will be used, if instead easier_with_TMB is set to penalized_score, patient's easier scores will penalized depending on their TMB category.

These two strategies to combine immune response and TMB require the definition of a certain weight or penalty beforehand (weight_penalty). The default weight or penalty is 0.5.

output_eval_with_resp <- assess_immune_response(predictions_immune_response = predictions,
                                                patient_response = patient_ICBresponse,
                                                RNA_tpm = RNA_tpm,
                                                TMB_values = TMB,
                                                easier_with_TMB = "weighted_average",
                                                weight_penalty = 0.5)

Top figure. Area under the curve (AUC) values of patients' predictions based on quantitative descriptors of the TME, an ensemble descriptor based on average of individual descriptors, and the computed scores of immune response (gold standard). Bar plot represent the average AUC across tasks and error bars describe the corresponding standard deviation.

Bottom figure. ROC curves were computed as the average of the ROC curves obtained for each score of immune response.

output_evaluation_with_resp stores the plots from above, and they can be inspected in R as follows:

# inspect output
output_eval_with_resp[[1]]
output_eval_with_resp[[2]]
output_eval_with_resp[[3]]

What if I have an immunotherapy dataset where patients' response is not available?

This is a usual case where we might have a cancer dataset with bulk RNA-seq data but lack information about patients' response to immunotherapy.

In this likely scenario, an score of likelihood of immune response can be assigned to each patient by omitting the argument patient_response within the function assess_immune_response.

output_eval_no_resp <- assess_immune_response(predictions_immune_response = predictions,
                                              TMB_values = TMB,
                                              easier_with_TMB = "weighted_average",
                                              weight_penalty = 0.5)

Top figure. Boxplot of patients' easier score showing its distribution across the 10 different tasks.

Bottom figure. Scatterplot of patients' prediction when combining easier score with tumor mutational burden showing its distribution across the 10 different tasks.

output_evaluation_no_resp stores the plots from above, and they can be inspected in R as follows:

# inspect output
output_eval_no_resp[[1]]
output_eval_no_resp[[2]]

Retrieve easier scores of immune response

We can further retrieve the easier score and also, the refined scores obtained by integrating easier score and TMB via retrieve_easier_score.

easier_derived_scores <- retrieve_easier_score(predictions_immune_response = predictions,
                                               TMB_values = TMB,
                                               easier_with_TMB = c("weighted_average", 
                                                                   "penalized_score"),
                                               weight_penalty = 0.5)

head(easier_derived_scores)

Interpret response to immunotherapy through systems biomarkers {#biomarkers}

Identifying mechanisms used by patients' tumors to resist or succumb to ICB treatment is of paramount importance. Via explore_biomarkers, we can visualize stunning biomarkers of immune response and shed light into possible mechanisms responsible for the patients' response to treatment.

The option patient_label allows to make a two-level comparison by providing a character string containing the label of each patient, for instance, patients' clinical response (where non-responders are labeled as NR and responders as R).

In order to leverage the cancer-specific biomarker weights inferred from model training, you need to specify again which cancer_type the bulk RNA-seq data belongs to. As before, the selected cancer_type should be included in the list described above.

output_biomarkers <- explore_biomarkers(pathways = pathway_activities,
                                        immunecells = cell_fractions,
                                        tfs = tf_activities,
                                        lrpairs = lrpair_weights,
                                        ccpairs = ccpair_scores,
                                        cancer_type = cancer_type,
                                        patient_label = patient_ICBresponse)

The output of explore_biomarkers returns, for each quantitative descriptor, feature's z-score values comparing responders (R) and non-responders (NR) patients with the corresponding feature's model weights (only top 15 biomarkers are shown for tfs, lrpairs and ccpairs descriptors).

Additionally, a volcano plot integrating systems biomarkers from all quantitative descriptors comparing NR and R patients (two-sided Wilcoxon rank-sum test). Significant biomarkers (p < 0.05) are shown in blue. Biomarkers are drawn according to their corresponding sign (shape) and weight (size) obtained during model training. Labels are reported for the top 15 biomarkers (based on the association with the tasks) that are significantly different between R and NR.

output_biomarkers stores the plots from above, and they can be inspected in R as follows:

# inspect output
output_biomarkers[[1]]
output_biomarkers[[2]]
output_biomarkers[[3]]
output_biomarkers[[4]]
output_biomarkers[[5]]
output_biomarkers[[6]]

Session info {- .smaller}

sessionInfo()

References {-}

olapuentesantana/easier documentation built on Feb. 25, 2024, 3:39 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

olapuentesantana/easier
Estimate Systems Immune Response from RNA-seq data

In olapuentesantana/easier: Estimate Systems Immune Response from RNA-seq data

Introduction {#introduction}

EaSIeR approach

EaSIeR tool

Getting started {#gettingstarted}

Use case for `easier`: Bladder cancer patients [@Mariathasan2018]

Load data from Mariathasan cohort

Compute hallmarks of immune response

Compute quantitative descriptors of the TME

Obtain patients' predictions of immune response

Evaluate easier predictions using patients' immunotherapy response

What if I have an immunotherapy dataset where patients' response is not available?

Retrieve easier scores of immune response

Interpret response to immunotherapy through systems biomarkers {#biomarkers}

Session info {- .smaller}

References {-}

R Package Documentation

Browse R Packages

We want your feedback!

olapuentesantana/easier Estimate Systems Immune Response from RNA-seq data

In olapuentesantana/easier: Estimate Systems Immune Response from RNA-seq data

Introduction {#introduction}

EaSIeR approach

EaSIeR tool

Getting started {#gettingstarted}

Use case for easier: Bladder cancer patients [@Mariathasan2018]

Load data from Mariathasan cohort

Compute hallmarks of immune response

Compute quantitative descriptors of the TME

Obtain patients' predictions of immune response

Evaluate easier predictions using patients' immunotherapy response

What if I have an immunotherapy dataset where patients' response is not available?

Retrieve easier scores of immune response

Interpret response to immunotherapy through systems biomarkers {#biomarkers}

Session info {- .smaller}

References {-}

R Package Documentation

Browse R Packages

We want your feedback!

olapuentesantana/easier
Estimate Systems Immune Response from RNA-seq data

Use case for `easier`: Bladder cancer patients [@Mariathasan2018]