An R package for the RHL30 prognostic predictor. The predictor is a gene expression-based prognostic model for predicting post-autologous stem-cell transplantation (ASCT) outcomes. It is designed to be used on RHL30 NanoString expression count data from relapsed Hodgkin lymphoma (RHL) samples.
The predictor was published in:
Chan FC*, Mottok A*, et al. Prognostic Model to Predict Post-Autologous Stem-Cell Transplantation Outcomes in Classical Hodgkin Lymphoma. J Clin Oncol JCO2017727925 (2017) doi:10.1200/JCO.2017.72.7925. *Contributed equally to this work.
To install this package, first install the devtools package, then run:
devtools::install_github("tinyheero/RHL30")
We will use the BCCA RHL30 training cohort from the paper as an example of how to generate RHL30 predictor scores. The following steps reproduce the RHL30 scores from the paper.
First, let's load the RHL30 package and the RHL30 model:
library("RHL30")
library("dplyr")
rhl30_model_df <- get_rhl30_model_coef_df()
rhl30_model_df
The model contains a total of 30 genes:
The next step is to load the expression data you want to generate RHL30 scores
on. The expression data should be a tab-separated values file. The first line
should be a header line with gene_name as the first column, followed by
the sample identifiers. Each subsequent row should contain the name of a gene
followed by the respective raw expression values for each sample.
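To make the expected layout concrete, here is a minimal sketch of such a file, written to a temporary path and read back with base R. The gene names, sample identifiers, and counts are hypothetical placeholders, and `read.delim()` is used only for illustration; it is not necessarily how `load_exprs_mat()` parses the file.

```r
# Hypothetical gene names, sample identifiers, and counts (illustration only).
example_lines <- c(
  "gene_name\tsample_1\tsample_2\tsample_3",
  "GENE_A\t120\t98\t143",
  "GENE_B\t15\t22\t9",
  "GENE_C\t560\t610\t480"
)

# Write the example to a temporary tab-separated file.
example_file <- tempfile(fileext = ".tsv")
writeLines(example_lines, example_file)

# Read it back with base R: the gene_name column becomes the row names,
# leaving one numeric column per sample.
example_df <- read.delim(
  example_file,
  row.names = "gene_name",
  check.names = FALSE
)
example_mat <- as.matrix(example_df)
example_mat
```

The result is a matrix with genes as rows and samples as columns, which is the same orientation as the example data described in this README.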
The expression data of the BCCA RHL30 training cohort is provided as an example. Let's load that data:
exprs_file <- system.file("extdata", "bcca_rhl_rhl30_gene_exprs_mat.tsv", package = "RHL30")
exprs_mat <- load_exprs_mat(exprs_file)
dim(exprs_mat)
The expression data contains the 30 genes (rows) and 68 samples (columns). Next we calculate the normalizer values (geometric mean of the 12 housekeepers) for each sample:
hk_genes <- filter(rhl30_model_df, gene_type == "housekeeper") %>%
  pull("gene_name")
sample_normalizer_values <- get_sample_normalizer_value(exprs_mat, hk_genes)
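For readers unfamiliar with this kind of normalizer, the following base R sketch shows what a per-sample geometric mean over housekeeper genes looks like. The toy matrix, the gene names, and the helper `geometric_mean()` are hypothetical, and the sketch assumes nothing about the internals of `get_sample_normalizer_value()` beyond the description above.

```r
# Hypothetical raw counts: 2 housekeeper genes and 1 other gene, 2 samples.
# matrix() fills column by column, so each group of three is one sample.
toy_exprs_mat <- matrix(
  c(100, 400, 50,
     10,  40, 75),
  nrow = 3,
  dimnames = list(c("HK_1", "HK_2", "GENE_A"), c("sample_1", "sample_2"))
)
toy_hk_genes <- c("HK_1", "HK_2")

# Geometric mean computed on the log scale (hypothetical helper).
geometric_mean <- function(x) exp(mean(log(x)))

# One normalizer value per sample, using only the housekeeper rows.
toy_normalizer_values <- apply(
  toy_exprs_mat[toy_hk_genes, , drop = FALSE],
  2,
  geometric_mean
)
toy_normalizer_values  # approximately 200 for sample_1 and 20 for sample_2
```

Note that with this formulation a housekeeper count of zero drives the geometric mean for that sample to zero.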
In the paper, a normalizer threshold of 35 was used to exclude poor-quality samples, because very low normalizer values often lead to very high normalized expression values. We can apply the same threshold here, keeping only samples whose normalizer value is greater than 35:
high_quality_samples <- names(sample_normalizer_values[sample_normalizer_values > 35])
filtered_exprs_mat <- exprs_mat[, high_quality_samples]
dim(filtered_exprs_mat)
This eliminates 2 poor-quality samples, leaving us with 66 samples. Note that sample HL1120 did not receive ASCT and thus was not reported in Figure 4 of the paper. As such, the final number in Figure 4 is 65 samples.
Let's normalize our expression matrix and generate the RHL30 scores for each sample:
filtered_exprs_mat_norm <- normalize_exprs_mat(filtered_exprs_mat, sample_normalizer_values)
rhl30_df <- get_rhl30_scores_df(filtered_exprs_mat_norm, rhl30_model_df)
head(rhl30_df)
devtools::session_info()