In LCR-BCCRC/CHL26predictor: An R package for the CHL 26 Gene Predictor

Introduction

This vignette gives a high level overview on how to use the CHL26predictor R package to generate CHL26 scores. The model consists of 26 genes (3 housekeepers and 23 predictor genes/features)

library("knitr")
library("CHL26predictor")
library("dplyr")
library("ggplot2")
library("reshape2")

kable(CHL26.model.coef.df, caption = "Overview of the Features in Model")

Input Data

To generate the CHL26 scores for the predictor, the R package simply requires the raw nanostring count data in matrix format as input. The R package provides a small test count matrix to use with this vignette:

CHL26.test.exprs.mat

These counts values have been generated from the sampling of count data from a possion distribution with the lambda values sample a uniform distribution.

RCC Files as Input

If you have RCC files, then you can use the convert_RCC_to_mat function to extract count data by simplying pointing to the directory containing all the RCC files.

exprs.mat <- convert_RCC_to_mat("/path/to/rcc/directory")

Generating CHL26 Scores

Once you have the count matrix, you'll need to first normalize the count matrix:

normalizer <- get_normalizer(CHL26.test.exprs.mat)
CHL26.test.exprs.norm.mat <- normalize_mat(CHL26.test.exprs.mat, normalizer)

normalizer %>%
  melt(value.name = "normalizer") %>%
  mutate(sampleID = rownames(.)) %>%
  arrange(normalizer) %>%
  ggplot(aes(x = factor(sampleID, levels = sampleID), y = normalizer)) +
  geom_bar(stat = "identity") +
  xlab("Sample ID") +
  ylab("Normalizer") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Now that the gene count matrix is normalized, we can generate the CHL26 scores:

scores.df <- get_CHL26_scores(CHL26.test.exprs.norm.mat)
scores.df <- arrange(scores.df, score)

scores.df %>%
  ggplot(aes(x = factor(sampleID, levels = sampleID), y = score)) +
  geom_bar(stat = "identity") +
  xlab("Sample ID") +
  ylab("CHL26 Score") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Classifying Samples into High and Low Risk

With the CHL26 scores, we can next classify samples into high and low risk by using the published threshold of 0.6235.

risk.thres <- 0.6235
scores.risk.df <- scores.df %>%
  mutate(riskClass = ifelse(score >= risk.thres, "High", "Low"))

scores.risk.df %>%
  ggplot(aes(x = factor(sampleID, levels = sampleID), y = score, 
             fill = riskClass)) +
  geom_bar(stat = "identity") +
  xlab("Sample ID") +
  ylab("CHL26 Score") +
  geom_hline(y = risk.thres, col = "red", linetype = "dotted") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))