This vignette gives a high-level overview of how to use the CHL26predictor R package to generate CHL26 scores. The model consists of 26 genes (3 housekeepers and 23 predictor genes/features).
library("knitr")
library("CHL26predictor")
library("dplyr")
library("ggplot2")
library("reshape2")
kable(CHL26.model.coef.df, caption = "Overview of the Features in Model")
To generate CHL26 scores, the package requires only the raw NanoString count data in matrix format as input. It ships with a small test count matrix to use with this vignette:
CHL26.test.exprs.mat
These count values were generated by sampling from a Poisson distribution whose lambda values were themselves sampled from a uniform distribution.
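As an illustration of that sampling scheme, the following base R sketch simulates a small count matrix. It is not the code used to build the packaged test matrix: the gene and sample names, the matrix orientation (genes in rows), the dimensions, and the uniform range are all assumptions chosen for the example.

```r
# Hypothetical sketch: simulate a genes x samples count matrix.
set.seed(42)

n.genes <- 26
n.samples <- 10

# One Poisson rate per gene, drawn from a uniform distribution.
# The range 50-5000 is an assumed, illustrative choice.
lambdas <- runif(n.genes, min = 50, max = 5000)

# rpois() recycles `lambdas`, and matrix() fills column by column,
# so every value in row i is drawn with rate lambdas[i].
sim.counts <- matrix(
  rpois(n.genes * n.samples, lambda = lambdas),
  nrow = n.genes,
  ncol = n.samples,
  dimnames = list(
    paste0("gene", seq_len(n.genes)),
    paste0("sample", seq_len(n.samples))
  )
)

dim(sim.counts)
```

Counts simulated this way carry no biological signal, so they are only useful for checking that the workflow runs end to end.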
If you have RCC files, you can use the convert_RCC_to_mat function to extract the count data by simply pointing it to the directory containing all the RCC files.
exprs.mat <- convert_RCC_to_mat("/path/to/rcc/directory")
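Before pointing the function at a directory, it may help to confirm that the directory actually contains RCC files. The sketch below uses only base R; the temporary directory and file names are hypothetical stand-ins for a real data directory, and the check is independent of whatever validation convert_RCC_to_mat performs itself.

```r
# Build a throwaway directory with placeholder files (hypothetical names).
rcc.dir <- file.path(tempdir(), "rcc_example")
dir.create(rcc.dir, showWarnings = FALSE)
file.create(file.path(rcc.dir, c("sample_01.RCC", "sample_02.RCC", "notes.txt")))

# List only files ending in .RCC, ignoring case.
rcc.files <- list.files(
  rcc.dir,
  pattern = "\\.RCC$",
  ignore.case = TRUE,
  full.names = TRUE
)

if (length(rcc.files) == 0) {
  stop("No RCC files found in ", rcc.dir)
}

basename(rcc.files)
```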
Once you have the count matrix, first normalize it:
normalizer <- get_normalizer(CHL26.test.exprs.mat)
CHL26.test.exprs.norm.mat <- normalize_mat(CHL26.test.exprs.mat, normalizer)

normalizer %>%
  melt(value.name = "normalizer") %>%
  mutate(sampleID = rownames(.)) %>%
  arrange(normalizer) %>%
  ggplot(aes(x = factor(sampleID, levels = sampleID), y = normalizer)) +
  geom_bar(stat = "identity") +
  xlab("Sample ID") +
  ylab("Normalizer") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
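For intuition about what a per-sample normalizer can look like, here is a sketch of one common approach for NanoString data: subtracting each sample's mean log2 housekeeper count from every gene in that sample. This is an assumption made for illustration and is not taken from the package source, so get_normalizer and normalize_mat may work differently. The gene names and counts are made up.

```r
# Toy genes x samples matrix; gene names and values are hypothetical.
counts <- matrix(
  c(100, 200, 400,
    200, 400, 800,
     50,  80, 640,
     10,  40,  20),
  nrow = 4,
  byrow = TRUE,
  dimnames = list(c("HK1", "HK2", "geneA", "geneB"), c("s1", "s2", "s3"))
)
housekeepers <- c("HK1", "HK2")

log.counts <- log2(counts)

# Assumed normalizer: mean log2 housekeeper count per sample.
sketch.normalizer <- colMeans(log.counts[housekeepers, , drop = FALSE])

# Subtract each sample's normalizer from every gene in that sample.
norm.mat <- sweep(log.counts, 2, sketch.normalizer, FUN = "-")

round(norm.mat, 3)
```

Under this scheme the housekeeper rows average to zero within each sample, which gives a quick sanity check on the result.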
Now that the gene count matrix is normalized, we can generate the CHL26 scores:
scores.df <- get_CHL26_scores(CHL26.test.exprs.norm.mat)
scores.df <- arrange(scores.df, score)

scores.df %>%
  ggplot(aes(x = factor(sampleID, levels = sampleID), y = score)) +
  geom_bar(stat = "identity") +
  xlab("Sample ID") +
  ylab("CHL26 Score") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
With the CHL26 scores, we can next classify samples into high and low risk by using the published threshold of 0.6235.
risk.thres <- 0.6235
scores.risk.df <- scores.df %>%
  mutate(riskClass = ifelse(score >= risk.thres, "High", "Low"))

# geom_hline takes the line position via yintercept in current ggplot2
scores.risk.df %>%
  ggplot(aes(x = factor(sampleID, levels = sampleID), y = score, fill = riskClass)) +
  geom_bar(stat = "identity") +
  xlab("Sample ID") +
  ylab("CHL26 Score") +
  geom_hline(yintercept = risk.thres, col = "red", linetype = "dotted") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
As this is synthetic data, the ratio of high-risk to low-risk samples does NOT reflect the true expected proportion.
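To make the thresholding rule concrete, the following base R sketch applies the same cutoff to a handful of made-up scores. The sample IDs and score values are hypothetical; only the 0.6235 threshold and the `>=` comparison come from the vignette.

```r
risk.thres <- 0.6235

# Hypothetical scores, including one exactly at the threshold.
toy.scores.df <- data.frame(
  sampleID = c("s1", "s2", "s3", "s4"),
  score = c(0.10, 0.6235, 0.90, 0.62)
)

# Scores at or above the threshold are labelled high risk.
toy.scores.df$riskClass <- ifelse(toy.scores.df$score >= risk.thres, "High", "Low")

toy.scores.df
table(toy.scores.df$riskClass)
```

Because the comparison is `>=`, a sample whose score equals the threshold falls in the high-risk class.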