This document demonstrates the usage of a few convenient wrapper functions for CITRUS. As an example, we use it to analyze the CLL dataset and find differential clusters between NR and CR.
Two inputs are needed for all the analysis, the FCS files and corresponding patient label (e.g. responses). First, we specify the path to FCS file of interest and associated outcome. In this example, we selected the fcs files of CD8 pregated cells from 7 CR and 24 NR CLL patients. Here is a sanity check on the outcome data counts and the list of patient ID:
library(mwang) filename_to_pid <- function(x) { x <- stringr::str_match(x, 'UPCC[0-9_-]+')[1] if (is.na(x)) return(NA) x <- strsplit(x, split='-|_', fixed=F)[[1]] trial <- x[1] pid <- ifelse(nchar(x[2]) == 2, x[2], as.character(as.numeric(x[3]))) return(paste(trial, pid, sep='-')) }
outcome_path <- "/Volumes/pdcs/szmamie/data/20180929_CLL_reanalysis/CLL_outcomes.csv" TCD_dir_path <- "/Volumes/pdcs/szmamie/data/metaAnalysis/Apheresis/CD8/TCD/" TCD_file_list <- list.files(file.path(TCD_dir_path, "8_20160823_ALL_CLL/"), pattern="fcs", full=T) TCD_file_list_pid <- unname(sapply(TCD_file_list, filename_to_pid)) outcome_data <- readr::read_csv(outcome_path) patient_selected <- outcome_data$lulian_outcome %in% c('CR', 'NR') TCD_file_selected <- TCD_file_list[TCD_file_list_pid %in% outcome_data$pid[patient_selected]] kableExtra::kable_styling(kableExtra::kable(outcome_data[patient_selected,]))
After getting the file path, we test-load the first file in the flow dataset to select the channels of interest.
TCD_data_example <- mwang::ReadFCSSet(TCD_dir_path, data.frame(default=TCD_file_selected[1]), verbose=F) channel_names <- TCD_data_example$fileChannelNames$default[[1]] channel_desc <- TCD_data_example$fileReagentNames$default[[1]] kableExtra::kable_styling(kableExtra::kable(cbind(seq(length(channel_desc)), channel_names, channel_desc))) # select channels of interested by index channel_selected_idx <- c(4, 5, 6, 7, 10, 12, 13, 15) channel_desc_selected <- channel_desc[channel_selected_idx] channel_names_selected <- channel_names[channel_selected_idx]
Then we run clustering part of CITRUS on it. mwang::runCITRUS
is a wrapper around hierarchical clustering and feature computation performed by CITRUS. By default, the files are downsampled to 1000 events per file with specified channel transformed by arcsinh
; 1-fold hclust is performed using these channels; the abundance of clusters that are larger than 5\% of the total population are used as features. Check ?mwang::runCITRUS
for more details on options.
clustering <- mwang::runCITRUS(dataDir = TCD_dir_path, selected.desc = channel_desc_selected, fileList = TCD_file_selected )
Next we use the output feature matrix from previous step and ask CITRUS to perform lasso to identify differential clusters. mwang::CITRUSRegression
is a wrapper around CITRUS regression method. Check ?mwang::CITRUSRegression
for more details on input options.
outcomes <- outcome_data$lulian_outcome[patient_selected] model <- mwang::CITRUSRegression(clustering, outcomes)
We can plot the hierarchy from CITRUS. The cluster identified in the regression step, if any, will be marked in darker color.
hierarchy <- mwang::PlotHierarchy(clustering, model) hierarchy$p
res_plotlist = mwang::PlotRes(clustering, hierarchy$cluster.attributes, outcomes, channel_names_selected, channel_desc_selected, ylab='ES') cowplot::plot_grid(plotlist=res_plotlist, nrow=1, rel_widths=c(1, 1, 6))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.