In Mamie/mwang: Collection of useful functions

Purpose

This document demonstrates the usage of a few convenient wrapper functions for CITRUS. As an example, we use it to analyze the CLL dataset and find differential clusters between NR and CR.

Two inputs are needed for all the analysis, the FCS files and corresponding patient label (e.g. responses). First, we specify the path to FCS file of interest and associated outcome. In this example, we selected the fcs files of CD8 pregated cells from 7 CR and 24 NR CLL patients. Here is a sanity check on the outcome data counts and the list of patient ID:

library(mwang)
filename_to_pid <- function(x) {
  x <- stringr::str_match(x, 'UPCC[0-9_-]+')[1]
  if (is.na(x)) return(NA)
  x <- strsplit(x, split='-|_', fixed=F)[[1]]
  trial <- x[1]
  pid <- ifelse(nchar(x[2]) == 2, x[2], as.character(as.numeric(x[3])))
  return(paste(trial, pid, sep='-'))
}

outcome_path <- "/Volumes/pdcs/szmamie/data/20180929_CLL_reanalysis/CLL_outcomes.csv"
TCD_dir_path <- "/Volumes/pdcs/szmamie/data/metaAnalysis/Apheresis/CD8/TCD/"
TCD_file_list <- list.files(file.path(TCD_dir_path, "8_20160823_ALL_CLL/"), pattern="fcs", full=T)
TCD_file_list_pid <- unname(sapply(TCD_file_list, filename_to_pid))
outcome_data <- readr::read_csv(outcome_path)
patient_selected <- outcome_data$lulian_outcome %in% c('CR', 'NR')
TCD_file_selected <- TCD_file_list[TCD_file_list_pid %in% outcome_data$pid[patient_selected]]
kableExtra::kable_styling(kableExtra::kable(outcome_data[patient_selected,]))

After getting the file path, we test-load the first file in the flow dataset to select the channels of interest.

TCD_data_example <- mwang::ReadFCSSet(TCD_dir_path, data.frame(default=TCD_file_selected[1]), verbose=F)
channel_names <- TCD_data_example$fileChannelNames$default[[1]]
channel_desc <- TCD_data_example$fileReagentNames$default[[1]]
kableExtra::kable_styling(kableExtra::kable(cbind(seq(length(channel_desc)), channel_names, channel_desc)))
# select channels of interested by index
channel_selected_idx <- c(4, 5, 6, 7, 10, 12, 13, 15)
channel_desc_selected <- channel_desc[channel_selected_idx]
channel_names_selected <- channel_names[channel_selected_idx]

Then we run clustering part of CITRUS on it. mwang::runCITRUS is a wrapper around hierarchical clustering and feature computation performed by CITRUS. By default, the files are downsampled to 1000 events per file with specified channel transformed by arcsinh; 1-fold hclust is performed using these channels; the abundance of clusters that are larger than 5\% of the total population are used as features. Check ?mwang::runCITRUS for more details on options.

clustering <- mwang::runCITRUS(dataDir = TCD_dir_path, 
                               selected.desc = channel_desc_selected, 
                               fileList = TCD_file_selected
                               )

Next we use the output feature matrix from previous step and ask CITRUS to perform lasso to identify differential clusters. mwang::CITRUSRegression is a wrapper around CITRUS regression method. Check ?mwang::CITRUSRegression for more details on input options.

outcomes <- outcome_data$lulian_outcome[patient_selected]
model <- mwang::CITRUSRegression(clustering, outcomes)

We can plot the hierarchy from CITRUS. The cluster identified in the regression step, if any, will be marked in darker color.

hierarchy <- mwang::PlotHierarchy(clustering, model)
hierarchy$p

res_plotlist = mwang::PlotRes(clustering, hierarchy$cluster.attributes, outcomes, channel_names_selected, 
        channel_desc_selected, ylab='ES')
cowplot::plot_grid(plotlist=res_plotlist, nrow=1, rel_widths=c(1, 1, 6))