dr_to_fcs2: Calculate dimension reduction and cluster annotation with...

View source: R/dr_to_fcs2.R

dr_to_fcs2R Documentation

Calculate dimension reduction and cluster annotation with data from one or more flow frames and add these parameters to a (concatenated) fcs file

Description

Calculate dimension reduction and cluster annotation with data from one or more flow frames and add these parameters to a (concatenated) fcs file

Usage

dr_to_fcs2(
  ff.list,
  channels = NULL,
  add.sample.info = NULL,
  scale.whole = c("none", "z.score", "min.max"),
  scale.samples = c("none", "z.score", "min.max"),
  run.pca = 0,
  run.umap = T,
  run.som = T,
  run.gqtsom = F,
  find.clusters = T,
  metacluster_map = c("SOM", "GQTSOM"),
  metacluster_map_source = c("UMAP", "expr"),
  clustering.for.marker.calc = c("cutree_5", "cutree_10", "cutree_15"),
  calc.global.markers = F,
  calc.pairwise.markers = F,
  UMAP_args = list(metric = "cosine", verbose = T, scale = T),
  SOM_args = list(),
  GQTSOM_args = list(),
  EmbedSOM_args = list(),
  dist_args = list(),
  hclust_args = list(method = "average"),
  cutree_args = list(k = c(5, 10, 15)),
  extra.cols = NULL,
  mc.cores = 1,
  save.to.disk = c("fcs", "rds"),
  save.path = file.path(getwd(), paste0(substr(gsub("\\.", "",
    make.names(as.character(Sys.time()))), 2, 15), "_FCS_dr")),
  save.name = NULL,
  exclude.extra.channels = ifelse(length(ff.list) == 1 && names(ff.list) ==
    "transformed", "cluster.id", "FSC|SSC|Time|cluster.id"),
  write.transformed.channels.to.FCS = T,
  write.untransformed.channels.to.FCS = T,
  write.scaled.channels.to.FCS = F,
  timeChannel = c("Time", "HDR-T"),
  transformation_name = "trans",
  return_processed_raw_data_only = F,
  seed = 42
)

Arguments

ff.list

a list of flowFrames as received from fcexpr::wsp_get_ff (compensated with Compensation Matrix as defined in FlowJo by default) or as received from fcexpr::inds_get_ff (directly from FCS files, not compensated by default)

channels

a named vector of channels to use for dimension reduction. values can be channel names (v-450LP..., b-520..., or so), or channel descriptions (e.g. CD3 or LAG3-PeCy7 for example) names will be used as new description in the fcs file to be written; if no names provided, names of the very first FCS file will be used

add.sample.info

named list of additional channels to identify samples or group them in flowjo; e.g. for 9 fcs files to be concatenated: add.sample.info = list(condition = c(1,2,3,1,2,3,1,2,3), donor = c(1,1,1,2,2,2,3,3,3))

scale.whole

if and how to scale channels after concatenation of flowframes in ff.list

scale.samples

if and how to scale channels of flowframes in ff.list individually before concatenation

run.pca

select number of pcs; if 0 no pca is computed

run.umap

logical, whether to calculate UMAP dimension reduction with uwot::umap

run.som

logical, whether to calculate SOM dimension reduction EmbedSOM::SOM followed by EmbedSOM::EmbedSOM

run.gqtsom

logical, whether to calculate GQTSOM dimension reduction EmbedSOM::GQTSOM followed by EmbedSOM::EmbedSOM

find.clusters

logical, whether to detect clusters with stats::dist, stats::hclust and stats::cutree

metacluster_map

which map to use for metaclustering; SOM or GQTSOM

metacluster_map_source

which source to use for metaclustering; actual expression valueS (fluorescence intensites) or UMAP dimension (the latter of which was found to work really well)

clustering.for.marker.calc

if NULL nothing is calculated; otherwise marker features (stained markers) are determined by wilcox test using presto::wilcoxauc

calc.global.markers

logical whether to calculate global cluster markers: so each cluster vs. all other cells

calc.pairwise.markers

logical whether to calculate pairwise cluster markers: so each cluster vs. each cluster

UMAP_args

arguments to uwot::umap; use pca and scale arguments to have data only modified for UMAP but not for SOM

SOM_args

args to EmbedSOM::SOM

GQTSOM_args

args to EmbedSOM::GQTSOM

EmbedSOM_args

args to EmbedSOM::EmbedSOM

dist_args

arguments to stats::dist

hclust_args

arguments to stats::hclust

cutree_args

arguments to stats::cutree; only k is relevant

extra.cols

numeric vector of an extra column (or matrix of multiple columns) with arbitraty information to add to the final fcs file; has to be equal to the number of rows in all flowframes provided; colnames will be the channel names in the FCS file; could be a previously calculated dimension reduction or cluster annotation.

mc.cores

number of cores to use for parallel computing of cluster markers and multiple cluster resolutions; mc.cores is passed to parallel::mclapply or parallel::mcmapply; limited to parallel::detectCores()-1

save.to.disk

what to save to disk: (concatenated) and appended FCS file and/or rds file with several elements in a list

save.path

where to save elements specified in save.to.disk; set to NULL to have nothing written to disk

save.name

name to use for save rds and or fcs file

exclude.extra.channels

when scaled and transform channels are written to FCS file, some channels may be redundant and will only occupy disk space, those are specified here; matched with grepl

write.transformed.channels.to.FCS

do save transformed channels (e.g. logicle or arcsinh) to FCS file

write.untransformed.channels.to.FCS

do save untransformed channels (inverse) to FCS file

write.scaled.channels.to.FCS

do save scaled channels (scale.whole, scale.samples) to FCS file

timeChannel

name of the Time channel to exclude from all analyses and calculation; if NULL will be attempted to be detected automatically

transformation_name

name of the applied transformation (will appear in FCS file as channel desc)

return_processed_raw_data_only

do not calculate dimension reduction etc but only return the preprocessed data for external calculations or tryouts

seed

set a seed for reproduction of dimension reductions

Value

A list with 3 elements: (i) The matrix of fluorescence intensities and appended information (dim red, clustering). This is the table which is written into a newly generated fcs file. (ii) A character vector of meaningful column names which may be used for the table in R (rather for convenience). (iii) Tables of marker features (each cluster vs all other events and all clusters pairwise).

Examples

## Not run: 
############################
### Plot cluster markers ###
############################
dr <- fcexpr::dr_to_fcs(ff.list = ffs,
channels = channels,
louvain__resolution = 0.5,
run.lda = "louvain_0.5",
clustering.for.marker.calc = c("louvain_0.5"),
save.path = NULL)
marker <- dr[[3]][[1]][[1]]
marker$channel_desc2 <- sapply(strsplit(marker$channel_desc, "_"), "[", 1)
marker <-
 marker %>%
 dplyr::mutate(pvalue = ifelse(pvalue == 0, 1e-300, marker$pvalue)) %>%
 dplyr::group_by(channel_desc2) %>%
 dplyr::mutate(mean_scaled = fcexpr:::min.max.normalization(mean))

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = -log10(pvalue))) +
 geom_tile(color = "black") +
 theme_bw() +
 geom_text(aes(label = diff_sign)) +
 scale_fill_viridis_c()

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_diff)) +
 geom_tile(color = "black") +
 theme_bw() +
 geom_text(aes(label = diff_sign)) +
 scale_fill_viridis_c()

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_scaled)) +
 geom_tile(color = "black") +
 theme_bw() +
 scale_fill_viridis_c()

ggplot(marker, aes(x = mean_diff, y = -log10(pvalue), label = channel_desc2)) + #color = channel_desc2,
 geom_point() +
 theme_bw() +
 labs(title = "cluster markers (vs all other cells each)") +
 ggrepel::geom_text_repel(max.overlaps = 20, show.legend = F) +
 theme(panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), panel.grid.major.x = element_blank(), strip.background = element_rect(fill = "hotpink2")) +
 geom_vline(xintercept = 0, col = "tomato2", linetype = "dashed") +
 geom_hline(yintercept = 100, col = "tomato2", linetype = "dashed") +
 facet_wrap(vars(cluster))

 ##############################################
 ### find clusters which may be subject #######
 ### to bi- or multimodality in any channel ###
 ##############################################
 # make one data frame
 marker_all <- purrr::map_dfr(setNames(names(out[["marker"]]), names(out[["marker"]])),
 function(x) out[["marker"]][[x]][["marker_table"]], .id = "clustering")

 # sort by diptest p value; or low p indicates bi- or multimodality
 marker_all <- dplyr::arrange(marker_all, diptest_p)
 # see ?diptest::dip.test
 # a low p-value indicates bi- or multimodality (multiple peaks)
 # a high p-value (close to 1) indicates unimodality
 # multimodality indicates heterogeneity within in the cluster
 # and may justify to separate that cluster further into sub-clusters
 # this depends on the interpretation of the scientist though


 ##############################################
 ### overlay gated populations from flowjo ####
 ### on dimension reduction plot (SOM/UMAP) ###
 ##############################################

common_cols <- Reduce(intersect, purrr::map(ff[["indices"]], colnames))
ff[["indices"]] <- purrr::map(ff[["indices"]], function(x) x[,which(colnames(x) %in% common_cols)])
ind_mat <- do.call(rbind, ff[["indices"]])

som <- ggplot(out[["df"]], aes(SOM_1, SOM_2, color = as.factor(cutree_30))) +
geom_point(size = 0.5) +
geom_density2d(data = . %>% dplyr::filter(ind_mat[,which(basename(colnames(ind_mat)) == "NK cells")]), color = "black",
contour_var = "ndensity", breaks = c(0.1, 0.2, 0.4, 0.6, 0.8)) +
guides(color = guide_legend(override.aes = list(size = 2)))


## End(Not run)

Close-your-eyes/fcexpr documentation built on Sept. 29, 2023, 12:27 a.m.