dr_to_fcs2 | R Documentation |
Calculate dimension reduction and cluster annotation with data from one or more flow frames and add these parameters to a (concatenated) fcs file
dr_to_fcs2(
ff.list,
channels = NULL,
add.sample.info = NULL,
scale.whole = c("none", "z.score", "min.max"),
scale.samples = c("none", "z.score", "min.max"),
run.pca = 0,
run.umap = T,
run.som = T,
run.gqtsom = F,
find.clusters = T,
metacluster_map = c("SOM", "GQTSOM"),
metacluster_map_source = c("UMAP", "expr"),
clustering.for.marker.calc = c("cutree_5", "cutree_10", "cutree_15"),
calc.global.markers = F,
calc.pairwise.markers = F,
UMAP_args = list(metric = "cosine", verbose = T, scale = T),
SOM_args = list(),
GQTSOM_args = list(),
EmbedSOM_args = list(),
dist_args = list(),
hclust_args = list(method = "average"),
cutree_args = list(k = c(5, 10, 15)),
extra.cols = NULL,
mc.cores = 1,
save.to.disk = c("fcs", "rds"),
save.path = file.path(getwd(), paste0(substr(gsub("\\.", "",
make.names(as.character(Sys.time()))), 2, 15), "_FCS_dr")),
save.name = NULL,
exclude.extra.channels = ifelse(length(ff.list) == 1 && names(ff.list) ==
"transformed", "cluster.id", "FSC|SSC|Time|cluster.id"),
write.transformed.channels.to.FCS = T,
write.untransformed.channels.to.FCS = T,
write.scaled.channels.to.FCS = F,
timeChannel = c("Time", "HDR-T"),
transformation_name = "trans",
return_processed_raw_data_only = F,
seed = 42
)
ff.list |
a list of flowFrames as received from fcexpr::wsp_get_ff (compensated with Compensation Matrix as defined in FlowJo by default) or as received from fcexpr::inds_get_ff (directly from FCS files, not compensated by default) |
channels |
a named vector of channels to use for dimension reduction. values can be channel names (v-450LP..., b-520..., or so), or channel descriptions (e.g. CD3 or LAG3-PeCy7 for example) names will be used as new description in the fcs file to be written; if no names provided, names of the very first FCS file will be used |
add.sample.info |
named list of additional channels to identify samples or group them in flowjo; e.g. for 9 fcs files to be concatenated: add.sample.info = list(condition = c(1,2,3,1,2,3,1,2,3), donor = c(1,1,1,2,2,2,3,3,3)) |
scale.whole |
if and how to scale channels after concatenation of flowframes in ff.list |
scale.samples |
if and how to scale channels of flowframes in ff.list individually before concatenation |
run.pca |
select number of pcs; if 0 no pca is computed |
run.umap |
logical, whether to calculate UMAP dimension reduction with uwot::umap |
run.som |
logical, whether to calculate SOM dimension reduction EmbedSOM::SOM followed by EmbedSOM::EmbedSOM |
run.gqtsom |
logical, whether to calculate GQTSOM dimension reduction EmbedSOM::GQTSOM followed by EmbedSOM::EmbedSOM |
find.clusters |
logical, whether to detect clusters with stats::dist, stats::hclust and stats::cutree |
metacluster_map |
which map to use for metaclustering; SOM or GQTSOM |
metacluster_map_source |
which source to use for metaclustering; actual expression valueS (fluorescence intensites) or UMAP dimension (the latter of which was found to work really well) |
clustering.for.marker.calc |
if NULL nothing is calculated; otherwise marker features (stained markers) are determined by wilcox test using presto::wilcoxauc |
calc.global.markers |
logical whether to calculate global cluster markers: so each cluster vs. all other cells |
calc.pairwise.markers |
logical whether to calculate pairwise cluster markers: so each cluster vs. each cluster |
UMAP_args |
arguments to uwot::umap; use pca and scale arguments to have data only modified for UMAP but not for SOM |
SOM_args |
args to EmbedSOM::SOM |
GQTSOM_args |
args to EmbedSOM::GQTSOM |
EmbedSOM_args |
args to EmbedSOM::EmbedSOM |
dist_args |
arguments to stats::dist |
hclust_args |
arguments to stats::hclust |
cutree_args |
arguments to stats::cutree; only k is relevant |
extra.cols |
numeric vector of an extra column (or matrix of multiple columns) with arbitraty information to add to the final fcs file; has to be equal to the number of rows in all flowframes provided; colnames will be the channel names in the FCS file; could be a previously calculated dimension reduction or cluster annotation. |
mc.cores |
number of cores to use for parallel computing of cluster markers and multiple cluster resolutions; mc.cores is passed to parallel::mclapply or parallel::mcmapply; limited to parallel::detectCores()-1 |
save.to.disk |
what to save to disk: (concatenated) and appended FCS file and/or rds file with several elements in a list |
save.path |
where to save elements specified in save.to.disk; set to NULL to have nothing written to disk |
save.name |
name to use for save rds and or fcs file |
exclude.extra.channels |
when scaled and transform channels are written to FCS file, some channels may be redundant and will only occupy disk space, those are specified here; matched with grepl |
write.transformed.channels.to.FCS |
do save transformed channels (e.g. logicle or arcsinh) to FCS file |
write.untransformed.channels.to.FCS |
do save untransformed channels (inverse) to FCS file |
write.scaled.channels.to.FCS |
do save scaled channels (scale.whole, scale.samples) to FCS file |
timeChannel |
name of the Time channel to exclude from all analyses and calculation; if NULL will be attempted to be detected automatically |
transformation_name |
name of the applied transformation (will appear in FCS file as channel desc) |
return_processed_raw_data_only |
do not calculate dimension reduction etc but only return the preprocessed data for external calculations or tryouts |
seed |
set a seed for reproduction of dimension reductions |
A list with 3 elements: (i) The matrix of fluorescence intensities and appended information (dim red, clustering). This is the table which is written into a newly generated fcs file. (ii) A character vector of meaningful column names which may be used for the table in R (rather for convenience). (iii) Tables of marker features (each cluster vs all other events and all clusters pairwise).
## Not run:
############################
### Plot cluster markers ###
############################
dr <- fcexpr::dr_to_fcs(ff.list = ffs,
channels = channels,
louvain__resolution = 0.5,
run.lda = "louvain_0.5",
clustering.for.marker.calc = c("louvain_0.5"),
save.path = NULL)
marker <- dr[[3]][[1]][[1]]
marker$channel_desc2 <- sapply(strsplit(marker$channel_desc, "_"), "[", 1)
marker <-
marker %>%
dplyr::mutate(pvalue = ifelse(pvalue == 0, 1e-300, marker$pvalue)) %>%
dplyr::group_by(channel_desc2) %>%
dplyr::mutate(mean_scaled = fcexpr:::min.max.normalization(mean))
ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = -log10(pvalue))) +
geom_tile(color = "black") +
theme_bw() +
geom_text(aes(label = diff_sign)) +
scale_fill_viridis_c()
ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_diff)) +
geom_tile(color = "black") +
theme_bw() +
geom_text(aes(label = diff_sign)) +
scale_fill_viridis_c()
ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_scaled)) +
geom_tile(color = "black") +
theme_bw() +
scale_fill_viridis_c()
ggplot(marker, aes(x = mean_diff, y = -log10(pvalue), label = channel_desc2)) + #color = channel_desc2,
geom_point() +
theme_bw() +
labs(title = "cluster markers (vs all other cells each)") +
ggrepel::geom_text_repel(max.overlaps = 20, show.legend = F) +
theme(panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), panel.grid.major.x = element_blank(), strip.background = element_rect(fill = "hotpink2")) +
geom_vline(xintercept = 0, col = "tomato2", linetype = "dashed") +
geom_hline(yintercept = 100, col = "tomato2", linetype = "dashed") +
facet_wrap(vars(cluster))
##############################################
### find clusters which may be subject #######
### to bi- or multimodality in any channel ###
##############################################
# make one data frame
marker_all <- purrr::map_dfr(setNames(names(out[["marker"]]), names(out[["marker"]])),
function(x) out[["marker"]][[x]][["marker_table"]], .id = "clustering")
# sort by diptest p value; or low p indicates bi- or multimodality
marker_all <- dplyr::arrange(marker_all, diptest_p)
# see ?diptest::dip.test
# a low p-value indicates bi- or multimodality (multiple peaks)
# a high p-value (close to 1) indicates unimodality
# multimodality indicates heterogeneity within in the cluster
# and may justify to separate that cluster further into sub-clusters
# this depends on the interpretation of the scientist though
##############################################
### overlay gated populations from flowjo ####
### on dimension reduction plot (SOM/UMAP) ###
##############################################
common_cols <- Reduce(intersect, purrr::map(ff[["indices"]], colnames))
ff[["indices"]] <- purrr::map(ff[["indices"]], function(x) x[,which(colnames(x) %in% common_cols)])
ind_mat <- do.call(rbind, ff[["indices"]])
som <- ggplot(out[["df"]], aes(SOM_1, SOM_2, color = as.factor(cutree_30))) +
geom_point(size = 0.5) +
geom_density2d(data = . %>% dplyr::filter(ind_mat[,which(basename(colnames(ind_mat)) == "NK cells")]), color = "black",
contour_var = "ndensity", breaks = c(0.1, 0.2, 0.4, 0.6, 0.8)) +
guides(color = guide_legend(override.aes = list(size = 2)))
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.