dr_to_fcs: Calculate dimension reduction and cluster annotation with...

View source: R/dr_to_fcs.R

dr_to_fcs    R Documentation

Calculate dimension reduction and cluster annotation with data from one or more flow frames and add these parameters to a (concatenated) fcs file

Description

Prepare a list of flowframes (ff.list) with fcexpr::wsp_get_ff or fcexpr::inds_get_ff. The objects returned from these functions are compatible with fcexpr::dr_to_fcs. The ff.list must contain a list of flowframes with untransformed fluorescence intensities (FI) and may optionally contain an additional list of flowframes with transformed FI. The transformation is your choice and may be logicle, arcsinh or any other; it may be individual to channels and/or flowframes. You have to calculate the transformation outside this function (dr_to_fcs) though. If a list of transformed flowframes is provided, it is used for dimension reduction, clustering etc. If only a list of untransformed flowframes is provided, that one is used. Both the transformed and the untransformed values can be written to the concatenated fcs file (see function arguments). Calculated dimension reductions and cluster annotations are also written as channels into that fcs file. Some dimension reduction and cluster detection algorithms may become slow with a large number of events (e.g. > 1e6, see Details). In order to get a quick impression of what the algorithms can pull out for you, you may use the 'downsample' argument in fcexpr::wsp_get_ff or fcexpr::inds_get_ff to conveniently sample a random subset of events from each fcs file (flowframe).
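
A minimal sketch of this workflow is given below; 'ffs' stands for the list returned by fcexpr::wsp_get_ff or fcexpr::inds_get_ff, and the channel names/descriptions are hypothetical placeholders:

# minimal sketch; 'ffs' comes from fcexpr::wsp_get_ff or fcexpr::inds_get_ff,
# the channel descriptions and names below are hypothetical placeholders
out <- fcexpr::dr_to_fcs(ff.list = ffs,
                         channels = c(CD3 = "APC-A", CD8 = "BV421-A"),
                         run.umap = TRUE,
                         run.som = TRUE,
                         save.path = NULL) # set a path to write the concatenated fcs file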

Usage

dr_to_fcs(
  ff.list,
  channels = NULL,
  add.sample.info = NULL,
  scale.whole = c("z.score", "min.max", "none"),
  scale.samples = c("none", "z.score", "min.max"),
  run.harmony = F,
  run.pca = F,
  run.lda = F,
  run.umap = T,
  run.tsne = F,
  run.som = T,
  run.gqtsom = F,
  metaclustering.on = NULL,
  run.louvain = F,
  run.kmeans = F,
  run.minibatchkmeans = F,
  run.kmeans_arma = F,
  run.kmeans_rcpp = F,
  run.leiden = F,
  run.hclust = F,
  run.flowClust = F,
  run.MUDAN = F,
  n.pca.dims = 0,
  clustering.for.marker.calc = NULL,
  calc.global.markers = F,
  calc.pairwise.markers = F,
  extra.cols = NULL,
  mc.cores = 1,
  save.to.disk = c("fcs", "rds"),
  save.path = file.path(getwd(), paste0(substr(gsub("\\.", "",
    make.names(as.character(Sys.time()))), 2, 15), "_FCS_dr")),
  exclude.extra.channels = ifelse(length(ff.list) == 1 && names(ff.list) ==
    "transformed", "cluster.id", "FSC|SSC|Time|cluster.id"),
  write.transformed.channels.to.FCS = T,
  write.untransformed.channels.to.FCS = T,
  write.scaled.channels.to.FCS = F,
  timeChannel = "Time",
  transformation_name = "trans",
  return_processed_raw_data_only = F,
  seed = 42,
  ...
)

Arguments

ff.list

a list of flowFrames as returned by fcexpr::wsp_get_ff (compensated with the compensation matrix defined in FlowJo, by default) or by fcexpr::inds_get_ff (read directly from FCS files, not compensated by default)

channels

a named vector of channels to use for dimension reduction. Values can be channel names (e.g. v-450LP..., b-520...) or channel descriptions (e.g. CD3 or LAG3-PeCy7). Names are used as the new descriptions in the fcs file to be written; if no names are provided, the names of the very first FCS file are used.

add.sample.info

named list of additional channels to identify samples or group them in FlowJo; e.g. for 9 fcs files to be concatenated: add.sample.info = list(condition = c(1,2,3,1,2,3,1,2,3), donor = c(1,1,1,2,2,2,3,3,3))

scale.whole

if and how to scale channels after concatenation of flowframes in ff.list

scale.samples

if and how to scale channels of flowframes in ff.list individually before concatenation

run.harmony

attempt batch correction using harmony::HarmonyMatrix; if TRUE, harmony__meta_data has to be provided in ..., indicating the groups to be batch corrected; harmony is run before run.pca; to calculate a PCA before harmony, pass harmony__do_pca = T and optionally a number of PCs with harmony__npcs. Set run.pca = F when a PCA is already computed within harmony. See the sketch at the end of this section for how such arguments may be combined.

run.pca

run principal component analysis, either on its own or prior to other dimension reductions (UMAP, SOM, tSNE, ...)

run.lda

run Linear Discriminant Analysis before dimension reduction; should be F (FALSE) or the name of a clustering calculated beforehand, e.g. louvain_0.8, leiden_1.1, kmeans_12 etc.; the respective clustering calculation has to be requested via ...

run.umap

logical, whether to calculate UMAP dimension reduction with uwot::umap

run.tsne

logical, whether to calculate tsne dimension reduction with Rtsne::Rtsne

run.som

logical, whether to calculate SOM dimension reduction with EmbedSOM::SOM followed by EmbedSOM::EmbedSOM

run.gqtsom

logical, whether to calculate GQTSOM dimension reduction with EmbedSOM::GQTSOM followed by EmbedSOM::EmbedSOM

metaclustering.on

SOM or GQTSOM; run clustering algorithms not on the original data but on the SOM map (so-called metaclustering); SOM or GQTSOM may be selected for metaclustering; this is expected to substantially speed up calculation on large data sets; if NULL, clustering is performed on the original data; recommendation: when applying metaclustering, choose run.hclust = T and supply hclust__method = "average" and cutree__k = c(xx), with xx being the number of desired/expected clusters (a sketch is given in Details)

run.louvain

logical, whether to detect clusters (communities, groups) of cells with the louvain algorithm, as implemented in Seurat::FindClusters (subsequent to SNN detection by Seurat::FindNeighbors)

run.kmeans

logical, whether to detect clusters with stats::kmeans

run.minibatchkmeans

logical, whether to detect clusters with ClusterR::MiniBatchKmeans

run.kmeans_arma

logical, whether to detect clusters with ClusterR::KMeans_arma

run.kmeans_rcpp

logical, whether to detect clusters with ClusterR::KMeans_rcpp

run.leiden

logical, whether to detect clusters (communities, groups) of cells with the leiden algorithm, using leiden::leiden (subsequent to SNN detection by Seurat::FindNeighbors)

run.hclust

logical, whether to detect clusters with stats::dist, stats::hclust and stats::cutree

run.flowClust

logical, whether to detect clusters with flowClust::flowClust

run.MUDAN

logical, whether to detect clusters with MUDAN::getComMembership

n.pca.dims

number of principal components to calculate

clustering.for.marker.calc

if NULL, nothing is calculated; otherwise marker features (stained markers) are determined by Wilcoxon test using presto::wilcoxauc for the provided clustering(s). Each cluster is tested against all other events, and clusters are compared pairwise. The respective clustering calculation has to be requested in ...; e.g. if louvain__resolution = 0.5 is provided, set clustering.for.marker.calc = "louvain_0.5"; if in addition leiden__resolution_parameter = 0.7 is provided, set clustering.for.marker.calc = c("louvain_0.5", "leiden_0.7") (see the sketch at the end of this section).

calc.global.markers

logical, whether to calculate global cluster markers, i.e. each cluster vs. all other cells

calc.pairwise.markers

logical, whether to calculate pairwise cluster markers, i.e. each cluster vs. each other cluster

extra.cols

numeric vector of an extra column (or a matrix of multiple columns) with arbitrary information to add to the final fcs file; the number of rows has to equal the total number of rows of all flowframes provided; column names will become the channel names in the FCS file; this could be a previously calculated dimension reduction or cluster annotation.

mc.cores

number of cores to use for parallel computation of cluster markers and multiple cluster resolutions; mc.cores is passed to parallel::mclapply or parallel::mcmapply; limited to parallel::detectCores()-1

save.to.disk

what to save to disk: the (concatenated) FCS file with appended channels and/or an rds file containing a list of several elements

save.path

where to save elements specified in save.to.disk; set to NULL to have nothing written to disk

exclude.extra.channels

when scaled and transformed channels are written to the FCS file, some channels may be redundant and only occupy disk space; those are specified here and matched with grepl

write.transformed.channels.to.FCS

logical, whether to save transformed channels (e.g. logicle or arcsinh) to the FCS file

write.untransformed.channels.to.FCS

logical, whether to save untransformed (inverse) channels to the FCS file

write.scaled.channels.to.FCS

logical, whether to save scaled channels (scale.whole, scale.samples) to the FCS file

timeChannel

name of the Time channel, which is excluded from all analyses and calculations; if NULL, automatic detection is attempted

transformation_name

name of the applied transformation (will appear in FCS file as channel desc)

return_processed_raw_data_only

logical; if TRUE, no dimension reduction etc. is calculated and only the preprocessed data are returned for external calculations or tryouts

seed

set a seed to make dimension reductions reproducible

...

additional parameters passed to the calculations of UMAP, tSNE, SOM, GQTSOM, EmbedSOM, louvain, leiden, harmony, flowClust, hclust, MUDAN and kmeans; provide arguments as follows: UMAP__n_neighbors = c(15,20,25), tsne__theta = 0.3, etc. See the respective help files to find out which arguments can be passed: uwot::umap, Rtsne::Rtsne, EmbedSOM::SOM, EmbedSOM::GQTSOM, EmbedSOM::EmbedSOM, harmony::HarmonyMatrix, flowClust::flowClust; for louvain: Seurat::FindNeighbors and Seurat::FindClusters; for leiden: Seurat::FindNeighbors and leiden::leiden; for hclust: stats::dist and stats::hclust; for MUDAN: MUDAN::getComMembership; for kmeans: stats::kmeans, ClusterR::MiniBatchKmeans, ClusterR::KMeans_rcpp, ClusterR::KMeans_arma. A hedged example is sketched below.
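
A hedged sketch of how such arguments may be combined in one call; 'ffs', 'channels' and 'batch_vec' are hypothetical placeholders:

out <- fcexpr::dr_to_fcs(ff.list = ffs,
                         channels = channels,
                         run.umap = TRUE,
                         UMAP__n_neighbors = 20,          # passed to uwot::umap
                         run.harmony = TRUE,
                         harmony__meta_data = batch_vec,  # groups to batch-correct (placeholder)
                         run.louvain = TRUE,
                         louvain__resolution = 0.5,       # passed to Seurat::FindClusters
                         clustering.for.marker.calc = "louvain_0.5",
                         save.path = NULL)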

Details

Logicle transformation of FI has been described here: Parks et al., 2006, PMID: 16604519, DOI: 10.1002/cyto.a.20258. Different transformations have been compared, for instance, here: Transformations.

Dimension reduction (low dimensional embedding) algorithms are: UMAP, tSNE, SOM, GQTSOM, PCA. tSNE is expected to be too slow for more than 1e4 or 1e5 cells (depending on the precision set by tsne__theta). UMAP has been developed more recently, is well-known and quicker. SOM and GQTSOM are very quick and similar to flowSOM, but much more convenient to use in programming as they accept matrices, whereas flowSOM strictly requires flowFrames. SOMs also enable the so-called metaclustering, which massively speeds up cluster detection; the cost in precision has to be inspected manually. PCA will be orders of magnitude faster than UMAP, especially on large numbers of events (1e6 and above). On the other hand, it will not produce as nicely separated clusters, as it is a linear algorithm.

Cluster (community) detection algorithms are louvain, leiden, kmeans, hclust, flowClust and MUDAN. They assign a common discrete label to events with similar properties in the high dimensional space. kmeans will be the quickest. Set metaclustering.on to 'SOM' or 'GQTSOM' to enable quick metaclustering; in that case any of the clustering algorithms is performed on the SOM map. Louvain is slower but may yield better results on the original data (no metaclustering). For a low number of events (e.g. below 1e6), louvain will perform in a reasonable amount of time and could be used right away in parallel to kmeans. leiden requires the original python library and is approximately similar to louvain with respect to calculation time, maybe a bit slower. leiden is an enhancement of louvain, though louvain does not produce "wrong" results per se. flowClust, MUDAN and hclust have not been tested extensively. For kmeans, hclust and flowClust the number of expected clusters can be provided. louvain and leiden take a resolution parameter (lower resolution -> fewer clusters). MUDAN takes a number of nearest neighbors (fewer neighbors -> more clusters).

An initial PCA may not be required if the selected channels are chosen thoughtfully (only those with stained markers). If you decide, though, to simply use all channels (with and without stained markers) for dimension reduction, PCA will automatically put the focus on the channels with the highest variation (so those which contain a staining on a subset of events). In this case you may set n.pca.dims roughly equal to the number of channels which contain a staining.
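
For large data sets, the metaclustering route recommended above could look roughly as sketched below; 'ffs', 'channels' and the cluster number 20 are placeholders:

out <- fcexpr::dr_to_fcs(ff.list = ffs,
                         channels = channels,
                         run.som = TRUE,
                         metaclustering.on = "SOM",  # cluster the SOM map instead of all events
                         run.hclust = TRUE,
                         hclust__method = "average", # passed to stats::hclust
                         cutree__k = c(20),          # desired/expected number of clusters (placeholder)
                         save.path = NULL)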

Value

A list with 3 elements: (i) the matrix of fluorescence intensities and appended information (dimension reduction, clustering); this is the table that is written into the newly generated fcs file. (ii) A character vector of meaningful column names which may be used for the table in R (rather for convenience). (iii) Tables of marker features (each cluster vs. all other events, and all clusters pairwise).
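
Assuming the call was assigned to 'out', the elements may be accessed as sketched here:

mat <- out[[1]]      # matrix of intensities plus appended dim reduction / cluster channels
cols <- out[[2]]     # meaningful column names for use in R
markers <- out[[3]]  # marker feature tables (see the examples below)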

Examples

## Not run: 
############################
### Plot cluster markers ###
############################
dr <- fcexpr::dr_to_fcs(ff.list = ffs,
                        channels = channels,
                        louvain__resolution = 0.5,
                        run.lda = "louvain_0.5",
                        clustering.for.marker.calc = c("louvain_0.5"),
                        save.path = NULL)
marker <- dr[[3]][[1]][[1]]
marker$channel_desc2 <- sapply(strsplit(marker$channel_desc, "_"), "[", 1)
marker <-
 marker %>%
 dplyr::mutate(pvalue = ifelse(pvalue == 0, 1e-300, pvalue)) %>%
 dplyr::group_by(channel_desc2) %>%
 dplyr::mutate(mean_scaled = fcexpr:::min.max.normalization(mean))

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = -log10(pvalue))) +
 geom_tile(color = "black") +
 theme_bw() +
 geom_text(aes(label = diff_sign)) +
 scale_fill_viridis_c()

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_diff)) +
 geom_tile(color = "black") +
 theme_bw() +
 geom_text(aes(label = diff_sign)) +
 scale_fill_viridis_c()

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_scaled)) +
 geom_tile(color = "black") +
 theme_bw() +
 scale_fill_viridis_c()

ggplot(marker, aes(x = mean_diff, y = -log10(pvalue), label = channel_desc2)) + #color = channel_desc2,
 geom_point() +
 theme_bw() +
 labs(title = "cluster markers (vs all other cells each)") +
 ggrepel::geom_text_repel(max.overlaps = 20, show.legend = F) +
 theme(panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), panel.grid.major.x = element_blank(), strip.background = element_rect(fill = "hotpink2")) +
 geom_vline(xintercept = 0, col = "tomato2", linetype = "dashed") +
 geom_hline(yintercept = 100, col = "tomato2", linetype = "dashed") +
 facet_wrap(vars(cluster))

 ##############################################
 ### find clusters which may be subject #######
 ### to bi- or multimodality in any channel ###
 ##############################################
 # make one data frame
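 # 'out' below is assumed to be the saved rds result (or return value) of dr_to_fcs,
 # carrying the marker tables in its 'marker' element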
 marker_all <- purrr::map_dfr(setNames(names(out[["marker"]]), names(out[["marker"]])),
 function(x) out[["marker"]][[x]][["marker_table"]], .id = "clustering")

 # sort by diptest p value; a low p indicates bi- or multimodality
 marker_all <- dplyr::arrange(marker_all, diptest_p)
 # see ?diptest::dip.test
 # a low p-value indicates bi- or multimodality (multiple peaks)
 # a high p-value (close to 1) indicates unimodality
 # multimodality indicates heterogeneity within the cluster
 # and may justify separating that cluster further into sub-clusters
 # this depends on the interpretation of the scientist though


 ##############################################
 ### overlay gated populations from flowjo ####
 ### on dimension reduction plot (SOM/UMAP) ###
 ##############################################

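# 'ff' is assumed to be the object returned by fcexpr::wsp_get_ff, here carrying
# per-population gating indices in ff[["indices"]]; 'out' again refers to the dr_to_fcs result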
common_cols <- Reduce(intersect, purrr::map(ff[["indices"]], colnames))
ff[["indices"]] <- purrr::map(ff[["indices"]], function(x) x[,which(colnames(x) %in% common_cols)])
ind_mat <- do.call(rbind, ff[["indices"]])

som <- ggplot(out[["df"]], aes(SOM_1, SOM_2, color = as.factor(cutree_30))) +
 geom_point(size = 0.5) +
 geom_density2d(data = . %>% dplyr::filter(ind_mat[,which(basename(colnames(ind_mat)) == "NK cells")]),
                color = "black", contour_var = "ndensity", breaks = c(0.1, 0.2, 0.4, 0.6, 0.8)) +
 guides(color = guide_legend(override.aes = list(size = 2)))


## End(Not run)
