The rcell2
package has a bioconductor dependency, which must be installed first.
Install the EBImage
package (required to look at cells) by copying and running the following script:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") if (!requireNamespace("EBImage", quietly = TRUE)) BiocManager::install("EBImage")
The rcell2
package can be installed directly from its git repository by running the following:
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes") if (!requireNamespace("rcell2", quietly = TRUE)) remotes::install_github("darksideoftheshmoo/rcell2")
You may download a fresh copy of this file with rcell2::get_workflow_template()
.
knitr::opts_chunk$set(echo = T, message = F, # https://yihui.org/knitr/options/#chunk-options out.width = "100%") # https://stackoverflow.com/a/66753995 # tidy.opts = list(width.cutoff = 60), tidy = TRUE) knitr::opts_knit$set(root.dir = here::here()) #, base.dir = here::here()) library(rcell2) # this package you are checking out :) # library(rcell2.cellid) # Advanced functionality: run CellID entirely in R. # library(rcell2.magick) # Advanced functionality: filter data using Shiny and preview images with Magick. # library(magick) # if you need it library(tidyverse) # import numpy as np (?)
?
(e.g. ?load_cell_data
).rcell2.cellid
package. It contains several routine checks that should not be skipped.Images are assumed to be in a single directory, and have names with identifiers for:
Cell-ID uses the first 3 letters of the file name to group imaging channels.
This is mandatory. If your image set does not conform initially, it can be
renamed (or symlinked) using rcell2's rename_mda
, with custom arguments.
The rcell2
packages rely on regular expressions to gather images and extract
metadata from their file names (i.e. channel, position, and time frame.)
File names for a 2-position time course experiment are expected to have the following appearance:
BF_Position001_time01.tif BF_Position001_time02.tif TFP_Position001_time01.tif TFP_Position001_time02.tif YFP_Position001_time01.tif YFP_Position001_time02.tif BF_Position002_time01.tif BF_Position002_time02.tif TFP_Position002_time01.tif TFP_Position002_time02.tif YFP_Position002_time01.tif YFP_Position002_time02.tif
The file list will be recognized by a specific regular expression (regex).
A regex matching these files can be ^(BF|TFP)_Position(\\d+)_time(\\d+).tif$
.
Note that the regex has 3 capturing groups (i.e. the patterns between parentheses). The regex as a whole is used to select images from a directory (filtering out other files), and the capturing groups are used to extract the metadata.
You may need to adjust these if your files are named differently.
Set the data_dir
variable to the path of your segmentation output:
data_dir <- "/.../path/to/your/segmeted/images"
A full explanation and step-by-step guide of the image segmentation process is available
in the notebook template of the rcell2.cellid
package.
Un-comment and execute the following to install the rcell2.cellid
package and open the template:
# if(!requireNamespace("rcell2.cellid")) remotes::install_github("darksideoftheshmoo/rcell2-cellid") # rcell2.cellid::get_workflow_template_cellid()
Un-comment and execute the following to install the rcell2.examples
package and set the data directory:
# if(!requireNamespace("remotes")) install.packages("remotes") # if(!requireNamespace("rcell2.examples")) remotes::install_github("darksideoftheshmoo/rcell2.examples") # data_dir <- system.file("extdata/sample_datasets/sample_time_series/", package = "rcell2.examples")
The example data has already been processed with rcell2.cellid
, and can be loaded with rcell2
, as shown below.
Cell-ID's output consists of segmented images, and data files in plain text format. These must be loaded into R for further analyses.
This is done with the load_cell_data
function:
cell.data <- rcell2::load_cell_data(path = data_dir)
Alternatively, there is also the get_cell_data
function from the rcell2.cellid
package:
cell.data <- rcell2.cellid::get_cell_data(path = data_dir)
Note: your
fluorescence.pattern
regex may vary, see?get_cell_data
.
If Cell-ID was configured to output the segmentation mask, either to a TSV
or masked TIFF file, the information can be loaded with the cell.load.boundaries
function:
cell.load.boundaries(data.source = "masks.tsv")
Descriptions of the variables in cdata
are available in the rcell2.cellid
package:
rcell2.cellid::cellid_output_descriptions(F)
The cell.data object is a list. The main dataset is stored in the data
item, and
the paths to image files are in the images
item.
We usually assign these to their own objects, for easier manipulation:
cdata <- cell.data$data images <- cell.data$images
Warning:
load_cell_data
does the following automatically if a pdata.csv file is found next to the images. If it were the case, you musnt't run the join again.
Load "pdata":
# Set the path to the file (modify accodringly). pdata_path <- ".../somewhere/over/your/pdata.csv" # It is a "metadata" dataframe we usually join to cdata by "position" and/or "t.frame", # which contains other experimental variables: simuli concentrations, notes, etc. pdata <- readr::read_csv(pdata_path) pdata
Note: the "p" in "pdata" stands for microscope position data.
This table is usually joined to cdata by position (and time frame if available).
cdata <- left_join(cdata, pdata)
No analysis package exists (yet) that can replace a researcher's general visual assesment.
The tagCell
function from the `rcell2.magick`` can provide a randomized experimental context
for tagging (i.e. a pseudo double-blind experiment).
If bias during tagging is not a concern, the "multi-point" tool in ImageJ can be used to mark cells with different "counters", by simply clicking on them.
These marks can be exported and mapped back to ucid
s in R, and then
used for analysis.
This section helps you open your images with FIJI-ImageJ before you begin.
A simple Macro can be run in ImageJ to load virtual hyperstacks of any experiment.
Note: this macro uses a "file" pattern that only matches input images (and not ".out") in a tipically-named set of images. Customize the pattern to your need.
Steps:
rcell2.cellid::ijm_open_hyperstack(images)
Note: the files are symlinked with new names to a temporary directory. Otherwise, the names of the ImageJ dimensions may be mismatched between R and the macro. This is because virtual stacks must be in
xyczt
order (time is likely to end up as the "channels" dimension), but the order in which the "Image Sequence..." is loaded depends on how their files are named.
The only restriction of this approach is that only "3" dimensions are available for
browsing: frame
, plane
(Slice), and channel
. This is relevant if you wish to browse
segmentation results (which adds another dimension to the set: original vs "out").
A fourth dimension would be needed, but Hyperstacks only provide 3. Alternatively,
open another virtual hyperstack for the output images, and browse them in-sync.
Steps (continued): follow these steps to tag cells in ImageJ.
.csv
file.t.frame
s,
pos
, and ch
in Cell-IDs output.t.frame
s.Save the files to a single directory, and then save the path here:
results.dir <- ".../somewhere/over/the/measurements/"
We usually prepare several CSV files, and extract the position index
from the file name or the Slice
variable in them.
For example, CSV files can be named pos01.csv
. In this case, prepare a matching
regex, that will find only the CSV files you want to use. For example:
results.pattern <- "pos(\\d+).csv"
You can now list the CSV files:
mp.filters.files <- dir(path = results.dir, pattern = results.pattern, full.names = T) mp.filters.files
Check that these are the files you need.
Now, load the measurements:
mp.filters <- # Load CSV files mp.filters.files |> lapply(read_csv) |> # Bind rows bind_rows() |> # Get position info from the "Slice" index.. mutate(pos = Slice) |> # Or, get position info from the file name, if applicable. # mutate(file=mp.filters.files) |> # extract(file, results.pattern, into = "pos", convert = T) |> # Cleanup select(pos, X, Y, Ch, Slice, Frame) |> # Add an ID to each tag mutate(mpid = 1:n())
mp.filters <- mp.filters |> # Re-index t.frame mutate(t.frame = Frame - 1)
mp.filters
data.frame
with tag names mapped to Counter indices.# For example: tag_counter <- 0:3 tag_descriptions <- c("Ok", "Budding", "Dead", "Spurious") counter.desc.df <- data.frame( Counter = tag_counter, tag_description = tag_descriptions ) counter.desc.df
Here is an example mp.filters
data.frame
. In this case the position
is encoded in the Slice variable.
mp.filters.file <- system.file("extdata/sample_datasets/sample_time_series/imagej_mp_tags/results_pos4_12.csv", package = "rcell2.examples", mustWork = T) multipoints <- # Load CSV files mp.filters.file |> read_csv() |> # Cleanup select(X, Y, Ch, Slice, Frame, Counter) |> # Add an ID to each tag mutate(mpid = 1:n()) mp.filters <- multipoints |> left_join(slice.pos.df) |> left_join(ch.frame.df) |> left_join(counter.desc.df) mp.filters
Keep the relevant set of counters (a.k.a. markers). This code will work with one marker at a time.
mp.filter <- mp.filters |> # Here you may choose which markers to map. # filter(Counter == 0) |> select(mpid, X, Y, pos, t.frame) mp.filter
Calculate distance to closest multipoint and save it's ID:
# Make a copy of cdata cdata.mp <- cdata |> # filter(t.frame == min(t.frame)) |> select(ucid, pos, t.frame, xpos, ypos) # Make a copy of the points mp.mapped <- mp.filter # Add the columns mp.mapped[, c("ucid", "closest.dist")] <- apply(mp.filter, 1, function(x){ cells <- filter(cdata.mp, pos == x["pos"] & t.frame == x["t.frame"]) if(nrow(cells) == 0) return(c(NA, NA)) dists <- sqrt(( (x["X"]-cells$xpos)^2) + ((x["Y"]-cells$ypos)^2) ) if(length(dists) == 0) return(c(NA, NA)) min.id <- cells$ucid[which.min(dists)] c( min.id, min(dists)) }) |> t() mp.mapped
Fix mis-assgnments manually here:
If you spot any mis-assignments later, you can fix them here mannualy and then re-inspect.
# mp.mapped[mp.mapped$ucid==110047,"ucid"] <- 110754 # # mp.mapped
Join the points to cdata.mp
:
mp.mapped.minimal <- mp.mapped |> select(mpid, ucid, t.frame, closest.dist, Counter) cdata.mp <- cdata.mp |> inner_join(mp.mapped.minimal) |> dplyr::rename(closest.mp.id=mpid)
Plot the point's distribution, this will help you choose an appropriate threshold.
mp.mapped.minimal |> ggplot(aes(closest.dist)) + geom_histogram() + facet_wrap(~Counter)
Set a threshold distance, and apply the filter:
This value will depend on you opticalmicroscopy setup, and the morphology of your cells.
threshold <- 11 cdata.mp$has.close.mp <- cdata.mp$closest.dist < threshold
Plot result:
You may find
plotly::ggplotly
useful to render this plot in interactive form.
p <- ggplot() + geom_point(aes(xpos, ypos, color="CellID", text=paste("ucid:", ucid, ", closest dist:", closest.dist)), data=cdata.mp) + geom_point(aes(X, Y, color="ImageJ"), data = mp.filter, size = 3) + geom_point(aes(xpos, ypos, color="matched"), data=cdata.mp |> filter(has.close.mp)) + facet_wrap(~pos) + coord_equal() + ggtitle(paste0("Examine MP filter result", " (d<", threshold, ")"), "Black: ImageJ points; Gray: all Cell-ID cells; Red: Cell-ID cells matched to a point.") + scale_color_manual(values=c(ImageJ="black", matched="red", CellID="grey")) + scale_y_reverse() + theme_minimal() # plotly::ggplotly(p) p
Examine images of the tagged cells:
cdata.mp |> filter(has.close.mp) |> rcell2.magick::magickCell(images) |> rcell2.magick::magickForKnitr()
Examine images of the un-matched points:
# Make a cdata-like data frame for magickCell. unmatched.points <- mp.filters |> dplyr::rename(xpos=X,ypos=Y) |> inner_join( mp.mapped.minimal |> filter(closest.dist >= threshold) |> select(-t.frame), by = "mpid" ) # Display some images. unmatched.points |> rcell2.magick::magickCell(images) |> rcell2.magick::magickForKnitr()
Generate indicator column:
cdata.mp$n_dupes <- apply(X = cdata.mp, MARGIN = 1, function(cell){ sum( (cell["closest.mp.id"] == cdata.mp$closest.mp.id) & cdata.mp$has.close.mp ) }) cdata.mp$dupes <- cdata.mp$n_dupes > 1 any(cdata.mp$dupes) # Should be false unique(cdata.mp$n_dupes) # Should be only 1's and 0's.
n_dupes
is 1 if theucid
was assigned to 1 mpid. If greater than 1, theucid
shares the mpid with at least anoherucid
(and this should not happen).
Check graphically:
plt.dupes <- ggplot() + geom_point(aes(X, Y), color = "gray", data = mp.filter |> filter(Counter == 0), size =3) + geom_point(aes(xpos, ypos), color = "black", data = cdata.mp) + geom_point(aes(xpos, ypos, color=as.factor(n_dupes)), data = cdata.mp |> filter(has.close.mp)) + geom_line(aes(xpos, ypos, group=closest.mp.id), color = "black", data = cdata.mp |> filter(has.close.mp)) + ggtitle("Graphical check of unique matches", "The points from cdata sharing a close multipoint are joined by lines") + facet_wrap(~pos) plotly::ggplotly(plt.dupes)
Tabular check:f for each closest.mp.drop.id
there should be only one row.
checks.df <- cdata.mp |> group_by(closest.mp.id) |> filter(has.close.mp) |> summarise(row.count = n()) |> arrange(-row.count) all(checks.df$row.count == 1) # Should be TRUE
Manual inspection:
checks.df
Use the ImageJ points to select cells in cdata.
Generate a mapping data.frame
:
mapping <- cdata.mp |> filter(has.close.mp) |> select(ucid, closest.mp.id)
Last check for unicity:
stopifnot(nrow(mapping) == nrow(unique(mapping)))
Examine:
mapping
You can use this information to filter cdata
, for example by ucid
:
cdata.filtered <- cdata |> filter(ucid %in% mapping$ucid)
Done! cdata.filtered
now contains only cells that were matched to a 0
Counter
in ImageJ. Other uses for this include tagging events, or phenotypes of interest.
Optionally use rcell2.magick
to display some images:
cdata.filtered |> rcell2.magick::magickCell(images) |> rcell2.magick::magickForKnitr()
Repeat these steps for the other Multi-point Counters.
Extra tools for graphical filtering and annotation/tagging are available in
the rcell2.magick
package.
That package also includes functions for generating and manipulating images of single cells, and preparing strips, tiles, and plots.
To learn more, install the rcell2.magick
package and open it's template notebook.
Uncomment and execute the following to install the package and open the template:
# if(!requireNamespace("rcell2.magick")) remotes::install_github("darksideoftheshmoo/rcell2-magick") # rcell2.magick::get_workflow_template_magick()
The Hu Moments can be computed from the masked TIFF files, generated by Cell-ID with
the appropriate options (see encode_cellID_in_pixels
and fill_interior_pixels
in
the help page of rcell2.cellid::cell2
).
To append the Hu Moments to a cell data object, use the append_hues
function:
cell_data <- rcell2::append_hues(cell_data = cell_data, return_points = T, image_bits = 16) rcell2:::check_tiff_mask(cell_data)
Alternatively, the Hu Moments can be computed from the TSV files (see output_coords_to_tsv
at the help page of rcell2.cellid::cell2
), by using the append_hues2
.
Read its help page for more information and usage:
?append_hues2
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.