Installation

The rcell2 package has a bioconductor dependency, which must be installed first.

Install the EBImage package (required to look at cells) by copying and running the following script:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

if (!requireNamespace("EBImage", quietly = TRUE))
  BiocManager::install("EBImage")

The rcell2 package can be installed directly from its git repository by running the following:

if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")

if (!requireNamespace("rcell2", quietly = TRUE))
  remotes::install_github("darksideoftheshmoo/rcell2")

You may download a fresh copy of this file with rcell2::get_workflow_template().

Notebook setup

knitr::opts_chunk$set(echo = T,
                      message = F, 
                      # https://yihui.org/knitr/options/#chunk-options
                      out.width = "100%")
                      # https://stackoverflow.com/a/66753995
                      # tidy.opts = list(width.cutoff = 60), tidy = TRUE)
knitr::opts_knit$set(root.dir = here::here())  #, base.dir = here::here())

library(rcell2)  # this package you are checking out :)
# library(rcell2.cellid)  # Advanced functionality: run CellID entirely in R.
# library(rcell2.magick)  # Advanced functionality: filter data using Shiny and preview images with Magick.

# library(magick)  # if you need it

library(tidyverse)  # import numpy as np (?)

Preamble

Friendly reminders

Image file names

Images are assumed to be in a single directory, and have names with identifiers for:

Cell-ID uses the first 3 letters of the file name to group imaging channels. This is mandatory. If your image set does not conform initially, it can be renamed (or symlinked) using rcell2's rename_mda, with custom arguments.

The rcell2 packages rely on regular expressions to gather images and extract metadata from their file names (i.e. channel, position, and time frame.)

File names for a 2-position time course experiment are expected to have the following appearance:

BF_Position001_time01.tif
BF_Position001_time02.tif
TFP_Position001_time01.tif
TFP_Position001_time02.tif
YFP_Position001_time01.tif
YFP_Position001_time02.tif
BF_Position002_time01.tif
BF_Position002_time02.tif
TFP_Position002_time01.tif
TFP_Position002_time02.tif
YFP_Position002_time01.tif
YFP_Position002_time02.tif

The file list will be recognized by a specific regular expression (regex). A regex matching these files can be ^(BF|TFP)_Position(\\d+)_time(\\d+).tif$.

Note that the regex has 3 capturing groups (i.e. the patterns between parentheses). The regex as a whole is used to select images from a directory (filtering out other files), and the capturing groups are used to extract the metadata.

You may need to adjust these if your files are named differently.

Data sources

Set the data_dir variable to the path of your segmentation output:

data_dir <- "/.../path/to/your/segmeted/images"

A full explanation and step-by-step guide of the image segmentation process is available in the notebook template of the rcell2.cellid package.

Un-comment and execute the following to install the rcell2.cellid package and open the template:

# if(!requireNamespace("rcell2.cellid")) remotes::install_github("darksideoftheshmoo/rcell2-cellid")

# rcell2.cellid::get_workflow_template_cellid()

Un-comment and execute the following to install the rcell2.examples package and set the data directory:

# if(!requireNamespace("remotes")) install.packages("remotes")
# if(!requireNamespace("rcell2.examples")) remotes::install_github("darksideoftheshmoo/rcell2.examples")

# data_dir <- system.file("extdata/sample_datasets/sample_time_series/", package = "rcell2.examples")

The example data has already been processed with rcell2.cellid, and can be loaded with rcell2, as shown below.

Load Cell-ID output

Cell-ID's output consists of segmented images, and data files in plain text format. These must be loaded into R for further analyses.

This is done with the load_cell_data function:

cell.data <- rcell2::load_cell_data(path = data_dir)

Alternatively, there is also the get_cell_data function from the rcell2.cellid package:

cell.data <- rcell2.cellid::get_cell_data(path = data_dir)

Note: your fluorescence.pattern regex may vary, see ?get_cell_data.

If Cell-ID was configured to output the segmentation mask, either to a TSV or masked TIFF file, the information can be loaded with the cell.load.boundaries function:

cell.load.boundaries(data.source = "masks.tsv")

Cell-ID variables

Descriptions of the variables in cdata are available in the rcell2.cellid package:

rcell2.cellid::cellid_output_descriptions(F)

Data anlysis

The cell.data object is a list. The main dataset is stored in the data item, and the paths to image files are in the images item.

We usually assign these to their own objects, for easier manipulation:

cdata <- cell.data$data
images <- cell.data$images

Add stage position metadata

Warning: load_cell_data does the following automatically if a pdata.csv file is found next to the images. If it were the case, you musnt't run the join again.

Load "pdata":

# Set the path to the file (modify accodringly).
pdata_path <- ".../somewhere/over/your/pdata.csv"

# It is a "metadata" dataframe we usually join to cdata by "position" and/or "t.frame",
# which contains other experimental variables: simuli concentrations, notes, etc.
pdata <- readr::read_csv(pdata_path)

pdata

Note: the "p" in "pdata" stands for microscope position data.

This table is usually joined to cdata by position (and time frame if available).

cdata <- left_join(cdata, pdata)

Cell-tagging using ImageJ/FIJI

No analysis package exists (yet) that can replace a researcher's general visual assesment.

The tagCell function from the `rcell2.magick`` can provide a randomized experimental context for tagging (i.e. a pseudo double-blind experiment).

If bias during tagging is not a concern, the "multi-point" tool in ImageJ can be used to mark cells with different "counters", by simply clicking on them.

These marks can be exported and mapped back to ucids in R, and then used for analysis.

ImageJ Hyperstack macro

This section helps you open your images with FIJI-ImageJ before you begin.

A simple Macro can be run in ImageJ to load virtual hyperstacks of any experiment.

Note: this macro uses a "file" pattern that only matches input images (and not ".out") in a tipically-named set of images. Customize the pattern to your need.

Steps:

  1. Run the following code, copy its output to the clipboard.
  2. Open ImageJ/FIJI, and open a new "Macro" window, by clicking on "Plugins -> New -> Macro".
  3. Paste the macro into the macro editor, and click on "Run" below. A new hyperstack window will open.
  4. Browse the images using the sliders.
rcell2.cellid::ijm_open_hyperstack(images)

Note: the files are symlinked with new names to a temporary directory. Otherwise, the names of the ImageJ dimensions may be mismatched between R and the macro. This is because virtual stacks must be in xyczt order (time is likely to end up as the "channels" dimension), but the order in which the "Image Sequence..." is loaded depends on how their files are named.

The only restriction of this approach is that only "3" dimensions are available for browsing: frame, plane (Slice), and channel. This is relevant if you wish to browse segmentation results (which adds another dimension to the set: original vs "out"). A fourth dimension would be needed, but Hyperstacks only provide 3. Alternatively, open another virtual hyperstack for the output images, and browse them in-sync.

Filtering multipoints

Steps (continued): follow these steps to tag cells in ImageJ.

  1. Use the multi-point tool to mark individual cells with one or more "counters".
  2. After tagging the positions make measurements (press Ctrl+M) and save the output to a .csv file.
  3. It is critical that you check that the values of the "Ch" (channel), "Slice", and "Frame" ImageJ measurements map correctly to t.frames, pos, and ch in Cell-IDs output.
  4. Note that ImageJ's indexes begin at 1, but Cell-ID's begin at 0. This is specially relevant for t.frames.
  5. Load the Multi-point tags using the code below.

Save the files to a single directory, and then save the path here:

results.dir <- ".../somewhere/over/the/measurements/"

We usually prepare several CSV files, and extract the position index from the file name or the Slice variable in them.

For example, CSV files can be named pos01.csv. In this case, prepare a matching regex, that will find only the CSV files you want to use. For example:

results.pattern <- "pos(\\d+).csv"

You can now list the CSV files:

mp.filters.files <- dir(path = results.dir, 
                        pattern = results.pattern, 
                        full.names = T)

mp.filters.files

Check that these are the files you need.

Now, load the measurements:

mp.filters <- 
  # Load CSV files
  mp.filters.files |> lapply(read_csv) |> 

  # Bind rows
  bind_rows() |> 

  # Get position info from the "Slice" index..
  mutate(pos = Slice) |> 

  # Or, get position info from the file name, if applicable.
  # mutate(file=mp.filters.files) |> 
  # extract(file, results.pattern, into = "pos", convert = T) |> 

  # Cleanup
  select(pos, X, Y, Ch, Slice, Frame) |> 
  # Add an ID to each tag
  mutate(mpid = 1:n())
  1. Re-index the time frame column to match Cell-ID's.
mp.filters <- mp.filters |> 
  # Re-index t.frame
  mutate(t.frame = Frame - 1)
  1. Examine the result.
mp.filters
  1. Optional: Prepare a data.frame with tag names mapped to Counter indices.
  2. Note that ImageJ's Multi-point counters begin at 0.
  3. Each "Counter" has an associated "tag", corresponding to the event or class marked in the images.
# For example:
tag_counter <- 0:3
tag_descriptions <- c("Ok", "Budding", "Dead", "Spurious")

counter.desc.df <- data.frame(
  Counter = tag_counter,
  tag_description = tag_descriptions
)

counter.desc.df
Sample dataset

Here is an example mp.filters data.frame. In this case the position is encoded in the Slice variable.

mp.filters.file <- system.file("extdata/sample_datasets/sample_time_series/imagej_mp_tags/results_pos4_12.csv",
                               package = "rcell2.examples", mustWork = T)

multipoints <- 
  # Load CSV files
  mp.filters.file |> read_csv() |> 
  # Cleanup
  select(X, Y, Ch, Slice, Frame, Counter) |> 
  # Add an ID to each tag
  mutate(mpid = 1:n())

mp.filters <- multipoints |> 
  left_join(slice.pos.df) |> 
  left_join(ch.frame.df) |> 
  left_join(counter.desc.df)

mp.filters

Map points to cells

Keep the relevant set of counters (a.k.a. markers). This code will work with one marker at a time.

mp.filter <- mp.filters |>
  # Here you may choose which markers to map.
  # filter(Counter == 0) |>
  select(mpid, X, Y, pos, t.frame)

mp.filter

Calculate distance to closest multipoint and save it's ID:

# Make a copy of cdata
cdata.mp <- cdata |>
  # filter(t.frame == min(t.frame)) |> 
  select(ucid, pos, t.frame, xpos, ypos)

# Make a copy of the points
mp.mapped <- mp.filter

# Add the columns
mp.mapped[, c("ucid", "closest.dist")] <-
  apply(mp.filter, 1, function(x){
    cells <- filter(cdata.mp, pos == x["pos"] & t.frame == x["t.frame"])

    if(nrow(cells) == 0) return(c(NA, NA))

    dists <- sqrt(( (x["X"]-cells$xpos)^2) + ((x["Y"]-cells$ypos)^2) )

    if(length(dists) == 0) return(c(NA, NA))

    min.id <- cells$ucid[which.min(dists)]

    c( min.id, min(dists))
  }) |> t()

mp.mapped

Fix mis-assgnments manually here:

If you spot any mis-assignments later, you can fix them here mannualy and then re-inspect.

# mp.mapped[mp.mapped$ucid==110047,"ucid"] <- 110754
# 
# mp.mapped

Join the points to cdata.mp:

mp.mapped.minimal <- mp.mapped |> 
  select(mpid, ucid, t.frame, closest.dist, Counter)

cdata.mp <- cdata.mp |> inner_join(mp.mapped.minimal) |> 
  dplyr::rename(closest.mp.id=mpid)

Plot the point's distribution, this will help you choose an appropriate threshold.

mp.mapped.minimal |> 
  ggplot(aes(closest.dist)) + 
  geom_histogram() +
  facet_wrap(~Counter)

Set a threshold distance, and apply the filter:

This value will depend on you opticalmicroscopy setup, and the morphology of your cells.

threshold <- 11

cdata.mp$has.close.mp <- cdata.mp$closest.dist < threshold

Plot result:

You may find plotly::ggplotly useful to render this plot in interactive form.

p <- ggplot() + 
  geom_point(aes(xpos, ypos, 
                 color="CellID",
             text=paste("ucid:", ucid, ", closest dist:", closest.dist)),
             data=cdata.mp) +

  geom_point(aes(X, Y, color="ImageJ"), 
             data = mp.filter, 
             size = 3) +

  geom_point(aes(xpos, ypos, color="matched"),
             data=cdata.mp |> filter(has.close.mp)) +

  facet_wrap(~pos) + coord_equal() +
  ggtitle(paste0("Examine MP filter result", " (d<", threshold, ")"), 
          "Black: ImageJ points; Gray: all Cell-ID cells; Red: Cell-ID cells matched to a point.") + 

  scale_color_manual(values=c(ImageJ="black", matched="red", CellID="grey")) +
  scale_y_reverse() +
  theme_minimal()

# plotly::ggplotly(p)
p

Examine images of the tagged cells:

cdata.mp |> filter(has.close.mp) |> 
  rcell2.magick::magickCell(images) |> 
  rcell2.magick::magickForKnitr()

Examine images of the un-matched points:

# Make a cdata-like data frame for magickCell.
unmatched.points <- mp.filters |> 
  dplyr::rename(xpos=X,ypos=Y) |> 
  inner_join(
    mp.mapped.minimal |> 
      filter(closest.dist >= threshold) |> 
      select(-t.frame),
    by = "mpid"
  )

# Display some images.
unmatched.points |> 
  rcell2.magick::magickCell(images) |> 
  rcell2.magick::magickForKnitr()

Find duplicates

Generate indicator column:

cdata.mp$n_dupes <- apply(X = cdata.mp, MARGIN = 1, function(cell){
  sum( (cell["closest.mp.id"] == cdata.mp$closest.mp.id) & cdata.mp$has.close.mp )
})

cdata.mp$dupes <- cdata.mp$n_dupes > 1

any(cdata.mp$dupes) # Should be false
unique(cdata.mp$n_dupes) # Should be only 1's and 0's.

n_dupes is 1 if the ucid was assigned to 1 mpid. If greater than 1, the ucid shares the mpid with at least anoher ucid (and this should not happen).

Check graphically:

plt.dupes <- ggplot() + 

  geom_point(aes(X, Y), color = "gray",
             data = mp.filter |> filter(Counter == 0), 
             size =3) +

  geom_point(aes(xpos, ypos), color = "black", data = cdata.mp) +

  geom_point(aes(xpos, ypos, color=as.factor(n_dupes)),
             data = cdata.mp |> filter(has.close.mp)) + 

  geom_line(aes(xpos, ypos, group=closest.mp.id), color = "black",
            data = cdata.mp |> filter(has.close.mp)) + 

  ggtitle("Graphical check of unique matches",
          "The points from cdata sharing a close multipoint are joined by lines") +

  facet_wrap(~pos)

plotly::ggplotly(plt.dupes)

Tabular check:f for each closest.mp.drop.id there should be only one row.

checks.df <- cdata.mp |> 
  group_by(closest.mp.id) |> 
  filter(has.close.mp) |> 
  summarise(row.count = n()) |> 
  arrange(-row.count)

all(checks.df$row.count == 1)  # Should be TRUE

Manual inspection:

checks.df

Filter cdata

Use the ImageJ points to select cells in cdata.

Generate a mapping data.frame:

mapping <- cdata.mp |> filter(has.close.mp) |> 
  select(ucid, closest.mp.id)

Last check for unicity:

stopifnot(nrow(mapping) == nrow(unique(mapping)))

Examine:

mapping

You can use this information to filter cdata, for example by ucid:

cdata.filtered <- cdata |> 
  filter(ucid %in% mapping$ucid)

Done! cdata.filtered now contains only cells that were matched to a 0 Counter in ImageJ. Other uses for this include tagging events, or phenotypes of interest.

Optionally use rcell2.magick to display some images:

cdata.filtered |> 
  rcell2.magick::magickCell(images) |> 
  rcell2.magick::magickForKnitr()

Repeat these steps for the other Multi-point Counters.

Shiny-Magick framework

Extra tools for graphical filtering and annotation/tagging are available in the rcell2.magick package.

That package also includes functions for generating and manipulating images of single cells, and preparing strips, tiles, and plots.

To learn more, install the rcell2.magick package and open it's template notebook.

Uncomment and execute the following to install the package and open the template:

# if(!requireNamespace("rcell2.magick")) remotes::install_github("darksideoftheshmoo/rcell2-magick")

# rcell2.magick::get_workflow_template_magick()

Hu Moments

The Hu Moments can be computed from the masked TIFF files, generated by Cell-ID with the appropriate options (see encode_cellID_in_pixels and fill_interior_pixels in the help page of rcell2.cellid::cell2).

To append the Hu Moments to a cell data object, use the append_hues function:

cell_data <- rcell2::append_hues(cell_data = cell_data, 
                                 return_points = T, 
                                 image_bits = 16)

rcell2:::check_tiff_mask(cell_data)

Alternatively, the Hu Moments can be computed from the TSV files (see output_coords_to_tsv at the help page of rcell2.cellid::cell2), by using the append_hues2.

Read its help page for more information and usage:

?append_hues2


darksideoftheshmoo/rcell2 documentation built on Oct. 23, 2024, 12:59 p.m.