In AndersenLab/easyXpress: A Package to Read, Process, and Analyze Worm Data Generated from CellProfiler’s WormToolbox

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

easyXpress is an R package to read, process, and analyze worm morphology data acquired with the cellprofiler-nf nextflow pipeline.

Below is a detailed walk through applying the easyXpress package to sample dose response data extracted from a toxicant dose response article published in the journal Toxicology.

There are four major steps to processing dose response data that are described in detail below.

1. Load Data - Read in CellProfiler data and select the appropriate models for primary objects.
2. Flag Objects - Flag, check, and filter problematic objects, then summarize object data within wells.
3. Flag Wells - Flag, check, and filter problematic wells, then check experimental design balance after filtering.
4. Finalize Results - Calculate control delta's and regress confounding effects if needed.

library(easyXpress)
library(tidyverse)

Get example dose response data

This article will use example data that exceeds the typical size for package vignettes, to recreate the workflow below clone the example data from the AndersenLab/eXDR repository.

To clone the example data, users will need to have git installed on their machine. Follow the instructions here if you don't have git installed or are unsure.

# specify a directory to clone the example data repository to. Here we'll clone
# it to the Desktop. You can use whatever path your like here, just make sure it
# is valid and ends in /
ex.dir <- "~/Desktop/" 

# clone the repo using a system command
system(paste0("git clone https://github.com/AndersenLab/eXDR.git ",
              ex.dir,
              "eXDR"))

Load Data

Read in CellProfiler data and select the appropriate models for primary objects.

`readXpress`

This function reads data output from cellprofiler-nf into R.

readXpress() takes one or more paths to project directories that hold data exported from cellprofiler-nf. Each project directory must have CellProfiler data in a sub-folder named cp_data. Set the rdafile argument to the name of the files you plan to use, one per project. If design = TRUE, a design file will be joined. The design file should be located in a sub-folder of the experimental directory named design. To process dose response data, users must set doseR = TRUE. The other arguments are specified by default but can be adjusted if needed.

For more information regarding project directory structure, see R/easyXpress.

If design = T the readXpress() function returns a list, with the first element named raw_data and the second named design.

# make a vector of project directories cloned in the example data (eXDR)
filedirs <- c(paste0(ex.dir, "eXDR", "/20201210_toxin14A"),
            paste0(ex.dir, "eXDR", "/20201217_toxin15A"),
            paste0(ex.dir, "eXDR", "/20210311_toxin22A"))

# make a vector of the .RData files cloned in the example data (eXDR)
rdafiles <- c("20201210_toxin14A_Analysis-20231003.RData",
              "20201217_toxin15A_Analysis-20231003.RData",
              "20210311_toxin22A_Analysis-20231003.RData")

# Read in the data
dr <- easyXpress::readXpress(filedir = filedirs,
                             rdafile = rdafiles,
                             design = T,
                             doseR = T)

`modelSelection`

modelSelection() takes the raw data output from the readXpress() function. It will assign the appropriate CellProfiler model to each primary object in the data frame.

To access the raw data output from the readXpress() function call above, we can use dr$raw_data.

In this example, the data was generated using 4 worm models (MDHD, L1, L2L3, and L4).

# The modelSelection function to select the best model for each primary object
ms <- easyXpress::modelSelection(dr$raw_data)

Flag Objects

This step of the project processing will flag problematic objects in the data using various ObjectFlag functions (OF)s. Users can filter flags or retain them after they have been checked.

`edgeOF` and `clusterOF`

The edgeOF() function will flag objects close to the edge of the well that are difficult to segment properly.

The clusterOF() function will flag any objects that are found within the same primary object. These 'clusters' are often debris or improperly segmented worms.

# Use the edgeOF fucntion
ef <- easyXpress::edgeOF(data = ms)

# Use the clusterOF function
cf <- easyXpress::clusterOF(data = ef)

`checkOF`

Use the checkOF() function to see how the object flags look across the desired grouping variables, which are specified with ....

c1 <- easyXpress::checkOF(data = cf, drug, concentration_um)

# loop at the plot returned
c1$p

The checkOF plot above shows that there are a lot of objects and many clusters at the higher concentrations of Zinc, which suggests a problem at these concentrations.

`checkObjs`

Use the checkObjs() function to visualize the size distributions of the objects by grouping variables.

# Show just the noFlag data by setting OF = "filter"
c2 <- easyXpress::checkObjs(data = cf, OF = 'filter', drug, concentration_um)

# look at the plot returned
c2

The diagnostic plot made with the checkObjs() function above shows a strong bimodal distribution in the higher concentrations of zinc that further suggest a problem - probably debris in wells.

`checkModels`

The checkModels() function is used to see if small objects are debris. This function will produce diagnostic plots across grouping variables. The default settings help decide whether to use a 100um (30px) or 165um (50px) size filter for objects, and whether to use the MDHD worm model or not. Use ?checkModels() for more help if needed.

Before the checkModels() function is used, a variable that specifies the PATH to the processed images for each project is needed. Below, the dplyr package is used to add that variable and make a well label to include in the diagnostic overlays.

# Add variables that describe the PATH to processed images and well labels
cm <- cf %>%
  # add img dir var
  dplyr::mutate(i.dir =
                  dplyr::case_when(Metadata_Experiment == "toxin14A" ~
                                           paste0(ex.dir, "eXDR/20201210_toxin14A/processed_images/"),
                                         Metadata_Experiment == "toxin15A" ~
                                           paste0(ex.dir, "eXDR/20201217_toxin15A/processed_images/"),
                                         Metadata_Experiment == "toxin22A" ~
                                           paste0(ex.dir, "eXDR/20210311_toxin22A/processed_images/"),
                                         TRUE ~ NA_character_),
                # add well label var
                w.lab = paste(drug, strain, concentration_um, sep = "_"))

# use the checkModels function
cm.out <- easyXpress::checkModels(data = cm,
                      # the grouping vars (...), make a plot for each.
                      Metadata_Experiment, drug,
                      proc.img.dir = "i.dir",
                      well.label = "w.lab",
                       # save in the repo you cloned
                      out.dir = paste0(ex.dir, "eXDR/checkModels/out"))

knitr::include_graphics("checkModels_ex_anno.png", dpi = 144)

Based on the diagnostic plots from the checkModels() function, like the example above, the following filters should be applied for each drug:
1. Zinc dichloride - Remove all objects identified by the MDHD model and remove objects with a worm_length_um < 165um.
2. Paraquat - Use the MDHD model for paraquat, but remove out objects with a worm_length_um < 165um. There is no example shown for paraquat, but you can see the plots in your example directory for yourself.

`userOF`

Use the userOF() function to add the size and model flags to the data as an easyXpress compatible object flag. The variable added can be named anything. In this example, it is named 'user' and passed to the userOF function through ....

# add the user variable that will be converted to an object flag
u = cm %>%
  dplyr::mutate(user = dplyr::case_when(drug == "Zinc dichloride" &
                                          model == "MDHD" ~ "junk",
                                        drug == "Zinc dichloride" &
                                          worm_length_um < 165 ~ "junk",
                                        drug == "Paraquat" &
                                          worm_length_um < 165 ~ "junk",
                                        TRUE ~ NA_character_))
# Run the userOF function and specify user variable as the flag
uf <- easyXpress::userOF(data = u, user)

# Check the object data again to see if the bimodal distributions are resolved.
easyXpress::checkObjs(data = uf, OF = "filter", drug, concentration_um)

The diagnostic plot made with the checkObjs() function above shows that the bimodal distribution in the higher concentrations of zinc are improved by adding the user object flag.

`classifierOF`

The classiferOF() function will flag objects that are likely debris or improperly segmented worms. Currently, this function should only be used on 2X images obtained with the ImageXpress Nano and run through the most recent release of the cellprofiler-nf nextflow pipeline.

Any objects with NA for the data variables used by the classifier will be flagged in the output. These are rare, and often problematic to begin with. Check the function output to see how many objects are being flagged as NA.

# apply the classifierOF function
cl <- easyXpress::classifierOF(data = uf)

`outlierOF`

The outlierOF function will flag objects that have extreme worm_length_um values relative to the other objects in the same well.

The outlierOF function should be applied after all other object flags have been set. This is because the function ignores previously flagged data when searching for outliers in the well.

# apply the outlierOF function
o <- easyXpress::outlierOF(data = cl)

`checkObjs`

Use the checkObjs() function again to check the effect of filtering all the object flags. Any strong multi-modal signals that remain in the data should be reviewed to ensure they are the result of biology rather than anomalous features of the well image.

# check objects again, notice how there are 5 ObjectFlags detected now.
easyXpress::checkObjs(data = o, OF = 'filter', drug, concentration_um)

The plot above looks much better than it did for the raw data. For example, the bimodal distributions at the high concentrations of Zinc are largely removed.

`checkOF`

Use the checkOF function again to see how much data is being flagged. If certain conditions in the experiment are heavily flagged it suggests that those data may not be worth keeping.

# check object flags again
co2 <- easyXpress::checkOF(data = o, drug, concentration_um)

# check the plot
co2$p

There are many flagged objects for high concentration of zinc. This is somewhat expected since the wells appear to have a lot of junk in them.

The viewOverlay() function can be used to check these wells and see if the flags are working properly whether the data are worth keeping.

`viewOverlay`

The viewOverlay() function will annotate the diagnostic images output from cellprofiler-nf with any desired object data. This function is extremely useful for troubleshooting problematic conditions in an experiment. Since this function will build an array of full size images, we recommend sub-setting the data to 48 wells or less to reduce file sizes.

The setOF() function can be used to help annotate object flags. In the example below, the setOF() function is used to collapse the five _ObjectFlag variables to a single variable named objectFlag, which we will use to annotate the overlays. The dplyr package is also used to filter the data to problematic conditions and randomly sample eight wells to pass to the viewOverlay() function.

# IMPORTANT! set the seed to ensure the 8 random wells
# are found in the example images
set.seed(99)

# set the flags and filter data
o2 <- easyXpress::setOF(data = o) %>% # set the flags
  dplyr::filter(drug == "Zinc dichloride" &
                  concentration_um %in% c(200, 400)) %>%
  #randomly sample 8 wells
  dplyr::filter(well.id %in% sample(well.id, size = 8))

# Use the viewOverlay function
vo1 <- easyXpress::viewOverlay(data = o2,
                        proc.img.dir = "i.dir",
                        well.label = "w.lab",
                        obj.label = "model",
                        text.anno = "objectFlag",
                        # save to example dir
                        file = paste0(ex.dir, "eXDR/viewOverlay/overlay1.png"))

knitr::include_graphics("viewOverlay.png", dpi = 144)

The noFlag objects look ok, but the wells are certainly dirty. Since the flags appear to be working we can retain these wells for now and move on. The full sized version of the overlay is saved to the example data directory that was cloned above.

`filterOF`

Use the filterOF() function to remove all the flagged objects from the data after the flags have been checked. It is helpful to retain these data as a final processed object data set.

# finalize the object data made with outlierOF function above.
proc.objs <- easyXpress::filterOF(o, rmVars = T)

`summarizeWells`

The summarizeWells() function will remove flagged objects, summarize the data within each well and drop all object related variables from the data. These actions prepare the data for the next step, which is to flag wells.

# remove all flags, summarize wells, and drop object vars all in one function
raw.wells <- easyXpress::summarizeWells(data = o)

Flag Wells

Flag, check, and filter problematic wells, then check the experimental design balance after filtering.

`titerWF`

Use the titerWF() function to check for improperly titered bleaches. This function will look across the control conditions within grouping variables and flag wells that belong to bleaches with extreme variation in worm numbers or cv.n. Previous analyses suggest that a cv.n > 0.68 is a good threshold. Each set of experiments may require a different threshold, which can be set with the thresh parameter.

The first element of the retruned list tf$d is the flagged well data. A diagnotic plot is returned in the second element of the list tf$p.

# Use the titerWF function within a particular bleach and drug for each strain
tf <- easyXpress::titerWF(data = raw.wells,
                          Metadata_Experiment, bleach, strain, drug,
                          doseR = T)

# look at the diagnostic plot
tf$p

The diagnostic plot above shows that there is one bleach with large variation in worm number cv.n > 0.68. All the wells from this bleach will be flagged.

`nWF`

Use the nWF() function to flag wells with too many or too few objects in them. In this example, the min argument is set to 3, which will flag wells with less than 3 objects. The default value is min = 5, but because the example data is so sparse at the high concentrations of the toxicants the value is set lower.

The function will output a flagged data n$d and a dignostic plot n$p as the first and second elements of a list. The diagnostic plot is faceted by the grouping variables specified with ....

# use the nWF function with the data output from the titerWF function above
n <- easyXpress::nWF(data = tf$d, drug, concentration_um, max = 30, min = 3)

# Look at the diagnostic plot
n$p

The plot shows the distribution of object counts across the grouping variables with the min and max thresholds ploted as vertical lines.

`outlierWF`

Use the outlierWF() function to flag wells with extreme values of median_wormlength_um within the grouping variables specified with .... Typically the grouping variables should isolate strains, conditions, and independent bleaches, like in the example below.

# use the outlierWF function to flag outliers within groups
# then use dplyr::mutate to add a variable to indicate independent bleaches.
ow <- easyXpress::outlierWF(data = n$d,
                            Metadata_Experiment, bleach, drug,
                            concentration_um, strain) %>%
  dplyr::mutate(assay_bleach = paste(Metadata_Experiment, bleach, sep = "_"))

`checkWF`

Use the checkWF() function to see how many wells are being flagged across grouping variables specified with ....

# Use the checkWF function
cw1 <- easyXpress::checkWF(data = ow, drug, concentration_um)

# look at the plot
cw1$p

The diagnostic plot above shows that large fraction of wells are being flagged in the highest Zinc concentration. We can use thefilterWF() and checkBalance() functions to see how much data is retained across grouping variables relative to the complete experimental design after the flags are filtered.

`filterWF`

The filterWF() function will remove all the flagged wells from the data.

# Use the filterWF function and drop the flagging variables afterward
fw <- easyXpress::filterWF(data = ow, rmVars = T)

`checkBalance`

Use the checkBalance() function to see the fraction of data retained after the flags are filtered. A balanced design is critical if the effects of confounding variables are to be regressed with the regEff() function. The original design data is sourced from the dr$design object produced by the readXpress() function above. In the example below, the dplyr package is used to add the assay_bleach variable to the design data so that it can be joined with the data output form the outlierWF() function and used as the x-axis in the resulting plot, see code chunk below.

The checkBalance() function returns a list with summary data for making the plot as cb$d and the plot itself as cb$p.

# use the checkBalance function and add assay_bleach var to design
cb <- easyXpress::checkBalance(data = fw, drug, concentration_um,
                   design = dr$design %>%
                     dplyr::mutate(assay_bleach =
                                     paste(Metadata_Experiment, bleach)),
                   x = assay_bleach)

# Look at the plot and add a nicer x-axis with ggplot2 package
cb$p +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5)) +
  ggplot2::geom_hline(yintercept = 0.75, linetype = 2, color = "red")

The plot above shows that the flagged data fall in particular bleaches. We recommend that the independent bleaches with less than 75 % of data retained be removed. Based on this plot, the bleaches that should be removed for each drug are listed below.
1. Paraquat - The four bleaches toxin14A_2, toxin14A_3, toxin15A_1, and toxin15A_3 should be removed.
2. Zinc dichloride - The five bleaches toxin14A_1, toxin14A_2, toxin15A_1, toxin15A_3, and toxin22A_3 should be removed.

`checkEff`

The checkEff() function is used to look at the data after the problematic bleaches have been removed. For more help with the checkEff() function use ?checkEff().

# filter the unbalanced bleaches - nearly half the data in this example.
drop <- fw %>%
  dplyr::mutate(b.filter =
                  dplyr::case_when(drug == "Paraquat" &assay_bleach %in%
                                     c("toxin14A_2",
                                       "toxin14A_3",
                                       "toxin15A_1",
                                       "toxin15A_3") ~  "drop",
                                   drug == "Zinc dichloride" & assay_bleach %in%
                                     c("toxin14A_1",
                                       "toxin14A_2",
                                       "toxin15A_1",
                                       "toxin15A_3",
                                       "toxin22A_3") ~ "drop",
                                   TRUE ~ "keep")) %>%
  dplyr::filter(b.filter == "keep") %>%
  dplyr::select(-b.filter)

# use the checkEff function
ce1 <- easyXpress::checkEff(data = drop,
                            drug, strain,
                            x = concentration_um,
                            y = median_wormlength_um,
                            fill = assay_bleach,
                            scales = "free_x")

# look at the plot
ce1

The dose effect is as expected for the two strains, increasing concentrations of toxicants reduce the length of animals in the wells. There could be an effect of assay and bleach that can be tested with the regEff() function.

Finalize Results

`regEff`

The regEff() function will regress out the effect of a confounding variable on a dependent variable within groups specified by .... The function extracts the residuals of a linear model specified as d.var ~ c.var - 1 that remove the estimated influence of the confounding variable c.var on the dependent variable d.var. The function returns a list with four elements. the residuals of the method should be used carefully as it can introduce noise into data awill also produce diagnostic plots and We recommend testing

Before using the regEff() function it is important to have a well balanced design. If the effect of bleach

# Regress the effect of independent bleaches for each drug using regEff()
reg <- easyXpress::regEff(data = drop,
                          drug,
                          d.var = median_wormlength_um,
                          c.var = assay_bleach)

# Look at the regression coefficients in the diagnostic plot p2
reg$p2

The diagnostic plot above shows the point estimates of regression coefficients for each of the indepedent bleaches faceted by drug with standard errors as lines. We recommend further filtering bleaches with extreme effects. Here there are no extreme bleach effects to filter.

`delta`

The delta() function is used to control for innate differences among strains. This function will calculate the difference in well summary statistics between the experimental condition and the median control condition within a group. It is recommended to group data to independent bleaches for all strains and drugs when processing dose response data.

This is the final step in processing the data. See the help for the delta() function with ?delta().

# use the delta() function
del <- easyXpress::delta(data = reg$d,
                         assay_bleach, drug, strain, # group with ...
                         WF = "filter",
                         doseR = TRUE,
                         vars = "median_wormlength_um_reg")

# check the finalized data
easyXpress::checkEff(data = del, drug, strain, x = concentration_um,
         y = median_wormlength_um_reg_delta,
         fill = assay_bleach,
         scales = "free_x")

The finalized data can be used to fit dose response curves with the `dr

AndersenLab/easyXpress documentation built on Dec. 6, 2024, 4:04 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

AndersenLab/easyXpress
A Package to Read, Process, and Analyze Worm Data Generated from CellProfiler’s WormToolbox

In AndersenLab/easyXpress: A Package to Read, Process, and Analyze Worm Data Generated from CellProfiler’s WormToolbox

Introduction

Get example dose response data

Load Data

`readXpress`

`modelSelection`

Flag Objects

`edgeOF` and `clusterOF`

`checkOF`

`checkObjs`

`checkModels`

`userOF`

`classifierOF`

`outlierOF`

`checkObjs`

`checkOF`

`viewOverlay`

`filterOF`

`summarizeWells`

Flag Wells

`titerWF`

`nWF`

`outlierWF`

`checkWF`

`filterWF`

`checkBalance`

`checkEff`

Finalize Results

`regEff`

`delta`

R Package Documentation

Browse R Packages

We want your feedback!

AndersenLab/easyXpress A Package to Read, Process, and Analyze Worm Data Generated from CellProfiler’s WormToolbox

In AndersenLab/easyXpress: A Package to Read, Process, and Analyze Worm Data Generated from CellProfiler’s WormToolbox

Introduction

Get example dose response data

Load Data

readXpress

modelSelection

Flag Objects

edgeOF and clusterOF

checkOF

checkObjs

checkModels

userOF

classifierOF

outlierOF

checkObjs

checkOF

viewOverlay

filterOF

summarizeWells

Flag Wells

titerWF

nWF

outlierWF

checkWF

filterWF

checkBalance

checkEff

Finalize Results

regEff

delta

R Package Documentation

Browse R Packages

We want your feedback!

AndersenLab/easyXpress
A Package to Read, Process, and Analyze Worm Data Generated from CellProfiler’s WormToolbox

`readXpress`

`modelSelection`

`edgeOF` and `clusterOF`

`checkOF`

`checkObjs`

`checkModels`

`userOF`

`classifierOF`

`outlierOF`

`checkObjs`

`checkOF`

`viewOverlay`

`filterOF`

`summarizeWells`

`titerWF`

`nWF`

`outlierWF`

`checkWF`

`filterWF`

`checkBalance`

`checkEff`

`regEff`

`delta`