After image acquisition, Mother Machine data are processed using deepMoMA. Here we describe the analysis of deepMoMA's output using R, in particular: - how to load MoMA data - how to concatenate all data analysed in a project in the same dataframe - simple examples of data selection and plotting.
Most of this R code rely heavily on the tidyverse
libraries. Getting familiar with those first, e.g. reading R for data science, will definitely help you follow this analysis.
.
is the project directory. All paths to data files are defined locally.
First, load the mother_machine_example.Rproj
file of the template project (in example
).
The R environment used for this project is managed using renv
(running renv::init()
should restore all necessary package, but you need to make sure that you use an appropriate R version — at best the same as described in the file renv.lock
).
Learn more about collaborating with renv
at https://rstudio.github.io/renv/articles/collaborating.html#collaborating-with-renv.
Run src/mother_machine_example.R
to load the data and render the analysis files to html.
Note that calling render()
or render_site()
from the command line allows to execute the function in the global env() (hence inheriting existing variables and keeping newly created ones).
Since experimental conditions are not recorded in deepMoMA output, they are documented, together with paths to deepMoMA export files, in a YAML file. Each condition is defined as an associative array which is expected to have the following properties:
condition
must be a unique stringmedium
, duration
and interval
define "steps"; those variables can be lists (of the same length for a given condition) or scalars (value is then recycled)paths
are lists of stringsseries
can be declared as a list containing paths and other custom fields (see example below).
NB: since series are not named, (at least one) custom field must be used to distinguish them.For the rest, any valid YAML syntax should be accepted (mind tabs vs spaces for indentation).
This example defines two "conditions": the first one is specified using the series syntax (with a custom field antibio
) and has a custom field strain
; the second one only has a list of (one) replicate specified using paths
and a custom field antibio
(for consistency with the first condition).
- condition: AB_before
duration: 360
interval: 3
medium: M9+glucose
strain: MG1655_hi2
series:
- antibio: CIP
paths:
- ./data/20200922_curated/20200922_CIP_cell_stats
- ./data/20201119_curated/20201119_CIP_cell_stats
- antibio: CEF
paths:
- ./data/
- ./data/20200922_curated/20200922_CEF_cell_stats
- ./data/20201119_curated/20201119_CEF_cell_stats
- condition: AB_2h
duration: [360, 6, 960, 120, 960]
interval: 3
medium: [M9, antibio, M9, antibio, M9]
antibio: TMP
paths:
- ./data/20200922_curated/20200922_TMP_cell_stats
This is parsed to the following dataframe by parse_yaml_conditions()
, with one row per condition and step:
# A tibble: 7 × 8
condition duration interval medium strain antibio paths step_idx
<chr> <int> <int> <chr> <chr> <chr> <list> <int>
1 AB_before 360 3 M9+glucose MG1655_hi2 CIP <chr [2]> 1
2 AB_before 360 3 M9+glucose MG1655_hi2 CEF <chr [3]> 1
3 AB_2h 360 3 M9 NA TMP <chr [1]> 1
4 AB_2h 6 3 antibio NA TMP <chr [1]> 2
5 AB_2h 960 3 M9 NA TMP <chr [1]> 3
6 AB_2h 120 3 antibio NA TMP <chr [1]> 4
7 AB_2h 960 3 M9 NA TMP <chr [1]> 5
Note that it is possible to change the order of the fields in the YAML file, e.g. to document experiment in a more replicate-centered way.
The data loading step produces three dataframes:
myframes
has the following variables (and more!) by default
id
, gl
: the ID and growth lane of the given cell.path
: the path of the CSV file containing the information of the given cell.time\_sec
: time in seconds when the measurement has been done.length\_um
length of the cell in μm at the time of the measurement.end\_type
: end of a cell and it can be division (div), end of the measurement (*eod`), "pruned" or exit.start\_time
and end\_time
: time in seconds when the cell was born and died.cell_rank
: cell rank counted from the channel closed endmyframes
is designed to be enriched with new columns during the analysis; in particular dummy variables are helpful to filter the subset of the data that matches a given criteria.
discard\_start
: when acquiring fluorescence, it is likely that cells experience some phototoxicity. In the case of the glucose/lactose switch, we identified that the doubling time of cells increase during the first 2 hours. Hence discard\_start==TRUE
for the first 2 hours (aka "preexperiment step").filter(myframes, !discard_start)
should be used in any case to select valid observations.
fluo\_amplitude
and fluo\_background
: the fluorescence of the cell and the background as fitted from perl postprocessing scripts. The intensity of a cell is fitted with a Cauchy distribution mixed with a uniform distribution. The fluorescence amplitude is given by the Cauchy, while the background is given by the uniform.fluo\_gfp\_amplitude
: the fluorescence of the GFP contained in the cell and it is defined as fluo_amplitude minus the estimated autofluorescence (inferred from the length, using calibration data from a GFP-less strain).gfp\_nb
: number of GFP molecules i.e. the GFP fluorescence rescaled with the parameter fp\_per\_dn
, discussed later.Once data are stored in the dataframe myframes
, this tibble supports all the manipulations allowed by the dplyr library. Here are a few basic examples:
filter
. For example, if we want to isolate usable cells (not in the pre-experiment step, not excluded at the top of the channel), from the growth channel #14, and from a given condition (e.g. test
), we can writeempty <- filter(myframes, !discard_start, condition=='test', gl==14)
stratified <- group_by(myframes, medium, date)
stratified <- group_by(myframes, medium, date)
summarize(stratified, mu=mean(gfp_nb))
myframes %>%
filter(!discard_start) %>%
group_by(medium, date) %>%
summarize(mu=mean(gfp_nb), sd=sd(gfp_nb))
Once a project is completed, it is convenient to produce a static output of all its notebooks, which can later be browsed without initialising an R session. To facilitate this, the project template is designed as a Rmarkdown "site" (which requires rmarkdown ≥ 1.0). An example of this type of notebook rendering can be browsed at https://julou.github.io/MoM_lacInduction.
In _site.yml, shared properties are set, in particular the structure of the navigation bar.
exclude: ["*"]
is required to prevent all subdirectories to be copied (all the more so as symlinks are followed!)
The output is rendered in /docs
, which is supported by Github Pages as website root; a .nojekyll
empty file ensures that files are served as they are.
NB: the index.Rmd
file is required for site_render() to execute.
Here is an example of the minimal YAML header to put in each Rmarkdown file. NB: date syntax from http://stackoverflow.com/questions/23449319
---
title: "My relevant title"
author: Thomas Julou
date: "`r format(Sys.time(), '%d %B, %Y')`"
---
If your repo is synchronised with github at the url `https://github.com/myAccount/myProject", the corresponding Github Pages website wil be available at https://myAccount.github.io/myProject.
The corresponding files must be pushed to the github repository, and the deployment must be configured in the repository's settings (under "Code and automation > Pages"). There are primarily two ways to achieve this, and the settings must be adjusted accordingly.
Arguably the simplest option is to leave the rendered files in the docs
subdirectory where they have been output by Rmarkdown. The obvious drawback is that the code is now cluttered by this additional subdirectory.
To do the versioning this way, simply delete the .gitignore
file in docs
, then commit and push the files as usual.
In order to separate the code and the rendered files, it is possible to copy the content of the docs
directory (of the code branch master
) to the root of a dedicated branch (typically named gh-branch
).
# create branch if it does not exist
git checkout -b gh-pages
git checkout master # make sure you are on the "code" branch
git checkout gh-pages "docs/*"
After that, don't forget to commit and push the content of both branches.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.