In dmattek/ARCOS: Automatic Recognition of Collective Signalling

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Intro

This notebook demonstrates how to test the hypothesis that the statistics of collective events (CEs) differ from the statistics calculated in a random dataset.

The dataset used in this example was generated with NetLogo agent-based simulation that models collective activation events in a 2D epithelium. In the model, 2500 cells were placed on a 50x50 hexagonal grid. Collective events appear sparsely in the epithelium, they propagate radially from a source cell up to 3 cell rows. There is a 15% failure for a cell to propagate the wave, therefore the events are not always perfectly symmetrical. The simulation lasts 200 iterations.

Collective events are identified with ARCOS. To calculate the p value of a sample statistic, 5 different randomisationping methods are applied:

shuffling entire single-cell tracks in space,
randomly shifting entire single-cell tracks back or forth in time,
shuffling time points independently for each track,
shuffling the measurement between space independently for each time frame,
shuffling activity blocks independently for each track.

Set the parameters

library(ARCOS)
library(parallel, quietly = T)
library(data.table, quietly = T)

library(ggplot2, quietly = T)
library(ggthemes, quietly = T)
library(viridis, quietly = T)

## Definitions

lPar = list()
lPar$fIn = system.file('synthetic2D/Gradient20.csv.gz',
                       package = 'ARCOS')

# Parameters for tracking collective events
lPar$db.eps = 1.5 # neighbourhood size; units of pixels
lPar$db.minPts = 3L # minimum number of cells in a cluster

lPar$trackLinkFrames = 1L # no. of subsequent frames to link during tracking; units of frames

lPar$coll.mindur = 2L # minimum duration of collective events
lPar$coll.mintotsz = lPar$db.minPts # minimum total size of collective events

lPar$debug = T

# Parameters for randomisationping
lPar$ncores = 2L # number of parallel cores
lPar$nboot = 100L # number of randomisationping iterations

# Dataset limits for plotting
lPar$limxy = c(-25, 25)
lPar$limt = c(0, 220)

# True size and duration of a single synthetic event
lPar$sim.cltotsz = 37
lPar$sim.cldur = 9

# True maximum diameter evolution of a single event in time
ttrueDiam = data.table(tevent = 0:8,
                       diam = c(2.236068, 4.472136, rep(6.708204,7)))

# Column names
lCols = list(frame = 'frame',
             posx = 'x',
             posy = 'y',
             trackid = 'id',
             meas = 'meas',
             measresc = 'meas.resc',
             measbin = 'meas.bin',
             cltrackid = 'collid',
             bootiter = 'bootiter')

cat(sprintf("\nNeighbourhood size for dbscan: %.2f [px] \n\n", lPar$db.eps))
cat(sprintf("Minimum cluster size for dbscan: %d [cells] \n\n", lPar$db.minPts))
cat(sprintf("Number of subsequent frames to link: %d [frames] \n\n", lPar$trackLinkFrames))
cat(sprintf("\nMinimum duration of collective events: %d [frames] \n\n", lPar$coll.mindur))
cat(sprintf("Minimum total size of collective events: %d [cells] \n\n", lPar$coll.mintotsz))
cat(sprintf("Number of randomisationping iterations: %d \n\n", lPar$nboot))

Read data

Read a compressed CSV file, Gradient20.csv.gz from the NetLogo simulation output. The dataset is included with the ARCOS package. The function loadDataFromFile is a wrapper for data.table::fread. It reads the file and sets the attributes of the ARCOS object according to parameters defined above.

# read data
ts = ARCOS::loadDataFromFile(lPar$fIn,
                             colPos = c(lCols$posx, lCols$posy), 
                             colMeas = lCols$meas,
                             colFrame = lCols$frame, 
                             colIDobj = lCols$trackid)

The dataset is in long format where each row corresponds to the measurement and position of a single object/cell at a particular point in time:

knitr::kable(head(ts), digits = 2)

The columns have the following data:

frame, time frames,
id, object/cell IDs,
x and y, object coordinates,
startCE, an indication of when a collective event is initiated in the simulation,
meas, the measurement.

Identify activity

Here we identify activity pulses in single cells by using a simple thresholding method, i.e. all measurements above 0 are considered active. The function binMeas adds two new columns to the dataset, meas.resc and meas.bin with rescaled and binarised measurements, respectively.

The parameter smoothK=1 indicates that the smoothing window initially applied to the time series is a single time point, thus no smoothing is performed. The result of thresholding is equivalent to calculating meas > 0.

# binarise the measurement
ARCOS::binMeas(ts,
               biasMet = "none", 
               smoothK = 1,
               binThr = 0.)

knitr::kable(head(ts), digits = 2)

The results of activity detection for a sample of long tracks:

ARCOS::plotBinMeas(ts, 
                   ntraj = 25, 
                   inSeed = 11,
                   measfac = 1.,
                   xfac = 1, 
                   plotResc = F) +
  coord_cartesian(ylim = c(0,2)) +
  theme_few() +
  xlab("Time [frames]") +
  ylab("ERK activity")

CEs - non-randomisationped

Identify collective events in the original data only for active cells, which in our case are cells with the measurement above 0.

The function trackColl returns a data table in long format only with objects/cells that participate in collective events. Two additional columns are added, collid.frame and collid, with IDs of collective events per frame and a globally unique ID, respectively. The latter, collid is what we will use in further analysis.

# Perform tracking of collective events.
# Pass single-cell data that has a binarised measurement greater than 1.
tcoll = ARCOS::trackColl(ts[get(lCols$measbin) > 0], 
                         eps = lPar$db.eps, 
                         minClSz = lPar$db.minPts)

# Filter collective events based on size and duration
tcollsel = ARCOS::selColl(tcoll,
                          colldur = c(lPar$coll.mindur, Inf),
                          colltotsz = c(lPar$coll.mintotsz, Inf))

knitr::kable(head(tcollsel), digits = 2)

Cluster properties

CE growth

Evolution of CEs' growth quantified by the maximum Feret diameter (longer side of a minimal oriented bounding box). Red dots indicate the true evolution of a single synthetic event. Note horizontal shifts to avoid overplotting!

We observe that the majority of collective events grow as a prototypical synthetic event with only few events that depart this dynamics due to overlaps with other events.

# Calculate the minimal bounding box
tcollselDiam = calcGrowthColl(tcollsel)

The function calcGrowthColl adds the width, height and the time of collective events' evolution. The longer of the width and height is taken as the diameter of the event.

knitr::kable(head(tcollselDiam), digits = 2)

ggplot(tcollselDiam,
       aes(x = tevent,
           y = diam,
           group = get(lCols$cltrackid))) +
  geom_line(position = position_dodge(.5), alpha = 0.5) +
  geom_point(data = ttrueDiam,
             aes(x = tevent,
                 y = diam,
                 group = 1),
             colour = 'red') +
  xlab("Time [frames]") +
  ylab('Diameter [au]') +
  theme_minimal()

Duration & total size

We extract two statistics of collective events: event duration and the total number of unique cells involved in an event (~ size). Dashed lines indicate true size & duration of synthetic events. Several bigger and longer-lasting than synthetic events are due to overlap with other events.

# Calculate duration of collective events in frames
tcollselAggr = ARCOS::calcStatsColl(tcollsel)


ggplot(tcollselAggr,
       aes(x = totSz,
           y = clDur)) +
  geom_hex() +
  viridis::scale_fill_viridis(discrete = F,
                              trans = "log10") +
  geom_vline(xintercept = lPar$sim.cltotsz, color = "red", linetype = 2) +
  geom_hline(yintercept = lPar$sim.cldur, color = "red", linetype = 2) +
  xlab("#cells involved in collective events") +
  ylab("Duration of collective events [#frames]") +
  theme_bw()

Noodle plot

X/T projection of individual cells that participate in collective events. Tracks are coloured by the IS of collective events. Red dots indicate the beginning of a spontaneous collective event in the simulation.

ARCOS::plotNoodle2D(tcollsel, 
                    xfac = 1,
                    yfac = 1,
                    pos = 1,
                    tfreq = 1, 
                    style = 'line') +
  geom_point(data = ts[startCE == 1],
             aes(x = get(lCols$frame),
                 y = get(lCols$posx),
                 group = 1),
             color = 'red',
             alpha = 0.8,
             size = 1) +
  xlab("Time [frames]") +
  ylab("Position X [au]") +
  theme_minimal() +
  theme(legend.position = "none")

cat(sprintf("Number of initiated events: %d\n\n", sum(ts$startCE)))
cat(sprintf("Number of detected events: %d\n\n", nrow(tcollselAggr)))
cat(sprintf("Average binarised activity level: %.4f\n\n", mean(ts[[lCols$measbin]])))
cat(sprintf("Fraction of activity in (filtered) collective events: %.4f\n\n", calcFracActInColl(ts, tcollsel)))

CEs - randomisationed

Calculate p-values for the two statistics (duration and size) using randomisationping. Each calculation can take up to several minutes!

Shuffle initial XY

In this approach, the initial position of every track is shuffled between all cells in the system. Since the measurement dynamics remains intact, this approach randomises only the spatial aspect of collective phenomena, i.e. tracks are only rearranged in space while retaining their measurement dynamics. One caveat concerns cases where cells/objects are moving in space. By assigning new starting points, objects' positions may exceed the initial bounds of the field of view.

The ARCOS::trackCollBoot function outputs a data.table in long format with time points when individual objects participate in collective events. From this output, further statistics can be extracted, such as: duration, size, diameter, number of collective events. The bootiter column indexes randomisationping iterations.

# Perform randomisation or load data from a file
stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_shuffIniXY_N%d.csv.gz', lPar$nboot),
                   package = 'ARCOS')

if (file.exists(stmp)) {

  tcollselBoot = ARCOS::loadDataFromFile(fname = stmp,
                                         colPos = c(lCols$posx, lCols$posy),
                                         colFrame = lCols$frame,
                                         colIDobj = lCols$trackid,
                                         colIDcoll = lCols$cltrackid,
                                         colBootIter = lCols$bootiter,
                                         fromColl = TRUE,
                                         fromBoot = TRUE)
} else {
  set.seed(11)

  tcollselBoot = ARCOS::trackCollBoot(obj = ts, 
                                      nboot = lPar$nboot, 
                                      ncores = lPar$ncores,
                                      method = 'shuffCoord', 
                                      eps = lPar$db.eps, 
                                      epsPrev = lPar$db.eps, 
                                      nPrev = 1L, 
                                      minClSz = lPar$db.minPts,
                                      colldurlim = c(lPar$coll.mindur, 1000), 
                                      colltotszlim = c(lPar$coll.mintotsz, 1000),
                                      deb = F)

  #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_shuffIniXY_N%d.csv.gz', lPar$nboot))
}

knitr::kable(head(tcollselBoot), digits = 2)

Sample noodle plot from a single randomisation iteration:

ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], 
                    xfac = 1,
                    yfac = 1,
                    pos = 1,
                    tfreq = 1, 
                    style = 'point') +
  coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) +
  xlab("Time [frames]") +
  ylab("Position X [au]") +
  ggthemes::theme_few() +
  theme(legend.position = "none")

Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):

tFracActInColl = calcFracActInColl(ts, tcollsel)
tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)

ggplot(tFracActInCollBoot,
       aes(x = fracAct,
           y = after_stat(density))) +
  geom_freqpoly(bins = 30) +
  geom_vline(xintercept = tFracActInColl,
             color = "red", linetype = 2) +
  xlab('Fraction of activity in collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:

calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')

Aggregate statistics from iterations:

# Aggregate the results of randomisationping per collective event
tcollbootAggrPerColl = calcStatsColl(tcollselBoot)

# Further aggregate size & duration per randomisation iteration
tcollbootSzDurPerIter = tcollbootAggrPerColl[,
                                             lapply(.SD, mean),
                                             by = c(lCols$bootiter),
                                             .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')]

# Further aggregate the number of events per randomisation iteration
tcollbootNcollPerIter = tcollbootAggrPerColl[,
                                             .(N = sum(!is.na(get(lCols$cltrackid)))),
                                             by = c(lCols$bootiter)]

Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):

ggplot(data = tcollbootNcollPerIter,
       aes(x = N,
           y = after_stat(density))) +
  geom_histogram(binwidth = 2,
                 color="lightblue", fill="lightblue") +
  geom_vline(xintercept = nrow(tcollselAggr),
             color = "red", linetype = 2) +
  xlab('Number of collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the number of observed CEs is less than the mean number of clusters in randomised data:

calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'less')

P-value, alternative hypothesis - the number of observed CEs is greater than the mean number of clusters in randomised data:

calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'greater')

Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = clDur,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') +
  xlab("Duration of collective events [#frames]") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:

calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')

Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = totSz,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') +
  xlab("#cells involved in collective events") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:

calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')

Randomly shift the measurement

Here, entire single-cell tracks are shifted forward or backward in time. The spatial arrangement of cells and the measurement dynamics are intact, except activation events occur at different times than originally.

stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_randShiftMeas_N%d.csv.gz', lPar$nboot),
                   package = 'ARCOS')

if (file.exists(stmp)) {
  tcollselBoot = loadDataFromFile(fname = stmp,
                                  colPos = c(lCols$posx, lCols$posy),
                                  colFrame = lCols$frame,
                                  colIDobj = lCols$trackid,
                                  colIDcoll = lCols$cltrackid,
                                  colMeas = lCols$meas,
                                  colBootIter = lCols$bootiter,
                                  fromColl = TRUE,
                                  fromBoot = TRUE)
} else {
  set.seed(11)

  tcollselBoot = ARCOS::trackCollBoot(obj = ts, 
                                      nboot = lPar$nboot, 
                                      ncores = lPar$ncores,
                                      method = 'randShiftMeas', 
                                      eps = lPar$db.eps, 
                                      epsPrev = lPar$db.eps, 
                                      nPrev = 1L, 
                                      minClSz = lPar$db.minPts,
                                      colldurlim = c(lPar$coll.mindur, 1000), 
                                      colltotszlim = c(lPar$coll.mintotsz, 1000),
                                      deb = F)

  #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_randShiftMeas_N%d.csv.gz', lPar$nboot))
}

knitr::kable(head(tcollselBoot), digits = 2)

Sample noodle plot from a single randomisation iteration:

ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], 
                    xfac = 1,
                    yfac = 1,
                    pos = 1,
                    tfreq = 1, 
                    style = 'point') +
  coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) +
  xlab("Time [frames]") +
  ylab("Position X [au]") +
  ggthemes::theme_few() +
  theme(legend.position = "none")

Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):

tFracActInColl = calcFracActInColl(ts, tcollsel)
tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)

ggplot(tFracActInCollBoot,
       aes(x = fracAct,
           y = after_stat(density))) +
  geom_freqpoly(bins = 30) +
  geom_vline(xintercept = tFracActInColl,
             color = "red", linetype = 2) +
  xlab('Fraction of activity in collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:

calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')

Aggregate statistics from iterations:

# Aggregate the results of randomisationping per collective event
tcollbootAggrPerColl = calcStatsColl(tcollselBoot)

# Further aggregate size & duration per randomisation iteration
tcollbootSzDurPerIter = tcollbootAggrPerColl[,
                                             lapply(.SD, mean),
                                             by = c(lCols$bootiter),
                                             .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')]

# Further aggregate the number of events per randomisation iteration
tcollbootNcollPerIter = tcollbootAggrPerColl[,
                                             .(N = sum(!is.na(get(lCols$cltrackid)))),
                                             by = c(lCols$bootiter)]

Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):

ggplot(data = tcollbootNcollPerIter,
       aes(x = N,
           y = after_stat(density))) +
  geom_histogram(binwidth = 2,
                 color="lightblue", fill="lightblue") +
  geom_vline(xintercept = nrow(tcollselAggr),
             color = "red", linetype = 2) +
  xlab('Number of collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the number of observed CEs is less than the mean number of clusters in randomised data:

calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'less')

Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = clDur,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') +
  xlab("Duration of collective events [#frames]") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:

calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')

Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = totSz,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') +
  xlab("#cells involved in collective events") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:

calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')

Shuffle time points per track

Here, individual time points are shuffled independently in each track. While the spatial arrangement is intact, the measurement dynamics is completely randomised.

stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_shuffMeasTrack_N%d.csv.gz', lPar$nboot),
                   package = 'ARCOS')

if (file.exists(stmp)) {
  tcollselBoot = loadDataFromFile(fname = stmp,
                                  colPos = c(lCols$posx, lCols$posy),
                                  colFrame = lCols$frame,
                                  colIDobj = lCols$trackid,
                                  colIDcoll = lCols$cltrackid,
                                  colMeas = lCols$meas,
                                  colBootIter = lCols$bootiter,
                                  fromColl = TRUE,
                                  fromBoot = TRUE)
} else {
  set.seed(11)

  tcollselBoot = ARCOS::trackCollBoot(obj = ts, 
                                      nboot = lPar$nboot, 
                                      ncores = lPar$ncores,
                                      method = 'shuffMeasTrack', 
                                      eps = lPar$db.eps, 
                                      epsPrev = lPar$db.eps, 
                                      nPrev = 1L, 
                                      minClSz = lPar$db.minPts,
                                      colldurlim = c(lPar$coll.mindur, 1000), 
                                      colltotszlim = c(lPar$coll.mintotsz, 1000),
                                      deb = F)

  #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_shuffMeasTrack_N%d.csv.gz', lPar$nboot))
}

knitr::kable(head(tcollselBoot), digits = 2)

Sample noodle plot from a single randomisation iteration:

ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], 
                    xfac = 1,
                    yfac = 1,
                    pos = 1,
                    tfreq = 1, 
                    style = 'point') +
  coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) +
  xlab("Time [frames]") +
  ylab("Position X [au]") +
  ggthemes::theme_few() +
  theme(legend.position = "none")

Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):

tFracActInColl = calcFracActInColl(ts, tcollsel)
tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)

ggplot(tFracActInCollBoot,
       aes(x = fracAct,
           y = after_stat(density))) +
  geom_freqpoly(bins = 30) +
  geom_vline(xintercept = tFracActInColl,
             color = "red", linetype = 2) +
  xlab('Fraction of activity in collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:

calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')

Aggregate statistics from iterations:

# Aggregate the results of randomisationping per collective event
tcollbootAggrPerColl = calcStatsColl(tcollselBoot)

# Further aggregate size & duration per randomisation iteration
tcollbootSzDurPerIter = tcollbootAggrPerColl[,
                                             lapply(.SD, mean),
                                             by = c(lCols$bootiter),
                                             .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')]

# Further aggregate the number of events per randomisation iteration
tcollbootNcollPerIter = tcollbootAggrPerColl[,
                                             .(N = sum(!is.na(get(lCols$cltrackid)))),
                                             by = c(lCols$bootiter)]

Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):

ggplot(data = tcollbootNcollPerIter,
       aes(x = N,
           y = after_stat(density))) +
  geom_histogram(binwidth = 2,
                 color="lightblue", fill="lightblue") +
  geom_vline(xintercept = nrow(tcollselAggr),
             color = "red", linetype = 2) +
  xlab('Number of collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the number of observed CEs is greater than the mean number of clusters in randomised data:

calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'greater')

Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = clDur,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') +
  xlab("Duration of collective events [#frames]") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:

calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')

Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = totSz,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') +
  xlab("#cells involved in collective events") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:

calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')

Shuffle measurement per frame

Here, binarised measurements are shuffled between cells independently in every time frame. Positions remain intact; the total activity per track is affected.

stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_shuffMeasFrame_N%d.csv.gz', lPar$nboot),
                   package = 'ARCOS')


if (file.exists(stmp)) {
  tcollselBoot = loadDataFromFile(fname = stmp,
                                  colPos = c(lCols$posx, lCols$posy),
                                  colFrame = lCols$frame,
                                  colIDobj = lCols$trackid,
                                  colIDcoll = lCols$cltrackid,
                                  colMeas = lCols$meas,
                                  colBootIter = lCols$bootiter,
                                  fromColl = TRUE,
                                  fromBoot = TRUE)
} else {
  set.seed(11)

  tcollselBoot = ARCOS::trackCollBoot(obj = ts, 
                                      nboot = lPar$nboot, 
                                      ncores = lPar$ncores,
                                      method = 'shuffMeasFrame', 
                                      eps = lPar$db.eps, 
                                      epsPrev = lPar$db.eps, 
                                      nPrev = 1L, 
                                      minClSz = lPar$db.minPts,
                                      colldurlim = c(lPar$coll.mindur, 1000), 
                                      colltotszlim = c(lPar$coll.mintotsz, 1000),
                                      deb = F)

  #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_shuffMeasFrame_N%d.csv.gz', lPar$nboot))
}

knitr::kable(head(tcollselBoot), digits = 2)

Sample noodle plot from a single randomisation iteration:

ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], 
                    xfac = 1,
                    yfac = 1,
                    pos = 1,
                    tfreq = 1, 
                    style = 'point') +
  coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) +
  xlab("Time [frames]") +
  ylab("Position X [au]") +
  ggthemes::theme_few() +
  theme(legend.position = "none")

Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):

tFracActInColl = calcFracActInColl(ts, tcollsel)
tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)

ggplot(tFracActInCollBoot,
       aes(x = fracAct,
           y = after_stat(density))) +
  geom_freqpoly(bins = 30) +
  geom_vline(xintercept = tFracActInColl,
             color = "red", linetype = 2) +
  xlab('Fraction of activity in collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:

calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')

Aggregate statistics from iterations:

# Aggregate the results of randomisationping per collective event
tcollbootAggrPerColl = calcStatsColl(tcollselBoot)

# Further aggregate size & duration per randomisation iteration
tcollbootSzDurPerIter = tcollbootAggrPerColl[,
                                             lapply(.SD, mean),
                                             by = c(lCols$bootiter),
                                             .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')]

# Further aggregate the number of events per randomisation iteration
tcollbootNcollPerIter = tcollbootAggrPerColl[,
                                             .(N = sum(!is.na(get(lCols$cltrackid)))),
                                             by = c(lCols$bootiter)]

Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):

ggplot(data = tcollbootNcollPerIter,
       aes(x = N,
           y = after_stat(density))) +
  geom_histogram(binwidth = 2,
                 color="lightblue", fill="lightblue") +
  geom_vline(xintercept = nrow(tcollselAggr),
             color = "red", linetype = 2) +
  xlab('Number of collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the number of observed CEs is greater than the mean number of clusters in randomised data:

calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'greater')

Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = clDur,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') +
  xlab("Duration of collective events [#frames]") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:

calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')

Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = totSz,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') +
  xlab("#cells involved in collective events") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:

calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')

Shuffle blocks per track

In this approach, blocks of activity in binarised time series are shuffled independently in every track. Positions remain intact and the alternating order of 0 and 1 blocks is maintained.

lPar$nboot = 100
stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_shuffBlockTrackAlt_N%d.csv.gz', lPar$nboot),
                   package = 'ARCOS')


if (file.exists(stmp)) {
  tcollselBoot = loadDataFromFile(fname = stmp,
                                  colPos = c(lCols$posx, lCols$posy),
                                  colFrame = lCols$frame,
                                  colIDobj = lCols$trackid,
                                  colIDcoll = lCols$cltrackid,
                                  colMeas = lCols$meas,
                                  colBootIter = lCols$bootiter,
                                  fromColl = TRUE,
                                  fromBoot = TRUE)
} else {
  set.seed(11)

  tcollselBoot = ARCOS::trackCollBoot(obj = ts, 
                                      nboot = lPar$nboot, 
                                      ncores = lPar$ncores, 
                                      method = 'shuffBlockTrackAlt', 
                                      eps = lPar$db.eps, 
                                      epsPrev = lPar$db.eps, 
                                      nPrev = 1L, 
                                      minClSz = lPar$db.minPts,
                                      colldurlim = c(lPar$coll.mindur, 1000), 
                                      colltotszlim = c(lPar$coll.mintotsz, 1000),
                                      deb = F)

  #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_shuffBlockTrack_N$d.csv.gz', lPar$nboot))

}

knitr::kable(head(tcollselBoot), digits = 2)

Sample noodle plot from a single randomisation iteration:

ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], 
                    xfac = 1,
                    yfac = 1,
                    pos = 1,
                    tfreq = 1, 
                    style = 'point') +
  coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) +
  xlab("Time [frames]") +
  ylab("Position X [au]") +
  ggthemes::theme_few() +
  theme(legend.position = "none")

Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):

tFracActInColl = calcFracActInColl(ts, tcollsel)
tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)

ggplot(tFracActInCollBoot,
       aes(x = fracAct,
           y = after_stat(density))) +
  geom_freqpoly(bins = 30) +
  geom_vline(xintercept = tFracActInColl,
             color = "red", linetype = 2) +
  xlab('Fraction of activity in collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:

calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')

Aggregate statistics from iterations:

# Aggregate the results of randomisationping per collective event
tcollbootAggrPerColl = calcStatsColl(tcollselBoot)

# Further aggregate size & duration per randomisation iteration
tcollbootSzDurPerIter = tcollbootAggrPerColl[,
                                             lapply(.SD, mean),
                                             by = c(lCols$bootiter),
                                             .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')]

# Further aggregate the number of events per randomisation iteration
tcollbootNcollPerIter = tcollbootAggrPerColl[,
                                             .(N = sum(!is.na(get(lCols$cltrackid)))),
                                             by = c(lCols$bootiter)]

Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):

ggplot(data = tcollbootNcollPerIter,
       aes(x = N,
           y = after_stat(density))) +
  geom_histogram(binwidth = 2,
                 color="lightblue", fill="lightblue") +
  geom_vline(xintercept = nrow(tcollselAggr),
             color = "red", linetype = 2) +
  xlab('Number of collective events') +
  ggthemes::theme_few()

P-value, alternative hypothesis - the number of observed CEs is less than the mean number of clusters in randomised data:

calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'less')

Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = clDur,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') +
  xlab("Duration of collective events [#frames]") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:

calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')

Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):

ggplot(tcollbootSzDurPerIter,
       aes(x = totSz,
           y = after_stat(density))) +
  geom_freqpoly(color = 'black', bins = 20) +
  geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') +
  xlab("#cells involved in collective events") +
  ggthemes::theme_few()

P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:

calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')

dmattek/ARCOS documentation built on Dec. 5, 2024, 11:02 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dmattek/ARCOS
Automatic Recognition of Collective Signalling

In dmattek/ARCOS: Automatic Recognition of Collective Signalling

Intro

Set the parameters

Read data

Identify activity

CEs - non-randomisationped

Cluster properties

CE growth

Duration & total size

Noodle plot

CEs - randomisationed

Shuffle initial XY

Randomly shift the measurement

Shuffle time points per track

Shuffle measurement per frame

Shuffle blocks per track

R Package Documentation

Browse R Packages

We want your feedback!

dmattek/ARCOS Automatic Recognition of Collective Signalling

In dmattek/ARCOS: Automatic Recognition of Collective Signalling

Intro

Set the parameters

Read data

Identify activity

CEs - non-randomisationped

Cluster properties

CE growth

Duration & total size

Noodle plot

CEs - randomisationed

Shuffle initial XY

Randomly shift the measurement

Shuffle time points per track

Shuffle measurement per frame

Shuffle blocks per track

R Package Documentation

Browse R Packages

We want your feedback!

dmattek/ARCOS
Automatic Recognition of Collective Signalling