knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This notebook demonstrates how to test the hypothesis that the statistics of collective events (CEs) differ from the statistics calculated in a random dataset.
The dataset used in this example was generated with NetLogo agent-based simulation that models collective activation events in a 2D epithelium. In the model, 2500 cells were placed on a 50x50 hexagonal grid. Collective events appear sparsely in the epithelium, they propagate radially from a source cell up to 3 cell rows. There is a 15% failure for a cell to propagate the wave, therefore the events are not always perfectly symmetrical. The simulation lasts 200 iterations.
Collective events are identified with ARCOS. To calculate the p value of a sample statistic, 5 different randomisationping methods are applied:
library(ARCOS) library(parallel, quietly = T) library(data.table, quietly = T) library(ggplot2, quietly = T) library(ggthemes, quietly = T) library(viridis, quietly = T) ## Definitions lPar = list() lPar$fIn = system.file('synthetic2D/Gradient20.csv.gz', package = 'ARCOS') # Parameters for tracking collective events lPar$db.eps = 1.5 # neighbourhood size; units of pixels lPar$db.minPts = 3L # minimum number of cells in a cluster lPar$trackLinkFrames = 1L # no. of subsequent frames to link during tracking; units of frames lPar$coll.mindur = 2L # minimum duration of collective events lPar$coll.mintotsz = lPar$db.minPts # minimum total size of collective events lPar$debug = T # Parameters for randomisationping lPar$ncores = 2L # number of parallel cores lPar$nboot = 100L # number of randomisationping iterations # Dataset limits for plotting lPar$limxy = c(-25, 25) lPar$limt = c(0, 220) # True size and duration of a single synthetic event lPar$sim.cltotsz = 37 lPar$sim.cldur = 9 # True maximum diameter evolution of a single event in time ttrueDiam = data.table(tevent = 0:8, diam = c(2.236068, 4.472136, rep(6.708204,7))) # Column names lCols = list(frame = 'frame', posx = 'x', posy = 'y', trackid = 'id', meas = 'meas', measresc = 'meas.resc', measbin = 'meas.bin', cltrackid = 'collid', bootiter = 'bootiter')
cat(sprintf("\nNeighbourhood size for dbscan: %.2f [px] \n\n", lPar$db.eps)) cat(sprintf("Minimum cluster size for dbscan: %d [cells] \n\n", lPar$db.minPts)) cat(sprintf("Number of subsequent frames to link: %d [frames] \n\n", lPar$trackLinkFrames)) cat(sprintf("\nMinimum duration of collective events: %d [frames] \n\n", lPar$coll.mindur)) cat(sprintf("Minimum total size of collective events: %d [cells] \n\n", lPar$coll.mintotsz)) cat(sprintf("Number of randomisationping iterations: %d \n\n", lPar$nboot))
Read a compressed CSV file, Gradient20.csv.gz
from the NetLogo simulation output. The dataset is included with the ARCOS package. The function loadDataFromFile
is a wrapper for data.table::fread
. It reads the file and sets the attributes of the ARCOS object according to parameters defined above.
# read data ts = ARCOS::loadDataFromFile(lPar$fIn, colPos = c(lCols$posx, lCols$posy), colMeas = lCols$meas, colFrame = lCols$frame, colIDobj = lCols$trackid)
The dataset is in long format where each row corresponds to the measurement and position of a single object/cell at a particular point in time:
knitr::kable(head(ts), digits = 2)
The columns have the following data:
frame
, time frames,id
, object/cell IDs,x
and y
, object coordinates,startCE
, an indication of when a collective event is initiated in the simulation,meas
, the measurement.Here we identify activity pulses in single cells by using a simple thresholding method, i.e. all measurements above 0 are considered active. The function binMeas
adds two new columns to the dataset, meas.resc
and meas.bin
with rescaled and binarised measurements, respectively.
The parameter smoothK=1
indicates that the smoothing window initially applied to the time series is a single time point, thus no smoothing is performed. The result of thresholding is equivalent to calculating meas > 0
.
# binarise the measurement ARCOS::binMeas(ts, biasMet = "none", smoothK = 1, binThr = 0.)
knitr::kable(head(ts), digits = 2)
The results of activity detection for a sample of long tracks:
ARCOS::plotBinMeas(ts, ntraj = 25, inSeed = 11, measfac = 1., xfac = 1, plotResc = F) + coord_cartesian(ylim = c(0,2)) + theme_few() + xlab("Time [frames]") + ylab("ERK activity")
Identify collective events in the original data only for active cells, which in our case are cells with the measurement above 0.
The function trackColl
returns a data table in long format only with objects/cells that participate in collective events. Two additional columns are added, collid.frame
and collid
, with IDs of collective events per frame and a globally unique ID, respectively. The latter, collid
is what we will use in further analysis.
# Perform tracking of collective events. # Pass single-cell data that has a binarised measurement greater than 1. tcoll = ARCOS::trackColl(ts[get(lCols$measbin) > 0], eps = lPar$db.eps, minClSz = lPar$db.minPts) # Filter collective events based on size and duration tcollsel = ARCOS::selColl(tcoll, colldur = c(lPar$coll.mindur, Inf), colltotsz = c(lPar$coll.mintotsz, Inf))
knitr::kable(head(tcollsel), digits = 2)
Evolution of CEs' growth quantified by the maximum Feret diameter (longer side of a minimal oriented bounding box). Red dots indicate the true evolution of a single synthetic event. Note horizontal shifts to avoid overplotting!
We observe that the majority of collective events grow as a prototypical synthetic event with only few events that depart this dynamics due to overlaps with other events.
# Calculate the minimal bounding box tcollselDiam = calcGrowthColl(tcollsel)
The function calcGrowthColl
adds the width, height and the time of collective events' evolution. The longer of the width and height is taken as the diameter of the event.
knitr::kable(head(tcollselDiam), digits = 2)
ggplot(tcollselDiam, aes(x = tevent, y = diam, group = get(lCols$cltrackid))) + geom_line(position = position_dodge(.5), alpha = 0.5) + geom_point(data = ttrueDiam, aes(x = tevent, y = diam, group = 1), colour = 'red') + xlab("Time [frames]") + ylab('Diameter [au]') + theme_minimal()
We extract two statistics of collective events: event duration and the total number of unique cells involved in an event (~ size). Dashed lines indicate true size & duration of synthetic events. Several bigger and longer-lasting than synthetic events are due to overlap with other events.
# Calculate duration of collective events in frames tcollselAggr = ARCOS::calcStatsColl(tcollsel) ggplot(tcollselAggr, aes(x = totSz, y = clDur)) + geom_hex() + viridis::scale_fill_viridis(discrete = F, trans = "log10") + geom_vline(xintercept = lPar$sim.cltotsz, color = "red", linetype = 2) + geom_hline(yintercept = lPar$sim.cldur, color = "red", linetype = 2) + xlab("#cells involved in collective events") + ylab("Duration of collective events [#frames]") + theme_bw()
X/T projection of individual cells that participate in collective events. Tracks are coloured by the IS of collective events. Red dots indicate the beginning of a spontaneous collective event in the simulation.
ARCOS::plotNoodle2D(tcollsel, xfac = 1, yfac = 1, pos = 1, tfreq = 1, style = 'line') + geom_point(data = ts[startCE == 1], aes(x = get(lCols$frame), y = get(lCols$posx), group = 1), color = 'red', alpha = 0.8, size = 1) + xlab("Time [frames]") + ylab("Position X [au]") + theme_minimal() + theme(legend.position = "none")
cat(sprintf("Number of initiated events: %d\n\n", sum(ts$startCE))) cat(sprintf("Number of detected events: %d\n\n", nrow(tcollselAggr))) cat(sprintf("Average binarised activity level: %.4f\n\n", mean(ts[[lCols$measbin]]))) cat(sprintf("Fraction of activity in (filtered) collective events: %.4f\n\n", calcFracActInColl(ts, tcollsel)))
Calculate p-values for the two statistics (duration and size) using randomisationping. Each calculation can take up to several minutes!
In this approach, the initial position of every track is shuffled between all cells in the system. Since the measurement dynamics remains intact, this approach randomises only the spatial aspect of collective phenomena, i.e. tracks are only rearranged in space while retaining their measurement dynamics. One caveat concerns cases where cells/objects are moving in space. By assigning new starting points, objects' positions may exceed the initial bounds of the field of view.
The ARCOS::trackCollBoot
function outputs a data.table
in long format with time points when individual objects participate in collective events. From this output, further statistics can be extracted, such as: duration, size, diameter, number of collective events. The bootiter
column indexes randomisationping iterations.
# Perform randomisation or load data from a file stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_shuffIniXY_N%d.csv.gz', lPar$nboot), package = 'ARCOS') if (file.exists(stmp)) { tcollselBoot = ARCOS::loadDataFromFile(fname = stmp, colPos = c(lCols$posx, lCols$posy), colFrame = lCols$frame, colIDobj = lCols$trackid, colIDcoll = lCols$cltrackid, colBootIter = lCols$bootiter, fromColl = TRUE, fromBoot = TRUE) } else { set.seed(11) tcollselBoot = ARCOS::trackCollBoot(obj = ts, nboot = lPar$nboot, ncores = lPar$ncores, method = 'shuffCoord', eps = lPar$db.eps, epsPrev = lPar$db.eps, nPrev = 1L, minClSz = lPar$db.minPts, colldurlim = c(lPar$coll.mindur, 1000), colltotszlim = c(lPar$coll.mintotsz, 1000), deb = F) #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_shuffIniXY_N%d.csv.gz', lPar$nboot)) }
knitr::kable(head(tcollselBoot), digits = 2)
Sample noodle plot from a single randomisation iteration:
ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], xfac = 1, yfac = 1, pos = 1, tfreq = 1, style = 'point') + coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) + xlab("Time [frames]") + ylab("Position X [au]") + ggthemes::theme_few() + theme(legend.position = "none")
Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):
tFracActInColl = calcFracActInColl(ts, tcollsel) tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)
ggplot(tFracActInCollBoot, aes(x = fracAct, y = after_stat(density))) + geom_freqpoly(bins = 30) + geom_vline(xintercept = tFracActInColl, color = "red", linetype = 2) + xlab('Fraction of activity in collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:
calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')
Aggregate statistics from iterations:
# Aggregate the results of randomisationping per collective event tcollbootAggrPerColl = calcStatsColl(tcollselBoot) # Further aggregate size & duration per randomisation iteration tcollbootSzDurPerIter = tcollbootAggrPerColl[, lapply(.SD, mean), by = c(lCols$bootiter), .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')] # Further aggregate the number of events per randomisation iteration tcollbootNcollPerIter = tcollbootAggrPerColl[, .(N = sum(!is.na(get(lCols$cltrackid)))), by = c(lCols$bootiter)]
Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):
ggplot(data = tcollbootNcollPerIter, aes(x = N, y = after_stat(density))) + geom_histogram(binwidth = 2, color="lightblue", fill="lightblue") + geom_vline(xintercept = nrow(tcollselAggr), color = "red", linetype = 2) + xlab('Number of collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the number of observed CEs is less than the mean number of clusters in randomised data:
calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'less')
P-value, alternative hypothesis - the number of observed CEs is greater than the mean number of clusters in randomised data:
calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'greater')
Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = clDur, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') + xlab("Duration of collective events [#frames]") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:
calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')
Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = totSz, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') + xlab("#cells involved in collective events") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:
calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')
Here, entire single-cell tracks are shifted forward or backward in time. The spatial arrangement of cells and the measurement dynamics are intact, except activation events occur at different times than originally.
stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_randShiftMeas_N%d.csv.gz', lPar$nboot), package = 'ARCOS') if (file.exists(stmp)) { tcollselBoot = loadDataFromFile(fname = stmp, colPos = c(lCols$posx, lCols$posy), colFrame = lCols$frame, colIDobj = lCols$trackid, colIDcoll = lCols$cltrackid, colMeas = lCols$meas, colBootIter = lCols$bootiter, fromColl = TRUE, fromBoot = TRUE) } else { set.seed(11) tcollselBoot = ARCOS::trackCollBoot(obj = ts, nboot = lPar$nboot, ncores = lPar$ncores, method = 'randShiftMeas', eps = lPar$db.eps, epsPrev = lPar$db.eps, nPrev = 1L, minClSz = lPar$db.minPts, colldurlim = c(lPar$coll.mindur, 1000), colltotszlim = c(lPar$coll.mintotsz, 1000), deb = F) #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_randShiftMeas_N%d.csv.gz', lPar$nboot)) }
knitr::kable(head(tcollselBoot), digits = 2)
Sample noodle plot from a single randomisation iteration:
ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], xfac = 1, yfac = 1, pos = 1, tfreq = 1, style = 'point') + coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) + xlab("Time [frames]") + ylab("Position X [au]") + ggthemes::theme_few() + theme(legend.position = "none")
Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):
tFracActInColl = calcFracActInColl(ts, tcollsel) tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)
ggplot(tFracActInCollBoot, aes(x = fracAct, y = after_stat(density))) + geom_freqpoly(bins = 30) + geom_vline(xintercept = tFracActInColl, color = "red", linetype = 2) + xlab('Fraction of activity in collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:
calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')
Aggregate statistics from iterations:
# Aggregate the results of randomisationping per collective event tcollbootAggrPerColl = calcStatsColl(tcollselBoot) # Further aggregate size & duration per randomisation iteration tcollbootSzDurPerIter = tcollbootAggrPerColl[, lapply(.SD, mean), by = c(lCols$bootiter), .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')] # Further aggregate the number of events per randomisation iteration tcollbootNcollPerIter = tcollbootAggrPerColl[, .(N = sum(!is.na(get(lCols$cltrackid)))), by = c(lCols$bootiter)]
Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):
ggplot(data = tcollbootNcollPerIter, aes(x = N, y = after_stat(density))) + geom_histogram(binwidth = 2, color="lightblue", fill="lightblue") + geom_vline(xintercept = nrow(tcollselAggr), color = "red", linetype = 2) + xlab('Number of collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the number of observed CEs is less than the mean number of clusters in randomised data:
calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'less')
Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = clDur, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') + xlab("Duration of collective events [#frames]") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:
calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')
Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = totSz, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') + xlab("#cells involved in collective events") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:
calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')
Here, individual time points are shuffled independently in each track. While the spatial arrangement is intact, the measurement dynamics is completely randomised.
stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_shuffMeasTrack_N%d.csv.gz', lPar$nboot), package = 'ARCOS') if (file.exists(stmp)) { tcollselBoot = loadDataFromFile(fname = stmp, colPos = c(lCols$posx, lCols$posy), colFrame = lCols$frame, colIDobj = lCols$trackid, colIDcoll = lCols$cltrackid, colMeas = lCols$meas, colBootIter = lCols$bootiter, fromColl = TRUE, fromBoot = TRUE) } else { set.seed(11) tcollselBoot = ARCOS::trackCollBoot(obj = ts, nboot = lPar$nboot, ncores = lPar$ncores, method = 'shuffMeasTrack', eps = lPar$db.eps, epsPrev = lPar$db.eps, nPrev = 1L, minClSz = lPar$db.minPts, colldurlim = c(lPar$coll.mindur, 1000), colltotszlim = c(lPar$coll.mintotsz, 1000), deb = F) #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_shuffMeasTrack_N%d.csv.gz', lPar$nboot)) }
knitr::kable(head(tcollselBoot), digits = 2)
Sample noodle plot from a single randomisation iteration:
ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], xfac = 1, yfac = 1, pos = 1, tfreq = 1, style = 'point') + coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) + xlab("Time [frames]") + ylab("Position X [au]") + ggthemes::theme_few() + theme(legend.position = "none")
Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):
tFracActInColl = calcFracActInColl(ts, tcollsel) tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)
ggplot(tFracActInCollBoot, aes(x = fracAct, y = after_stat(density))) + geom_freqpoly(bins = 30) + geom_vline(xintercept = tFracActInColl, color = "red", linetype = 2) + xlab('Fraction of activity in collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:
calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')
Aggregate statistics from iterations:
# Aggregate the results of randomisationping per collective event tcollbootAggrPerColl = calcStatsColl(tcollselBoot) # Further aggregate size & duration per randomisation iteration tcollbootSzDurPerIter = tcollbootAggrPerColl[, lapply(.SD, mean), by = c(lCols$bootiter), .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')] # Further aggregate the number of events per randomisation iteration tcollbootNcollPerIter = tcollbootAggrPerColl[, .(N = sum(!is.na(get(lCols$cltrackid)))), by = c(lCols$bootiter)]
Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):
ggplot(data = tcollbootNcollPerIter, aes(x = N, y = after_stat(density))) + geom_histogram(binwidth = 2, color="lightblue", fill="lightblue") + geom_vline(xintercept = nrow(tcollselAggr), color = "red", linetype = 2) + xlab('Number of collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the number of observed CEs is greater than the mean number of clusters in randomised data:
calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'greater')
Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = clDur, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') + xlab("Duration of collective events [#frames]") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:
calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')
Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = totSz, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') + xlab("#cells involved in collective events") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:
calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')
Here, binarised measurements are shuffled between cells independently in every time frame. Positions remain intact; the total activity per track is affected.
stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_shuffMeasFrame_N%d.csv.gz', lPar$nboot), package = 'ARCOS') if (file.exists(stmp)) { tcollselBoot = loadDataFromFile(fname = stmp, colPos = c(lCols$posx, lCols$posy), colFrame = lCols$frame, colIDobj = lCols$trackid, colIDcoll = lCols$cltrackid, colMeas = lCols$meas, colBootIter = lCols$bootiter, fromColl = TRUE, fromBoot = TRUE) } else { set.seed(11) tcollselBoot = ARCOS::trackCollBoot(obj = ts, nboot = lPar$nboot, ncores = lPar$ncores, method = 'shuffMeasFrame', eps = lPar$db.eps, epsPrev = lPar$db.eps, nPrev = 1L, minClSz = lPar$db.minPts, colldurlim = c(lPar$coll.mindur, 1000), colltotszlim = c(lPar$coll.mintotsz, 1000), deb = F) #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_shuffMeasFrame_N%d.csv.gz', lPar$nboot)) }
knitr::kable(head(tcollselBoot), digits = 2)
Sample noodle plot from a single randomisation iteration:
ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], xfac = 1, yfac = 1, pos = 1, tfreq = 1, style = 'point') + coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) + xlab("Time [frames]") + ylab("Position X [au]") + ggthemes::theme_few() + theme(legend.position = "none")
Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):
tFracActInColl = calcFracActInColl(ts, tcollsel) tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)
ggplot(tFracActInCollBoot, aes(x = fracAct, y = after_stat(density))) + geom_freqpoly(bins = 30) + geom_vline(xintercept = tFracActInColl, color = "red", linetype = 2) + xlab('Fraction of activity in collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:
calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')
Aggregate statistics from iterations:
# Aggregate the results of randomisationping per collective event tcollbootAggrPerColl = calcStatsColl(tcollselBoot) # Further aggregate size & duration per randomisation iteration tcollbootSzDurPerIter = tcollbootAggrPerColl[, lapply(.SD, mean), by = c(lCols$bootiter), .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')] # Further aggregate the number of events per randomisation iteration tcollbootNcollPerIter = tcollbootAggrPerColl[, .(N = sum(!is.na(get(lCols$cltrackid)))), by = c(lCols$bootiter)]
Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):
ggplot(data = tcollbootNcollPerIter, aes(x = N, y = after_stat(density))) + geom_histogram(binwidth = 2, color="lightblue", fill="lightblue") + geom_vline(xintercept = nrow(tcollselAggr), color = "red", linetype = 2) + xlab('Number of collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the number of observed CEs is greater than the mean number of clusters in randomised data:
calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'greater')
Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = clDur, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') + xlab("Duration of collective events [#frames]") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:
calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')
Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = totSz, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') + xlab("#cells involved in collective events") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:
calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')
In this approach, blocks of activity in binarised time series are shuffled independently in every track. Positions remain intact and the alternating order of 0 and 1 blocks is maintained.
lPar$nboot = 100 stmp = system.file(sprintf('synthetic2D/output-randomisation-20/randRes_shuffBlockTrackAlt_N%d.csv.gz', lPar$nboot), package = 'ARCOS') if (file.exists(stmp)) { tcollselBoot = loadDataFromFile(fname = stmp, colPos = c(lCols$posx, lCols$posy), colFrame = lCols$frame, colIDobj = lCols$trackid, colIDcoll = lCols$cltrackid, colMeas = lCols$meas, colBootIter = lCols$bootiter, fromColl = TRUE, fromBoot = TRUE) } else { set.seed(11) tcollselBoot = ARCOS::trackCollBoot(obj = ts, nboot = lPar$nboot, ncores = lPar$ncores, method = 'shuffBlockTrackAlt', eps = lPar$db.eps, epsPrev = lPar$db.eps, nPrev = 1L, minClSz = lPar$db.minPts, colldurlim = c(lPar$coll.mindur, 1000), colltotszlim = c(lPar$coll.mintotsz, 1000), deb = F) #fwrite(tcollselBoot, file = sprintf('../inst/synthetic2D/output-randomisation-20/bootRes_shuffBlockTrack_N$d.csv.gz', lPar$nboot)) }
knitr::kable(head(tcollselBoot), digits = 2)
Sample noodle plot from a single randomisation iteration:
ARCOS::plotNoodle2D(tcollselBoot[get(lCols$bootiter) == 1], xfac = 1, yfac = 1, pos = 1, tfreq = 1, style = 'point') + coord_cartesian(xlim = lPar$limt, ylim = lPar$limxy) + xlab("Time [frames]") + ylab("Position X [au]") + ggthemes::theme_few() + theme(legend.position = "none")
Distribution of the fraction of collective events in randomisation iterations compared to the observed fraction (red dashed line):
tFracActInColl = calcFracActInColl(ts, tcollsel) tFracActInCollBoot = calcFracActInColl(ts, tcollselBoot)
ggplot(tFracActInCollBoot, aes(x = fracAct, y = after_stat(density))) + geom_freqpoly(bins = 30) + geom_vline(xintercept = tFracActInColl, color = "red", linetype = 2) + xlab('Fraction of activity in collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the fraction of total (binarised) activity involved in observed collective events is greater than the mean fraction in randomised events:
calcPvalFromBS(tFracActInColl, tFracActInCollBoot$fracAct, corrected = T, alternative = 'greater')
Aggregate statistics from iterations:
# Aggregate the results of randomisationping per collective event tcollbootAggrPerColl = calcStatsColl(tcollselBoot) # Further aggregate size & duration per randomisation iteration tcollbootSzDurPerIter = tcollbootAggrPerColl[, lapply(.SD, mean), by = c(lCols$bootiter), .SDcols = c('clDur', 'totSz', 'minSz', 'maxSz')] # Further aggregate the number of events per randomisation iteration tcollbootNcollPerIter = tcollbootAggrPerColl[, .(N = sum(!is.na(get(lCols$cltrackid)))), by = c(lCols$bootiter)]
Distribution of the number of collective events identified in every randomisation iteration compared to the observed statistics (red dashed line):
ggplot(data = tcollbootNcollPerIter, aes(x = N, y = after_stat(density))) + geom_histogram(binwidth = 2, color="lightblue", fill="lightblue") + geom_vline(xintercept = nrow(tcollselAggr), color = "red", linetype = 2) + xlab('Number of collective events') + ggthemes::theme_few()
P-value, alternative hypothesis - the number of observed CEs is less than the mean number of clusters in randomised data:
calcPvalFromBS(nrow(tcollselAggr), tcollbootNcollPerIter$N, corrected = T, alternative = 'less')
Distribution of the mean cluster duration from randomisation iterations (solid black line) vs. the mean cluster duration from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = clDur, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$clDur), color = 'red', linetype = 'dashed') + xlab("Duration of collective events [#frames]") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed cluster duration is greater than the mean cluster duration in randomised data:
calcPvalFromBS(mean(tcollselAggr$clDur), tcollbootSzDurPerIter$clDur, corrected = T, alternative = 'greater')
Distribution of the mean total cluster size from randomisation iterations (solid black line) vs. the mean total cluster size from the non-randomised data (red dashed line):
ggplot(tcollbootSzDurPerIter, aes(x = totSz, y = after_stat(density))) + geom_freqpoly(color = 'black', bins = 20) + geom_vline(xintercept = mean(tcollselAggr$totSz), color = 'red', linetype = 'dashed') + xlab("#cells involved in collective events") + ggthemes::theme_few()
P-value, alternative hypothesis - the mean observed total cluster size is greater than the cluster size in randomised data:
calcPvalFromBS(mean(tcollselAggr$totSz), tcollbootSzDurPerIter$totSz, corrected = T, alternative = 'greater')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.