Fluo_CV_modeling: Fluo_CV_modeling
In CONFESS: Cell OrderiNg by FluorEScence Signal


Performs the cross-validation (CV) analysis on the pseudotimes and clusters estimated in the previous step, i.e. by Fluo_CV_prep() or from a manually generated list based on Fluo_modeling(). The function evaluates the change in the estimates obtained (i) from a subset of the data by f-fold cross-validation, where f is the percentage of the samples from a specific group (@GAPgroups) that stay in the analysis at each CV iteration, or (ii) from a subset of runs that stay in the analysis at each CV iteration. It produces informative plots of the differences between the estimates of each iteration and the original estimates. It also summarizes the CV-estimated pseudotimes into a new set of estimates.
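The per-group subsampling described in option (i) can be pictured with a short base R sketch. This is not CONFESS code: the vector `groups` is a hypothetical stand-in for the estimated clusters, and rounding the retained count down with `floor()` is an assumption made only for illustration.

```r
# Illustrative sketch only; not taken from the CONFESS source.
set.seed(1)
groups <- rep(c(1, 2, 3), times = c(10, 20, 30))  # hypothetical cluster labels
f <- 0.9                                           # fraction kept per cluster

# Within each cluster, keep floor(f * n) randomly chosen sample indices.
keep <- unlist(lapply(split(seq_along(groups), groups), function(idx) {
  idx[sample.int(length(idx), size = floor(f * length(idx)))]
}))

# The remaining samples are the ones temporarily removed in this iteration.
out_of_bag <- setdiff(seq_along(groups), keep)

table(groups[keep])   # samples retained per cluster
length(out_of_bag)    # samples left out
```

Repeating this draw B times would give B reduced data sets, one per CV iteration.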

Fluo_CV_modeling(data, B = 20, batch = 1, perc.cutoff = 0.6,
  q = 0.9, f = 0.9, seed.it = TRUE, pseudotime.cutoff = 20,
  savePlot = getwd())

`data`	List. The output of Fluo_CV_prep() or any other manually retrieved list with the components of Fluo_CV_prep().
`B`	Integer. The number of cross-validation iterations to be performed. Default is 20.
`batch`	Numeric. A vector of runs to remain in the cross-validation. The rest are temporarily removed. The algorithm estimates the centroids of the reduced data and then calls the out-of-bag samples and re-estimates their k-means clusters. Default is 1.
`perc.cutoff`	Float. The percentage of similar CV-estimated pseudotimes for each sample. The similarity is assessed by k-means with k = 2. It serves as a cut-off to identify outlying CV-estimated pseudotimes (along with q and pseudotime.cutoff). Default is 0.6.
`q`	Float. The q-th quantile of the difference between the original data estimated pseudotimes and the CV-estimated pseudotimes for each sample. It serves as a cut-off to identify outlying CV-estimated pseudotimes (along with perc.cutoff and pseudotime.cutoff). Default is 0.9.
`f`	Float. The percentage of samples from each estimated cluster (@GAPgroups) to remain in the cross-validation analysis. The rest are temporarily removed. The algorithm estimates the centroids of the reduced data and then calls the out-of-bag samples and re-estimates their k-means clusters. Default is 0.9.
`seed.it`	Logical. If TRUE it performs cross-validation with the seed used in the analysis of the original data, i.e. in Fluo_CV_prep(). Default is TRUE.
`pseudotime.cutoff`	Integer. A user-defined value to define outlier samples (along with perc.cutoff and q), i.e. samples with Pseudotime(original) - medianPseudotime(CV) > pseudotime.cutoff. Default is 20.
`savePlot`	Character string. Directory to store the plots of the analysis of the whole data. Its value can be an existing directory or "screen", which prints the plots only on the screen. The "OFF" option is always used within the cross-validation iterations. Default is the current working directory, getwd().
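The pseudotime.cutoff rule quoted above, Pseudotime(original) - medianPseudotime(CV) > pseudotime.cutoff, can be illustrated with made-up numbers. The matrix `cv` (one row per sample, one column per CV iteration) and the vector `original` are hypothetical, and the sketch uses the signed difference exactly as the argument description states it; whether the package takes an absolute value is not stated here.

```r
# Illustrative sketch only; the data below are invented.
original <- c(10, 50, 120)                 # original pseudotimes of 3 samples
cv <- rbind(c(11, 9, 10, 12),              # CV-estimated pseudotimes, B = 4
            c(20, 22, 25, 21),
            c(118, 121, 119, 122))
pseudotime.cutoff <- 20

# Median CV pseudotime per sample, then the cut-off rule from the text.
cv_median <- apply(cv, 1, median)
is_outlier <- (original - cv_median) > pseudotime.cutoff

data.frame(original, cv_median, is_outlier)
```

Only the second sample is flagged, because its original pseudotime exceeds its CV median by 28.5.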

The output of Fluo_modeling() with the original estimates and the CV-based estimated pseudotimes/clusters stored in different slots of the CV results component. The results are categorized by run number. Each run contains the original estimates (@Original Pseudotimes), the CV-based estimates by the "median/original" method (@Reest.Pseudotimes_median/original) and the CV-based estimates by the "median/null" method (@Reest.Pseudotimes_median/null).

1. "median/original": It integrates the information of the CV and the originally estimated pseudotimes. It builds k-means clusters of the B CV estimates for each sample and defines pseudotime(i) = median(pseudotime(set1,i)), where set1 is a subset of the B pseudotimes that exhibit some similarity. The similarity is assessed by k-means clustering. This subset should contain a large percentage of the B data (>perc.cutoff) and its median should be lower than the q-th quantile of the average differences between the original and the CV-estimated pseudotimes across all samples. If the CV-estimated pseudotimes do not satisfy the above, the algorithm returns pseudotime(i) = median(pseudotime(set2,i)), where set2 is the cluster of B pseudotimes that minimizes |median(pseudotimes(set2,i))-original.pseudotimes|.

2. "median/null": If a set1 of similar pseudotimes that satisfies the above rules exists, it returns pseudotime(i) = median(pseudotime(set1,i)). Otherwise it returns NULL, i.e. the CV-estimated pseudotimes of the sample are not similar and the algorithm cannot reliably estimate the pseudotime of interest.
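The two summaries can be sketched for a single sample in base R. This is one possible reading of the rules above, not the package implementation: `summarise_cv` is a hypothetical helper, set1 is assumed to be the larger of the two k-means clusters, and `diff_threshold` is a hypothetical fixed number standing in for the q-th quantile of the differences computed across all samples.

```r
# Illustrative sketch only; summarise_cv() is not a CONFESS function.
summarise_cv <- function(cv, original, perc.cutoff = 0.6, diff_threshold = 10) {
  km <- kmeans(cv, centers = 2, nstart = 10)   # similarity via k-means, k = 2
  sizes <- table(km$cluster)
  big <- as.integer(names(sizes)[which.max(sizes)])
  set1 <- cv[km$cluster == big]                # assumed: the larger cluster

  # set1 must hold more than perc.cutoff of the B estimates and its median
  # must stay close to the original pseudotime (threshold is an assumption).
  ok <- (length(set1) / length(cv)) > perc.cutoff &&
    abs(median(set1) - original) < diff_threshold

  # Fallback for "median/original": the cluster whose median is closest
  # to the original pseudotime.
  meds <- tapply(cv, km$cluster, median)
  closest <- meds[[which.min(abs(meds - original))]]

  list(median_original = if (ok) median(set1) else closest,
       median_null     = if (ok) median(set1) else NULL)
}

set.seed(1)
# Eight of ten CV estimates agree: both methods return the same median.
res1 <- summarise_cv(c(50, 51, 49, 52, 50, 48, 51, 50, 90, 92), original = 50)
# Estimates split 5/5: "median/null" gives NULL, "median/original" falls back.
res2 <- summarise_cv(c(20, 21, 19, 20, 22, 80, 81, 79, 80, 82), original = 78)
```

In the second call no cluster holds more than 60% of the estimates, so only the "median/original" fallback produces a value.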

Both solutions then undergo a final round of change-point analysis that uses the CV-estimated pseudotimes and produces the final results of Fluo_CV_modeling(). All results can be subsequently used in Fluo_ordering(). The output also includes a second component, @All.Progressions, with the original and the CV-estimated pseudotimes. This information is kept for comparison purposes and is not used further.

print("Not run because takes a long time")
#step1 <- createFluo(from.file=system.file("extdata", "Results_of_image_analysis.txt",
#package = "CONFESS"),separator="_")
#steps2_4 <- Fluo_CV_prep(data=step1,init.path = "bottom/left",path.type=c("circular","clockwise"),
#single.batch.analysis = 5,flex.reps=5,altFUN="kmeans",VSmethod="DDHFmv",CPmethod="ECP",
#B.kmeans=5,CPpvalue=0.01,savePlot="OFF")
#steps2_4cv<-Fluo_CV_modeling(data=steps2_4,B=5,f=0.99,savePlot="OFF")