Fluo_CV_prep: Fluo_CV_prep
In CONFESS: Cell OrderiNg by FluorEScence Signal


It generates the data that will be used in the cross-validation analysis. Essentially, it analyzes and stores the original (full) dataset for different reference runs, seeds, starting clusters, etc. It estimates the progression path automatically, which is feasible only for standard paths (path.type parameter other than "other"), so the function is useful only in those cases. Otherwise, it should be omitted from the analysis and the user should generate the data manually, i.e. run the Fluo_adjustment() - Fluo_modeling() series once for each case to be studied, with a manual init.path input in Fluo_modeling().
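The manual route can be organised as a loop that collects one result per case in a list. The following is only a minimal base-R sketch of that bookkeeping: run_one_case() is a hypothetical placeholder for the user's own Fluo_adjustment() - Fluo_modeling() calls, and the list layout shown here is an assumption, not the structure that Fluo_CV_modeling() requires.

```r
# Reference runs to be studied (these mirror the default single.batch.analysis = 1:5)
ref_runs <- 1:5

# HYPOTHETICAL placeholder: replace the body with your own
# Fluo_adjustment() - Fluo_modeling() calls for one reference run,
# passing a manually chosen init.path to Fluo_modeling().
run_one_case <- function(ref_run) {
  list(reference_run = ref_run, result = NULL)
}

# Collect one entry per reference run in a named list
cv_input <- lapply(ref_runs, run_one_case)
names(cv_input) <- paste0("run", ref_runs)

# Quick look at what was collected
str(cv_input, max.level = 1)
```

The real list should mimic the object returned by Fluo_CV_prep() on a standard path, so it is worth inspecting such an object before building one by hand.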

Fluo_CV_prep(data, init.path = "bottom/left", path.type = c("circular",
  "clockwise"), BGmethod = "normexp", maxMix = 3,
  single.batch.analysis = 1:5, transformation = "log",
  prior.pi = 0.1, flex.reps = 50, flexmethod = "BIC", areacut = 0,
  fixClusters = 0, altFUN = "kmeans", k.max = 15,
  VSmethod = "DDHFmv", CPmethod = "ECP", CPgroups = 5,
  B.kmeans = 50, CPpvalue = 0.05, CPmingroup = 15,
  savePlot = getwd(), seed = NULL)

`data`	List. The output of createFluo(), i.e. the image analysis estimates.
`init.path`	Character vector. It defines the starting cluster of the progression path in general terms. It can be one of "top/right", "top/left", "bottom/right" or "bottom/left", indicating the cluster of interest on the 2d scatterplot of Fluo_inspection(). Default is "bottom/left", i.e. in Fucci an EM/earlyG1 like cluster.
`path.type`	Character vector. A user-defined vector that characterizes the cell progression dynamics. The first element can be "circular", "A2Z" or "other". If "circular", the path progression is assumed to exhibit a circle-like behavior. If "A2Z", the path is assumed to have a well-defined start and a well-defined end point (e.g. a linear progression). If "other", the progression is assumed to be arbitrary without an obvious directionality. Default is "circular". The second element can be either "clockwise" or "anticlockwise", depending on how the path is expected to proceed. Default is "clockwise". If the first element is "other", the second element can be omitted. If path.type = "other", the function does not estimate a path. The cross-validation algorithm will probably fail for this kind of path.type value because it cannot guess the progression path automatically. It is suggested that the user runs the cross-validation manually (each time specifying the path in Fluo_modeling()), collects the data in a list similar to the one produced here and inputs them into Fluo_CV_modeling() to get the results.
`BGmethod`	Character string. The type of image background correction to be performed. One of "normexp" or "subtract". Default is "normexp".
`maxMix`	Integer. The maximum number of components to fit into the mixture of regressions model. If maxMix=1 or if the optimal number of the estimated components is 1, the model reduces to the classical 2-way ANOVA. Default is 3.
`single.batch.analysis`	Numeric. The baseline run(s) to perform run effect correction with flexmix. Due to the iterative nature of this function, it can be a series of values including 0 (averaging of run correction estimates). Default is 1:5.
`transformation`	Character string. The transformation applied to the data. One of "bc" (Box-Cox), "log", "log10" or "asinh". Default is "log".
`prior.pi`	Float. The prior probability to accept a component. Default is 0.1.
`flex.reps`	Integer. The iterations of the Expectation-Maximization algorithm to estimate the flexmix model. Default is 50.
`flexmethod`	Character string. A method to estimate the optimal number of flexmix components. One of "BIC", "AIC", "ICL". Default is "BIC".
`areacut`	Integer. The "artificial" area size (BFarea^2) of the cells estimated by BF image modelling. Default is 0, implying that the area sizes to be corrected will be estimated automatically from the data (not recommended if prior knowledge exists).
`fixClusters`	Integer. The number of k-means clusters to be initially generated. If 0, the function runs GAP analysis to estimate the optimal number of clusters. Default is 0.
`altFUN`	Character string. A user-defined method to generate the initial clusters. It can be one of "kmeans", "samSpec", "fmeans", "fmerge" or "fpeaks". Default is "kmeans".
`k.max`	Integer. This is the maximum number of clusters that can be generated by k-means (if fixClusters = 0). Default is 15.
`VSmethod`	Character string. The variance stabilization transformation method to be applied to the corrected fluorescence data prior to the change point analysis. It can be one of "log" or "DDHFmv". Default is "DDHFmv".
`CPmethod`	Character string. The change point method to be used. It can be one of "ECP" (non-parametric), "manualECP" (non-parametric with user-defined number of change-points) or "PELT" (Pruned Exact Linear Time; parametric). Default is "ECP".
`CPgroups`	Integer. The number of change-points to be kept if CPmethod = "manualECP". Default is 5.
`B.kmeans`	Integer. The number of bootstrap samples for the calculation of the GAP statistic. Default is 50.
`CPpvalue`	Float. The significance level below which we do not reject a change point. Default is 0.05.
`CPmingroup`	Integer. The minimum number of values for a cluster re-estimated by the change-point analysis. Default is 15.
`savePlot`	Character string. Directory to store the plots of the analysis of the whole data. Its value can be an existing directory or "screen", which prints the plot only on the screen. (The "OFF" option is permanently used in cross-validations.) Default is the current working directory, getwd().
`seed`	Integer. An optional seed number for the Random Number Generator. Note that this seed is a 'reference' value for the actual seeds used in sampling. CONFESS uses various random sampling methods, and each method's actual seed is factor*seed, where the factor varies across methods. Default is NULL.
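The factor*seed rule described for the seed argument can be illustrated with base R alone. This is only a sketch of the idea: the method names and factor values below are hypothetical, since the actual factors are internal to CONFESS and are not documented here.

```r
# Reference seed, as it would be passed through the `seed` argument
ref_seed <- 123L

# HYPOTHETICAL factors and method names, chosen only for illustration
factors <- c(clustering = 1L, bootstrap = 2L, mixture = 3L)

# Each sampling method gets its own seed: factor * reference seed
method_seeds <- factors * ref_seed

# Re-using the same derived seed reproduces the same random draw
set.seed(method_seeds[["bootstrap"]])
draw1 <- sample(10)
set.seed(method_seeds[["bootstrap"]])
draw2 <- sample(10)

identical(draw1, draw2)
```

The practical consequence is that a single reference seed makes the whole analysis reproducible, while the different sampling steps still use distinct seeds.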

The function can also be used to generate all pseudotime/clustering results up to Fluo_modeling(), but the starting cluster has to be defined in general terms (see the init.path parameter). For this reason, its parameters are essentially the same as the ones defined for the Fluo_adjustment() - Fluo_modeling() functions.

The results of Fluo_modeling() for different reference runs (batches) are stored in different slots. An additional slot @init.path stores the init.path parameter (its value is used automatically in the cross-validation).

One can directly use the run components in Fluo_ordering() to finalize the data analysis. The main purpose of this function, though, is to prepare the data for cross-validation.

step1 <- createFluo(from.file = system.file("extdata", "Results_of_image_analysis.txt",
                                            package = "CONFESS"),
                    separator = "_")
steps2_4 <- Fluo_CV_prep(data = step1, init.path = "bottom/left",
                         path.type = c("circular", "clockwise"),
                         single.batch.analysis = 5, flex.reps = 5, altFUN = "kmeans",
                         VSmethod = "DDHFmv", CPmethod = "ECP",
                         B.kmeans = 5, CPpvalue = 0.01, savePlot = "OFF")