fhDoRCS: Calculate the Residual Chi-Square for a 'FlowHist' object
In flowPloidy: Analyze flow cytometer data to determine sample ploidy


Calculate the Reduced Chi-Square (RCS) value for a FlowHist model fit.

fhDoRCS(fh)

`fh`	a `FlowHist` object

The updated FlowHist object.

The algorithm used to fit the non-linear regression model works by adjusting the model parameters to minimize the Chi-Square value for the resulting fit. The Chi-Square value measures the departure of the observed values from the values predicted by the fitted model:

Chi^2 = Sum [(observed(x) - predicted(x))^2 / observed(x)]
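As a worked illustration of the formula above, here is a minimal base R sketch. The function name `chi_square`, the five-bin data, and the check for empty bins are assumptions made for this example; they are not taken from flowPloidy's internal code.

```r
# Chi-Square as defined above: squared departures of the observed bin
# counts from the model predictions, each scaled by the observed count.
# chi_square() is a hypothetical helper, not a flowPloidy function.
chi_square <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  # Assumption for this sketch: no empty bins, to avoid dividing by zero
  stopifnot(all(observed > 0))
  sum((observed - predicted)^2 / observed)
}

# Hypothetical five-bin histogram and the matching model predictions
observed  <- c(10, 20, 40, 20, 10)
predicted <- c(12, 18, 41, 22, 9)

# Per-bin terms: 0.4, 0.2, 0.025, 0.2, 0.1
chi2 <- chi_square(observed, predicted)
print(chi2)  # 0.925
```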

This would make the Chi-Square value a natural choice for an index to determine the overall goodness-of-fit of the model. However, the Chi-Square value is sensitive to the number of data points in our histogram. We could aggregate the same data into 256, 512 or 1024 bins. All else being equal, the analysis based on 256 bins would give us a lower Chi-Square value than the analyses that use more bins, despite providing essentially identical results.

Bagwell (1993) suggested using the Reduced Chi-Square (RCS) value as a superior alternative. It is defined as:

RCS = Chi^2/(n - m)

Here, n is the number of data points (bins) and m is the number of model parameters. Dividing by n - m corrects for the inflation of the Chi-Square value that comes with higher numbers of bins. At the same time, it introduces a penalty for model complexity: for a given Chi-Square value, the RCS increases as the number of model parameters increases. This helps protect against over-fitting the model.
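The effect of the bin count can be sketched with simulated data in base R. Everything here is an assumption made for illustration: the peak shape, the Poisson counting noise, the helper names, and the value m = 7. The "predicted" curve is the true simulated mean rather than a fitted model, so this only shows how the two indices scale with the number of bins.

```r
set.seed(1)

# Hypothetical helpers, not part of flowPloidy
chi_square <- function(observed, predicted) {
  sum((observed - predicted)^2 / observed)
}

# RCS = Chi^2 / (n - m), with n bins and m model parameters
reduced_chi_square <- function(observed, predicted, m) {
  n <- length(observed)
  chi_square(observed, predicted) / (n - m)
}

# Simulated 1024-bin histogram: flat background plus a single peak,
# with Poisson counting noise around the expected counts
x <- 1:1024
pred_1024 <- 50 + 200 * exp(-((x - 400) / 40)^2)
obs_1024  <- rpois(length(x), pred_1024)

# Aggregate the same data into 256 bins by summing each group of
# four adjacent bins (matrix() fills column by column)
pred_256 <- colSums(matrix(pred_1024, nrow = 4))
obs_256  <- colSums(matrix(obs_1024, nrow = 4))

m <- 7  # hypothetical number of model parameters

chi2_1024 <- chi_square(obs_1024, pred_1024)
chi2_256  <- chi_square(obs_256, pred_256)
rcs_1024  <- reduced_chi_square(obs_1024, pred_1024, m)
rcs_256   <- reduced_chi_square(obs_256, pred_256, m)

print(c(chi2_1024 = chi2_1024, chi2_256 = chi2_256))
print(c(rcs_1024 = rcs_1024, rcs_256 = rcs_256))
```

Under these assumptions, the raw Chi-Square grows roughly in step with the number of bins, while the two RCS values stay on a comparable scale.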

As a rule of thumb, RCS values below 0.7 suggest over-fitting, and above 4.0 suggest a poor fit.

These are guidelines only, and should not be treated as significance tests. From a statistical perspective, there is limited value to a 'goodness-of-fit' index for a single model. In other contexts we'd compare several competing models to determine which is better. For this application, the RCS is serving as a rough sanity check.

Additionally, the absolute value of the RCS is influenced by particular design decisions I made in writing the model-fitting routines. Consequently, other, equally valid approaches may yield slightly different values (Rabinovitch 1994).

With this in mind, as long as the values are close to the ideal range of 0.7-4.0, we can be reasonably confident that our analysis is acceptable. Values outside this range are a caution that we ought to carefully inspect our model fit to make sure it appears sensible; the results may still be fine.
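The rule of thumb can be written as a small screening helper in base R. `rcs_flag` and its messages are hypothetical and not part of flowPloidy; the hard cut-offs are a simplification of the guidelines above, and a flag is only a prompt to inspect the fit, not a significance test.

```r
# Rule-of-thumb thresholds quoted in the guidelines above
RCS_LOW  <- 0.7
RCS_HIGH <- 4.0

# Hypothetical helper: label each RCS value against the guideline range
rcs_flag <- function(rcs) {
  stopifnot(is.numeric(rcs))
  ifelse(rcs < RCS_LOW, "possible over-fitting: inspect fit",
         ifelse(rcs > RCS_HIGH, "possible poor fit: inspect fit",
                "within guideline range"))
}

# Made-up RCS values for three samples
flags <- rcs_flag(c(0.4, 1.8, 12.5))
print(flags)
```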

The most common issue identified by extreme RCS values is poor fitting of the debris component. Occasionally, an otherwise sensible-looking model fit will produce extremely high RCS values. Switching from Single-Cut to Multiple-Cut, or vice versa, will often provide a much better fit, with a correspondingly lower RCS value. Visually, the fit may not look much different, and usually the model parameters don't change much either way.

Tyler Smith

Bagwell, C. B. 1993. Theoretical aspects of flow cytometry data analysis. Pp. 41-61 in Clinical flow cytometry: principles and applications. Baltimore: Williams & Wilkins.

Rabinovitch, P. S. 1994. DNA content histogram and cell-cycle analysis. Methods in Cell Biology 41:263-296.

fhDoCV, fhDoNLS, fhDoCounts, DebrisModels

flowPloidy documentation built on Nov. 8, 2020, 8:04 p.m.