Database of leave-one-out cross validation errors for various combinations of data characteristics

Description

This is a 7-dimensional array (database) of leave-one-out cross validation errors for Random Forest, Support Vector Machine and k-Nearest Neighbour classifiers. Error rates for Linear Discriminant Analysis are not included in the current version. The database is the basis for estimating the optimal number of biomarkers at a given error tolerance level using the optimiseBiomarker function. See Details for more information.

Usage

data(errorDbase)

Format

7-dimensional numeric array.

Details

The following table gives the dimension names, lengths and values/levels of the data object errorDbase.

Dimension name                   Length   Values/Levels
No. of biomarkers                14       (1-6, 7, 9, 11, 15, 20, 30, 40, 50, 100)
Size of replication              5        (1, 3, 5, 7, 10)
Biological variation (σ_b)       4        (0.5, 1.0, 1.5, 2.5)
Experimental variation (σ_e)     4        (0.1, 0.5, 1.0, 1.5)
Minimum (Average) fold change    4        (1 (1.73), 2 (2.88), 3 (4.03), 5 (6.33))
Training set size                5        (10, 20, 50, 100, 250)
Classification method            3        (Random Forest, Support Vector Machine, k-Nearest Neighbour)
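
As a rough illustration of how an array with this shape can be sliced, the sketch below builds a stand-in array with the dimension lengths listed in the table. The object name standin, the NA placeholder values and the chosen index positions are assumptions made for illustration; this is not the real errorDbase object and it holds no real error rates.

```r
# Stand-in array with the dimension lengths from the table above.
# Values are NA placeholders, NOT real error rates.
dim_lengths <- c(biomarkers = 14, replication = 5, sigma_b = 4, sigma_e = 4,
                 fold_change = 4, train_size = 5, method = 3)
standin <- array(NA_real_, dim = unname(dim_lengths))

# Total number of cells in the 7-dimensional array
n_cells <- prod(dim_lengths)

# Fix an index position for every dimension except the first to get a
# vector of errors across the numbers of biomarkers
# (index positions here are arbitrary examples).
curve <- standin[, 1, 2, 2, 3, 4, 1]

# Keep two dimensions free to get a matrix:
# numbers of biomarkers (rows) by training set sizes (columns).
slice <- standin[, 1, 2, 2, 3, , 1]

print(n_cells)
print(length(curve))
print(dim(slice))
```

With the package installed, the same positional indexing should apply to errorDbase once it has been loaded, and dim(errorDbase) can be used to check the dimension lengths against the table.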

We plan to expand the database to an 8-dimensional one by adding a dimension that stores error rates at different levels of correlation between biomarkers. The length of each dimension will also be increased, giving a bigger database with wider coverage of the parameter space. The current version of the database contains error rates for independent (correlation = 0) biomarkers only. It also does not contain error rates for Linear Discriminant Analysis, which we plan to add in the next release of the package. With the current version of the database, the optimal number of biomarkers can be estimated using the optimiseBiomarker function for any intermediate values of the factors represented by the dimensions of the database.
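
How optimiseBiomarker handles intermediate values is not described on this page. Purely to illustrate the general idea of estimating an error rate between tabulated levels, the sketch below applies linear interpolation with base R's approx along the training set size dimension. The error values are made up for the example, and linear interpolation is an assumption, not necessarily the method used by the package.

```r
# Tabulated training set sizes, taken from the table in Details
train_size <- c(10, 20, 50, 100, 250)

# Hypothetical error rates at those sizes (made up for illustration only)
err <- c(0.40, 0.30, 0.20, 0.10, 0.05)

# Linearly interpolate the error at an intermediate training set size
est <- approx(x = train_size, y = err, xout = 75)$y
print(est)
```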

Author(s)

Mizanur Khondoker, Till Bachmann, Peter Ghazal
Maintainer: Mizanur Khondoker mizanur.khondoker@gmail.com.

References

Khondoker, M. R., Bachmann, T. T., Mewissen, M., Dickinson, P. et al. (2010). Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules. Journal of Bioinformatics and Computational Biology, 8, 945-965.

See Also

optimiseBiomarker