The package contains an implementation of the Fermi-Dirac distribution for ensemble learning described in:
Kim, S. et al. "Learning from Fermions: the Fermi-Dirac Distribution Provides a Calibrated Probabilistic Output for Binary Classifiers", in review (2021)
The package was developed and applied using numpy, scipy, and matplotlib, as well as any packages that they require.
In this example, I show how to use the fd package to simulate data, apply the available classifiers, and compute their performance as measured by AUC.
First, import the relevant packages:
>>> import numpy as np
>>> from fd import classifiers as cls
>>> from fd import stats, sample
Then we need to set the simulation parameters. These include:
M for the number of base classifiers,
N for the total number of samples,
prevalence for the fraction of samples belonging to the positive class, and
conditional_corr for the conditional correlation coefficient between base classifier predictions.
Lastly, we make an array of AUC values, in which the ith element corresponds to the performance of the ith base classifier.
>>> M, N, prevalence, conditional_corr = 10, 1000, 0.3, 0
>>> auc = np.linspace(0.55, 0.85, M)
>>> R, y = sample.data_set(auc, conditional_corr, prevalence, N)
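Depending on the conventions of sample.data_set (not documented here), R holds the simulated base-classifier predictions and y the binary class labels. As a quick sanity check, and assuming y uses a 0/1 encoding, its mean should be close to the requested prevalence:
>>> R.shape    # one axis of length M (base classifiers), the other of length N (samples)
>>> y.shape    # (N,) array of binary labels (assumed 0/1 encoding)
>>> y.mean()   # should be close to prevalence = 0.3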
With these synthetic data, we can fit the FD model and find the best individual base classifier.
>>> fcl = cls.FDensemble()
>>> fcl.fit(R, y)
# the wisdom-of-crowd (Woc) classifier does not require fitting
>>> wcl = cls.Woc()
>>> bcl = cls.BestInd()
>>> bcl.fit(R, y)
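Before evaluating on held-out data, the training-set performance can be checked with the same call pattern used in the evaluation loop below (compute_ranks followed by stats.rank_2_auc); this is just a sketch, not part of the documented workflow:
>>> # rank the training samples with the fitted FD ensemble and score the ranking
>>> train_ranks = fcl.compute_ranks(R)
>>> stats.rank_2_auc(train_ranks, y)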
Lastly, we generate a held-out test set and compute the AUC of each classifier:
>>> classifiers = {"FDensemble": fcl, "Woc": wcl, "Best Ind": bcl}
>>> test_R, test_y = sample.data_set(auc, conditional_corr, prevalence, N)
>>> for key, cl in classifiers.items():
...     cl_auc = stats.rank_2_auc(cl.compute_ranks(test_R), test_y)
...     print("AUC of the {}: {}".format(key, cl_auc))
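As an independent cross-check of stats.rank_2_auc, the AUC of a ranking can be recovered from the Mann-Whitney U statistic via AUC = U / (n_pos * n_neg). The sketch below assumes test_y uses 0/1 labels and that higher ranks from compute_ranks indicate the positive class; if the package uses the opposite orientation, the result will be 1 minus the reported AUC:
>>> from scipy.stats import mannwhitneyu
>>> ranks = fcl.compute_ranks(test_R)
>>> n_pos, n_neg = (test_y == 1).sum(), (test_y == 0).sum()
>>> # U statistic of positive-class ranks vs negative-class ranks (assumed orientation)
>>> U = mannwhitneyu(ranks[test_y == 1], ranks[test_y == 0], alternative="two-sided").statistic
>>> U / (n_pos * n_neg)   # Mann-Whitney estimate of the AUC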