fitpred: Functions for fitting models with MCMC, predicting class...
In BCBCSF: Bias-Corrected Bayesian Classification with Selected Features

bcbcsf_fitpred trains models with Gibbs sampling for each number of retained features. The results are saved in files. This function also makes predictions for test cases if they are provided.

bcbcsf_pred uses the posterior samples saved by bcbcsf_fitpred to predict the class labels of test cases. The prediction result is an array of predictive probabilities, array_probs_pred, whose rows correspond to test cases, columns to classes, and third dimension to the different numbers of retained features.
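
The layout of array_probs_pred described above can be illustrated with a small hand-made array. This is only a sketch: the probabilities and the true labels `y_ts` are invented for illustration and do not come from the package.

```r
# Two made-up slices of predictive probabilities: 3 test cases x 3 classes.
probs_k1 <- rbind(c(0.7, 0.2, 0.1),
                  c(0.1, 0.8, 0.1),
                  c(0.3, 0.3, 0.4))
probs_k2 <- rbind(c(0.5, 0.4, 0.1),
                  c(0.2, 0.2, 0.6),
                  c(0.1, 0.1, 0.8))

# Stack them along the 3rd dimension (one slice per number of retained features).
array_probs_pred <- array(c(probs_k1, probs_k2), dim = c(3, 3, 2))

# Most probable class for every test case (rows) and every slice (columns).
pred <- apply(array_probs_pred, c(1, 3), which.max)

# Hypothetical true labels, used to get one error rate per slice.
y_ts <- c(1, 2, 3)
er <- colMeans(pred != y_ts)
```

Here `pred` is a 3 x 2 matrix of class indices and `er` holds one error rate per number of retained features, which is a convenient way to compare the entries of `nos_fsel`.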

cross_vld uses cross-validation to obtain predictive probabilities for all cases of a data set. This generic function can be used with bcbcsf_fitpred and other classifiers.

bcbcsf_fitpred (
  ## arguments specifying info of data sets
  X_tr, y_tr, nos_fsel = ncol (X_tr), 
  X_ts = NULL,  standardize = FALSE, rankf = FALSE,
  ## arguments for prediction
  burn = NULL, thin = 1, offset_sdxj = 0.5,
  ## arguments for Markov chain sampling
  no_rmc = 1000, no_imc = 5, no_mhwmux = 10,
  fit_bcbcsf_filepre = ".fitbcbcsf_", 
  ## arguments specifying priors for parameters and hyperparameters
  w0_mu = 0.05, alpha0_mu = 0.5, alpha1_mu = 3,
  w0_x  = 1.00, alpha0_x  = 0.5, alpha1_x  = 10,
  w0_nu = 0.05, alpha0_nu = 0.5, prior_psi = NULL,
  ## arguments for metropolis sampling for wmu, wx
  stepadj_mhwmux = 1, diag_mhwmux = FALSE,
  ## arguments for computing adjustment factor
  bcor = 1, cut_qf = exp (-10), cut_dpoi = exp (-10), nos_sim = 1000,
  ## whether to show progress on screen
  monitor = TRUE)
  
bcbcsf_pred (X_ts, out_fit, burn = NULL, thin = 1, offset_sdxj = 0.5)

cross_vld (X, y, nfold = 10, folds = NULL, 
           fitpred_func = bcbcsf_fitpred,  ...)
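
As a rough illustration of the prediction arguments `burn`, `thin` and `offset_sdxj` in the signatures above, the sketch below picks the saved iterations that would be used and applies a quantile offset to some standard deviations. The matrix `wx` is simulated stand-in data, and the way the samples are pooled before taking the quantile is an assumption; the package's internal code may differ.

```r
no_rmc <- 1000   # number of saved super-iterations (default in the usage above)
thin   <- 1
burn   <- NULL

# Documented default: 20% of the (super)iterations are burned when burn is NULL.
if (is.null(burn)) burn <- floor(0.2 * no_rmc)

# Indices of the saved iterations kept for prediction.
ix_pred <- seq(burn + 1, no_rmc, by = thin)

# Hypothetical posterior samples of the variances w^x_j (iterations x features).
set.seed(1)
wx  <- matrix(rexp(no_rmc * 5), nrow = no_rmc, ncol = 5)
sdx <- sqrt(wx[ix_pred, , drop = FALSE])

# Assumption: the quantile is taken over all retained samples of all features
# and then added to every standard deviation.
offset_sdxj <- 0.5
sd_offset <- quantile(sdx, probs = offset_sdxj, names = FALSE)
sdx_adj   <- sdx + sd_offset
```

With `offset_sdxj = 0.5` the offset is the median of the pooled standard deviations, so very small standard deviations are lifted away from zero while large ones change little in relative terms.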

`X_tr, X_ts, X`	matrices containing gene expression data; rows correspond to cases and columns to genes; `X_tr` is the training data, `X_ts` is the test or future data for which predictions are needed, and `X` is a data set used for cross-validation.
`y_tr,y`	class labels; `y_tr` for the training data `X_tr`, and `y` for the data set `X` used in cross-validation.
`nos_fsel`	a vector of numbers of features to be retained.
`burn,thin`	the first `burn` Markov chain (super)iterations are discarded for prediction, and only every `thin`th of the remaining ones is used; by default, 20% of the (super)iterations are burned, and `thin`=1.
`offset_sdxj`	a value between 0 and 1; the 100`offset_sdxj`% quantile of the samples of all standard deviations √{w^x_j} is added to every standard deviation; this remedies the non-normality of real gene expression data sets, and in particular offsets some very small standard deviations; by default, the median is used.
`no_rmc, no_imc`	`no_rmc` super Markov chain transitions are run, each consisting of `no_imc` Markov chain iterations; only the last state of each super transition is saved.
`fit_bcbcsf_filepre`	a string added to the names of the files that save Markov chain fitting results; the actual file names also contain the data dimension and the number of retained features; when `fit_bcbcsf_filepre` is set to NULL, no fitting file is created, and `bcbcsf_fitpred` returns only the fitting result for the last number of retained features in `nos_fsel`, which is always returned regardless of the value of `fit_bcbcsf_filepre`.
`w0_mu,alpha0_mu,alpha1_mu,w0_x,alpha0_x,alpha1_x,w0_nu,alpha0_nu`	settings of priors for means and variances of genes; they are denoted by w_0^μ,α_0^μ,α_1^μ,w_0^x,α_0^x,α_1^x,w_0^ν,α_0^ν in the reference.
`prior_psi`	a vector whose length equals the number of classes, specifying the Dirichlet prior distribution for the class probabilities; it is denoted by c_{1:G} in the reference; by default, all of its elements are equal to 1.
`no_mhwmux,stepadj_mhwmux, diag_mhwmux`	arguments specifying Metropolis sampling for \log(w^μ) and \log(w^x); respectively the number of iterations, stepsize adjustment, and an indicator representing whether one wants to pause and look into this sampling.
`bcor`	taking value 0 or 1, indicating whether bias-correction is to be applied.
`cut_qf, cut_dpoi,nos_sim`	arguments specifying approximation of adjustment factor; `cut_qf` is f_\ell in the reference, `cut_dpoi` is the threshold below which Poisson probabilities are omitted, `nos_sim` is the number of random Λ.
`nfold, folds`	`folds` should be a list of test cases for the different folds; if `folds` is NULL (the default), `folds` is generated by the software, with `nfold` set to the smaller of the given value and the smallest number of cases in any class.
`out_fit`	a list returned by `bcbcsf_fitpred`, which is used to make predictions for test cases.
`standardize`	if it is set to TRUE, the original gene expression values are centralized and divided by the pooled standard deviation; by default, it is FALSE.
`rankf`	if it is set to TRUE, the original features will be re-ordered by F-statistic; by default, it is FALSE.
`monitor`	if it is set to TRUE, the progress of fitting is shown on screen.
`fitpred_func`	an R function that fits a model to training data and makes predictions for test data; the arguments of `fitpred_func` must include `X_tr`, `y_tr`, `X_ts`, and the output of `fitpred_func` must include `array_probs_pred`.
`...`	arguments passed to classifier `fitpred_func`
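
The `fitpred_func` interface and the `nfold` rule described in the arguments above can be sketched with a toy classifier and a hand-written cross-validation loop. Everything named here (`centroid_fitpred`, `make_folds`, `cv_sketch`) is hypothetical and written for illustration; it is not the code of `cross_vld`, and the way folds are drawn is an assumption.

```r
# Toy classifier with the documented interface: takes X_tr, y_tr, X_ts and
# returns a list containing array_probs_pred (cases x classes x 1).
centroid_fitpred <- function(X_tr, y_tr, X_ts, ...) {
  classes <- sort(unique(y_tr))
  # Class centroids, one row per class.
  centroids <- do.call(rbind, lapply(classes, function(g)
    colMeans(X_tr[y_tr == g, , drop = FALSE])))
  # Squared Euclidean distance from each test case to each centroid.
  d2 <- sapply(seq_along(classes), function(g)
    rowSums(sweep(X_ts, 2, centroids[g, ])^2))
  d2 <- matrix(d2, nrow = nrow(X_ts))
  # Softmax of -d2 / 2 turns distances into probabilities.
  p <- exp(-(d2 - apply(d2, 1, min)) / 2)
  p <- p / rowSums(p)
  list(array_probs_pred = array(p, dim = c(nrow(X_ts), length(classes), 1)))
}

# Stratified folds; nfold is capped by the size of the smallest class.
make_folds <- function(y, nfold = 10) {
  nfold <- min(nfold, min(table(y)))
  fold_id <- integer(length(y))
  for (g in unique(y)) {
    ix <- which(y == g)
    fold_id[ix[sample.int(length(ix))]] <- rep_len(seq_len(nfold), length(ix))
  }
  split(seq_along(y), fold_id)
}

# Cross-validation loop: each fold is predicted from the remaining cases.
cv_sketch <- function(X, y, nfold = 10, folds = NULL, fitpred_func, ...) {
  if (is.null(folds)) folds <- make_folds(y, nfold)
  probs <- NULL
  for (ts in folds) {
    out <- fitpred_func(X_tr = X[-ts, , drop = FALSE], y_tr = y[-ts],
                        X_ts = X[ts, , drop = FALSE], ...)
    a <- out$array_probs_pred
    if (is.null(probs))
      probs <- array(NA_real_, dim = c(nrow(X), dim(a)[2], dim(a)[3]))
    probs[ts, , ] <- a
  }
  list(folds = folds, array_probs_pred = probs)
}

set.seed(1)
y <- rep(1:2, times = c(6, 4))
X <- matrix(rnorm(10 * 3), nrow = 10) + 3 * (y == 2)
cv <- cv_sketch(X, y, nfold = 10, fitpred_func = centroid_fitpred)
length(cv$folds)   # 4: the requested 10 is capped by the smallest class size
```

Capping `nfold` this way guarantees that every class has at least one case per fold, so no class is missing from a test fold.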

`nos_fsel`	a vector of numbers of features retained.
`fitfiles`	a character vector of the same length as `nos_fsel`, each element giving the name of the file that saves the Markov chain fitting result for the corresponding number of retained features in `nos_fsel`; the `fitfiles` returned by `cross_vld` are those from the training in the last fold.
`array_probs_pred`	an array of predictive probabilities, whose rows correspond to test cases, columns to classes, and third dimension to the different numbers of retained features.
`fit_bcbcsf`	a list of Markov chain sampling results from the fit whose number of retained features equals the last number in `nos_fsel`. Note that the fitting results for all numbers of retained features (including the last one) are saved in files on the hard drive if `fit_bcbcsf_filepre` is not NULL, and can be retrieved with the function `reload_fit_bcbcsf`. In particular, `fit_bcbcsf` has a component `fsel` holding the indices of the features selected by F-statistic.
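
To make the idea of selecting features by F-statistic concrete, here is a minimal sketch that computes a one-way ANOVA F-statistic per feature and orders the features by it. The helper `f_stat` is hypothetical, and it is an assumption that this matches the statistic used to produce `fsel` in the package.

```r
# One-way ANOVA F-statistic of each column of X with respect to labels y.
f_stat <- function(X, y) {
  y <- as.factor(y)
  n <- nrow(X)
  G <- nlevels(y)
  n_g   <- as.vector(table(y))     # class sizes, in level order
  grand <- colMeans(X)             # overall mean of each feature
  means <- rowsum(X, y) / n_g      # class means, one row per class
  # Between-class and within-class sums of squares for each feature.
  ss_b <- colSums(n_g * sweep(means, 2, grand)^2)
  ss_w <- colSums((X - means[as.integer(y), , drop = FALSE])^2)
  (ss_b / (G - 1)) / (ss_w / (n - G))
}

set.seed(1)
y <- rep(1:2, each = 5)
X <- matrix(rnorm(10 * 4), nrow = 10)
X[, 3] <- X[, 3] + 10 * (y == 2)   # make feature 3 strongly informative

f <- f_stat(X, y)
fsel <- order(f, decreasing = TRUE)  # feature indices, most informative first
```

Taking the first k entries of such an ordering for each k in `nos_fsel` is one way to obtain nested feature subsets of increasing size.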