bpr_diff_predict_wrap: Predict differential gene expression from differential...
In andreaskapou/BPRMeth-devel: Model higher-order methylation profiles


bpr_diff_predict_wrap wraps all the subroutines needed to predict differential gene expression levels. First, it optimizes the parameters of the basis functions to learn the methylation profiles for the control and treatment samples. Next, the two learned methylation profiles are concatenated so that the coefficients of both profiles are kept. Finally, the learned parameters / coefficients of the basis functions are used as input features for a regression model that predicts the corresponding differential (log2 fold-change) gene expression levels.
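The concatenate-then-regress steps described above can be illustrated with a minimal base R sketch. It does not call the package: the coefficient matrices are random stand-ins for learned profiles, the names `n_regions`, `n_coef`, `w_contr`, `w_treat` and `lfc` are hypothetical, and taking the fold-change as log2(treatment / control) is an assumption about direction and scale. Any extra features the function may append (such as fit or CpG density) are left out.

```r
set.seed(1)
n_regions <- 50  # hypothetical number of promoter regions
n_coef    <- 4   # hypothetical number of coefficients per profile

# Random stand-ins for the coefficients learned for each sample.
w_contr <- matrix(rnorm(n_regions * n_coef), nrow = n_regions)
w_treat <- matrix(rnorm(n_regions * n_coef), nrow = n_regions)

# Concatenate so that each region (row) keeps both coefficient sets.
W <- cbind(w_contr, w_treat)
colnames(W) <- c(paste0("contr_", seq_len(n_coef)),
                 paste0("treat_", seq_len(n_coef)))

# Synthetic, strictly positive expression levels and their
# log2 fold-change (direction treatment / control is assumed).
gex_contr <- rexp(n_regions) + 1
gex_treat <- rexp(n_regions) + 1
lfc <- log2(gex_treat / gex_contr)

# Use the concatenated coefficients as regression features.
dat  <- data.frame(y = lfc, W)
fit  <- lm(y ~ ., data = dat)
pred <- predict(fit, newdata = dat)
```

Here `lm` merely stands in for whichever regression model is selected; the point is only that each row of the feature matrix carries the coefficients of both profiles.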

bpr_diff_predict_wrap(formula = NULL, x, y, model_name = "svm", w = NULL,
  basis = NULL, train_ind = NULL, train_perc = 0.7,
  fit_feature = "RMSE", cpg_dens_feat = TRUE, opt_method = "CG",
  opt_itnmax = 100, is_parallel = TRUE, no_cores = NULL,
  is_summary = TRUE)

`formula`	An object of class `formula`; see the `lm` function for examples. If NULL, the simple linear regression model is used.
`x`	The binomially distributed observations. A list containing two lists, one for the control and one for the treatment samples. Each list has length N, where each element is an L x 3 matrix of observations: the 1st column contains the locations, and the 2nd and 3rd columns contain the total reads and the number of successes at the corresponding locations, respectively. See `process_haib_caltech_wrap` for one way to obtain this data structure.
`y`	Corresponding gene expression data. A list containing two vectors for control and treatment samples.
`model_name`	A string denoting the regression model. Currently, available models are: `"svm"`, `"randomForest"`, `"rlm"`, `"mars"` and `"lm"`.
`w`	Optional vector of initial parameter / coefficient values.
`basis`	Optional basis function object, default is an 'rbf' object, see `create_rbf_object`.
`train_ind`	Optional vector containing the indices for the train set.
`train_perc`	Optional parameter defining the proportion of the dataset to use as the training set (0.7 by default, i.e. 70%); the remainder forms the test set.
`fit_feature`	Return additional feature on how well the profile fits the methylation data. Either NULL for ignoring this feature or one of the following: 1) "RMSE" for returning the fit of the profile using the RMSE as measure of error or 2) "NLL" for returning the fit of the profile using the Negative Log Likelihood as measure of error.
`cpg_dens_feat`	Logical, whether to return an additional feature for the CpG density across the promoter region.
`opt_method`	The optimization method to be used. See `optim` for possible methods. Default is "CG".
`opt_itnmax`	Optional argument giving the maximum number of iterations for the corresponding method. See `optim` for details.
`is_parallel`	Logical, indicating if code should be run in parallel.
`no_cores`	Number of cores to be used, default is max_no_cores - 2.
`is_summary`	Logical, print the summary statistics.
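As a rough illustration of the input shapes described for `x`, `y`, `train_ind` and `train_perc`, the following base R sketch builds synthetic data with the documented structure and draws a training split. The list names `control` and `treatment`, the helpers `make_region` and `make_sample`, the location range of -1 to 1, and the way the split is sampled are all assumptions made for this example, not details confirmed by the package.

```r
set.seed(42)
N <- 10  # number of regions per sample (hypothetical)

# One region: an L x 3 matrix holding location, total reads and
# number of successes (methylated reads) at each location.
make_region <- function(L) {
  total <- sample(5:30, L, replace = TRUE)
  cbind(loc   = sort(runif(L, -1, 1)),  # assumed location scale
        total = total,
        meth  = rbinom(L, size = total, prob = 0.5))
}

# One sample: a list of N regions with varying numbers of locations.
make_sample <- function(N) {
  lapply(seq_len(N), function(i) make_region(sample(8:15, 1)))
}

# x: two lists of matrices; y: two expression vectors.
x <- list(control = make_sample(N), treatment = make_sample(N))
y <- list(control = rnorm(N, mean = 5), treatment = rnorm(N, mean = 5))

# A 70/30 split expressed as indices, analogous to train_ind.
train_perc <- 0.7
train_ind  <- sort(sample(N, size = round(train_perc * N)))
test_ind   <- setdiff(seq_len(N), train_ind)
```

Passing an explicit index vector like this makes a split reproducible across runs, whereas relying on `train_perc` alone leaves the choice of regions to the function.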

A 'bpr_diff_predict' object which, in addition to the input parameters, consists of the following variables:

W_opt: An N x (2M+2) matrix with the optimized parameter values. Each row of the matrix holds the concatenated coefficients of the methylation profiles from both samples, so the number of columns equals the length of the concatenated parameter vector [w_contr, w_treat], which is determined by the number of basis functions.
Mus: A list containing two matrices of size N x M with the RBF centers for each sample if the basis is an 'rbf' object (see create_rbf_object), otherwise NULL.
train: The training data.
test: The test data.
gex_model: The fitted regression model.
train_pred: The predicted values for the training data.
test_pred: The predicted values for the test data.
train_errors: The training error metrics.
test_errors: The test error metrics.
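The page does not list which metrics train_errors and test_errors contain. As a hedged illustration only, the sketch below computes three common regression error measures from observed and predicted values; the helper `error_metrics` and the choice of RMSE, mean absolute error and Pearson correlation are assumptions, and the package may report different or additional quantities.

```r
# Hypothetical helper: summarises prediction quality for one data split.
error_metrics <- function(obs, pred) {
  c(rmse = sqrt(mean((obs - pred)^2)),  # root mean squared error
    mae  = mean(abs(obs - pred)),       # mean absolute error
    pcc  = cor(obs, pred))              # Pearson correlation
}

# Toy observed and predicted log2 fold-changes.
obs  <- c(0.5, -1.2, 2.0, 0.1)
pred <- c(0.4, -1.0, 1.5, 0.3)

m <- error_metrics(obs, pred)
```

Computing the same summary on the training and the test predictions and comparing the two is a simple way to spot overfitting.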