regress: SVM regression
In Yuliaxis/kernInt: Kernel Integration of Microbiome Analysis Methods & Data


regress() automatically trains a Support Vector Regression model, tests it and returns the Normalized Mean Squared Error.

regress(
  data,
  y,
  coeff,
  kernel,
  p = 0.2,
  C = 1,
  H = NULL,
  E = 0.01,
  domain = NULL,
  k
)

`data`	Input data: a matrix or data.frame with predictor variables/features as columns. To perform MKL: a list of m datasets. All datasets should have the same number of rows.
`y`	Response variable (continuous).
`coeff`	MKL only: a t·m matrix of coefficients, where m is the number of data types and t is the number of coefficient combinations to evaluate via k-fold cross-validation. If absent, all data sources receive the same weight.
`kernel`	"lin" or "rbf" for the standard linear and RBF kernels. "clin" for the compositional linear kernel and "crbf" for the Aitchison-RBF kernel. "jac" for the quantitative Jaccard / Ruzicka kernel. "jsk" for the Jensen-Shannon kernel. "flin" and "frbf" for the functional linear and functional RBF kernels. "matrix" if a pre-computed kernel matrix is given as input. To perform MKL: a vector of m kernels, one to apply to each dataset.
`p`	The proportion of data reserved for the test set. Alternatively, a vector containing the indexes or the names of the rows to use for testing.
`C`	A cost, or a vector with the possible costs to evaluate via k-fold cross-validation.
`H`	Gamma hyperparameter (only for RBF-like kernels). A vector of candidate values can be entered to choose the best one via k-fold cross-validation. For MKL, a list with m entries can be entered, where m is the number of data types. Each element of the list must be a number or, if k-fold cross-validation is needed, a vector with the hyperparameters to evaluate for that data type.
`E`	Epsilon hyperparameter, or a vector with the possible epsilons to evaluate via k-fold cross-validation.
`domain`	Only used with the "frbf" or "flin" kernels.
`k`	Number of folds (k) for k-fold cross-validation. Minimum k = 2. If no value is provided, cross-validation is not performed.
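
The role of `coeff` in the MKL case can be illustrated with a small base-R sketch. It assumes the data sources are combined as a weighted sum of their kernel matrices, which this page does not state explicitly, and it is not kernInt's internal code. The helper `combine_kernels` and the toy matrices `X1` and `X2` are hypothetical names used only for illustration.

```r
# Toy data: two data sources measured on the same 5 samples (hypothetical values)
set.seed(1)
X1 <- matrix(rnorm(5 * 3), nrow = 5)
X2 <- matrix(rnorm(5 * 4), nrow = 5)

# Linear kernel matrices, one per data source
K <- list(tcrossprod(X1), tcrossprod(X2))

# Hypothetical helper: weighted sum of kernel matrices (an assumed combination rule)
combine_kernels <- function(kernels, weights) {
  stopifnot(length(kernels) == length(weights))
  Reduce(`+`, Map(function(k, w) w * k, kernels, weights))
}

# A t x m weight matrix: t = 3 combinations, m = 2 data sources
w <- matrix(c(0.5, 0.1, 0.9, 0.5, 0.9, 0.1), nrow = 3, ncol = 2)

# One combined kernel matrix per row of w
K_combined <- lapply(seq_len(nrow(w)), function(i) combine_kernels(K, w[i, ]))
```

Each row of `w` yields one candidate kernel matrix, which is why a value for `k` is needed to compare the t combinations by cross-validation.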

If the input data has repeated row names, regress() will consider rows that share an id to be repeated measures from the same individual. The function will ensure that all repeated measures from an individual are used either to train or to test the model, but not both, thus preserving the independence between the training and test sets.
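
The grouping behaviour described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `grouped_split` is hypothetical, and it assumes that the test proportion is applied to individuals rather than to rows.

```r
# Hypothetical helper: split row indices so that rows sharing an id stay together
grouped_split <- function(ids, p = 0.2) {
  unique_ids <- unique(ids)
  # Assumption: p is the proportion of individuals (not rows) held out for testing
  n_test <- max(1, round(p * length(unique_ids)))
  test_ids <- unique_ids[sample.int(length(unique_ids), n_test)]
  is_test <- ids %in% test_ids
  list(train = which(!is_test), test = which(is_test))
}

set.seed(42)
ids <- rep(paste0("subject", 1:10), each = 3)  # 10 individuals, 3 measures each
parts <- grouped_split(ids, p = 0.2)

# Ids shared by the training and test sets (none, by construction)
intersect(ids[parts$train], ids[parts$test])
```

Sampling ids instead of rows is what keeps every repeated measure of an individual on one side of the split.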

NMSE (normalized mean squared error), chosen hyperparameters, predicted and observed values for the test set, and variable importances (only with linear-like kernels).
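
For orientation, one common definition of the NMSE divides the mean squared error by the variance of the observed values. The sketch below uses that definition as an assumption; the exact normalization used by regress() is not given on this page and may differ. The helper `nmse` and the toy vectors are hypothetical.

```r
# Hypothetical helper: MSE divided by the (population) variance of the observed values
nmse <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  mean((observed - predicted)^2) / mean((observed - mean(observed))^2)
}

# Toy values, for illustration only
observed  <- c(5.1, 6.3, 4.8, 7.0, 5.9)
predicted <- c(5.0, 6.0, 5.2, 6.8, 6.1)
nmse(observed, predicted)

# Baseline: always predicting the mean of the observed values gives an NMSE of 1
nmse(observed, rep(mean(observed), length(observed)))
```

Under this definition, values below 1 indicate a model that predicts better than the mean of the observed values, and 0 indicates a perfect fit.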

library(kernInt)

# Simple regression without tuning the hyperparameters
regress(data = soil$abun, y = soil$metadata$ph, kernel = "clin")
# The train/test proportion can be changed with p (default: 0.8 training / 0.2 test);
# here 60% of the data is reserved for testing:
regress(data = soil$abun, y = soil$metadata$ph, kernel = "clin", p = 0.6)
# Regression with 10-fold cross-validation to choose the best cost and epsilon:
regress(data = soil$abun, y = soil$metadata$ph, kernel = "clin", C = c(0.1, 1, 10), E = c(0.01, 0.1), k = 10)
# Regression with MKL:
Nose <- list()
Nose$left <- CSSnorm(smoker$abund$nasL)
Nose$right <- CSSnorm(smoker$abund$nasR)
age <- smoker$metadata$age[seq(from = 1, to = 62 * 4, by = 4)]
# Each row of w is one combination of weights for the two datasets (left, right)
w <- matrix(c(0.5, 0.1, 0.9, 0.5, 0.9, 0.1), nrow = 3, ncol = 2)
regress(data = Nose, kernel = "jac", y = age, C = c(1, 10, 100), coeff = w, k = 10)