Description
outliers() has two principal usages: unsupervised detection of outliers in data, or supervised one-class SVM classification.
Arguments
data
    Input data: a matrix or data.frame with predictor variables/features as columns. To perform MKL: a list of *m* datasets. All datasets must have the same number of rows.

y
    Response variable (continuous).

kernel
    "lin" or "rbf" for the standard Linear and RBF kernels. "clin" for the compositional linear and "crbf" for the Aitchison-RBF kernels. "jac" for the quantitative Jaccard / Ruzicka kernel. "jsk" for the Jensen-Shannon kernel. "flin" and "frbf" for the functional linear and functional RBF kernels. "matrix" if a pre-computed kernel matrix is given as input. To perform MKL: a vector of *m* kernels, one to apply to each dataset.

coeff
    ONLY IN THE MKL CASE: a *t·m* matrix of coefficients, where *m* is the number of different data types and *t* the number of different coefficient combinations to evaluate via k-Cross-Validation. If absent, the same weight is given to all data sources.

nu
    Hyperparameter nu.

p
    The proportion of data reserved for the test set. Otherwise, a vector containing the indexes or the names of the rows for testing.

k
    The k for the k-Cross-Validation. Minimum k = 2. If no argument is provided, cross-validation is not performed.

domain
    Only used with "frbf" or "flin".

H
    Gamma hyperparameter (only in RBF-like functions). A vector of candidate values may be entered to choose the best one via k-Cross-Validation. For MKL, a list with *m* entries may be entered, where *m* is the number of different data types. Each element of the list must be a number or, if k-Cross-Validation is needed, a vector with the hyperparameters to evaluate for that data type.
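As an illustration of the shapes the coeff and H arguments expect in the MKL case, here is a hypothetical setup (the source names, candidate values, and the commented call are assumptions for illustration, not taken from the package's examples):

```r
# Hypothetical MKL setup with m = 2 data sources and t = 3 coefficient
# combinations to evaluate via k-Cross-Validation; each row gives one
# weighting of the two data sources.
coeff <- matrix(c(0.5, 0.5,
                  0.7, 0.3,
                  0.9, 0.1),
                nrow = 3, ncol = 2, byrow = TRUE)

# A matching H list for two RBF-like kernels: one vector of candidate
# gamma hyperparameters per data type.
H <- list(source1 = c(0.01, 0.1), source2 = c(0.1, 1))

# outliers(data = list(dat1, dat2), kernel = c("crbf", "crbf"),
#          coeff = coeff, H = H, nu = 0.2, k = 5)  # dat1/dat2 hypothetical
```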
Details

If outliers() is used in a supervised way and the input data has repeated rownames, the function will consider that rows sharing an id are repeated measures from the same individual. It will ensure that all repeated measures are used either to train or to test the model, but never both, thus preserving the independence between the training and test sets.
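As a minimal sketch of such input, a toy matrix where two rows share a row name (the data values and the commented call are hypothetical):

```r
# Rows named "id1" are two repeated measures from the same individual;
# in a supervised run, outliers() would place both in the same split
# (train or test), never one in each.
x <- matrix(rnorm(12), nrow = 4,
            dimnames = list(c("id1", "id1", "id2", "id3"), NULL))

# outliers(data = x, y = y_cont, kernel = "lin", p = 0.25)  # y_cont hypothetical
```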
Value

The indexes of the outliers (outlier detection) or, if a value is provided for y, the confusion matrix (one-class SVM).
Examples

# Outlier detection
outliers(data=soil$abund, kernel="clin", nu=0.2)

## One-class SVM:
outliers(data=soil$abund, y=soil$metadata[, "env_feature"], kernel="clin")

## One-class SVM with 10-Cross-Validation:
outliers(data=soil$abund, y=soil$metadata[, "env_feature"], kernel="clin", nu=c(0.45, 0.5), k=10)

## With data of multiple sources
outliers(data=smoker$abund, kernel="crbf", H=list(nasL=0.01, nasR=0.01, oroL=0.1, oroR=0.1))