knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
To illustrate the MAKL library, we build a simple example in this document. We first create a synthetic dataset of 1000 rows and 6 features drawn from the standard Gaussian distribution.
library(MAKL)
set.seed(64327) # midas
# 1000 x 6 matrix of standard Gaussian draws
df <- matrix(rnorm(6000, 0, 1), nrow = 1000)
colnames(df) <- c("F1", "F2", "F3", "F4", "F5", "F6")
For the membership argument of makl_train(), we prepare a list of two groups: the first contains the features F1, F5 and F6, and the second contains the rest. Note that the column names of the input dataset must be a superset of the union of all feature names in the groups list.
# Check that colnames(df) match the group members
groups <- list()
groups[[1]] <- c("F1", "F5", "F6")
groups[[2]] <- c("F2", "F3", "F4")
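The column-name requirement above can be checked before training. The following is a minimal base R sketch; the helper check_membership() is hypothetical and not part of MAKL, and it uses its own small demo matrix so it runs on its own.

```r
# Hypothetical helper (not part of MAKL): TRUE when every feature named
# in the membership list is a column of X; otherwise reports the missing ones
check_membership <- function(X, membership) {
  members <- unique(unlist(membership))
  missing <- setdiff(members, colnames(X))
  if (length(missing) > 0) {
    message("Missing columns: ", paste(missing, collapse = ", "))
    return(FALSE)
  }
  TRUE
}

X_demo <- matrix(0, nrow = 2, ncol = 6,
                 dimnames = list(NULL, c("F1", "F2", "F3", "F4", "F5", "F6")))
demo_groups <- list(c("F1", "F5", "F6"), c("F2", "F3", "F4"))

check_membership(X_demo, demo_groups)          # TRUE
check_membership(X_demo, list(c("F1", "F7")))  # FALSE, reports F7
```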
We then create the response vector y so that it depends on the second, third and fourth features, namely F2, F3 and F4: if, for a data instance, the sum of the entries in those three columns is positive, the response is +1; otherwise, it is -1.
y <- c()
for (i in 1:nrow(df)) {
  if ((df[i, 2] + df[i, 3] + df[i, 4]) > 0) {
    y[i] <- +1
  } else {
    y[i] <- -1
  }
}
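The loop above can also be written in vectorized form with base R's ifelse(). This standalone sketch regenerates a matrix the same way (under the demo name df_demo) and checks that both constructions give the same labels.

```r
set.seed(64327)
df_demo <- matrix(rnorm(6000, 0, 1), nrow = 1000)
colnames(df_demo) <- c("F1", "F2", "F3", "F4", "F5", "F6")

# Loop version, as in the text
y_loop <- c()
for (i in 1:nrow(df_demo)) {
  if ((df_demo[i, 2] + df_demo[i, 3] + df_demo[i, 4]) > 0) {
    y_loop[i] <- +1
  } else {
    y_loop[i] <- -1
  }
}

# Vectorized version: +1 where F2 + F3 + F4 > 0, else -1
y_vec <- ifelse(df_demo[, 2] + df_demo[, 3] + df_demo[, 4] > 0, 1, -1)

identical(y_loop, y_vec)  # TRUE
```

Summing the columns in the same order as the loop keeps the floating-point arithmetic identical, so the two vectors match exactly.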
We use the synthetic dataset df and the response vector y as the training dataset and training response vector in makl_train(). We set the number of random features D to 2, a small value that is reasonable given that our training dataset is only 6-dimensional. We set sigma_N, the number of rows used for the distance matrix calculation, to 1000, and choose lambda_set consisting of 0.9, 0.8, 0.7 and 0.6 to obtain sparse solutions. As the membership list, we use the groups list created above.
makl_model <- makl_train(X = df, y = y, D = 2, sigma_N = 1000, CV = 1, membership = groups, lambda_set = c(0.9, 0.8, 0.7, 0.6))
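This document does not spell out how D and sigma_N are used inside makl_train(). Purely as an assumption for illustration, the sketch below shows the standard random Fourier feature approximation of a Gaussian kernel, with the bandwidth chosen by the median pairwise distance over the first sigma_N rows. The function random_fourier_features() is hypothetical and is not MAKL's implementation.

```r
# Hypothetical sketch (not MAKL's code): map X to D random Fourier features
# whose inner products approximate k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
random_fourier_features <- function(X, D, sigma_N) {
  # Median heuristic for sigma on the first sigma_N rows (assumption)
  n_sub <- min(sigma_N, nrow(X))
  sigma <- median(as.vector(dist(X[seq_len(n_sub), , drop = FALSE])))
  p <- ncol(X)
  W <- matrix(rnorm(p * D, 0, 1 / sigma), nrow = p)  # random frequencies
  b <- runif(D, 0, 2 * pi)                            # random phases
  Z <- sqrt(2 / D) * cos(sweep(X %*% W, 2, b, "+"))  # n x D feature matrix
  list(Z = Z, sigma = sigma)
}

set.seed(1)
X <- matrix(rnorm(20 * 3), nrow = 20)
rff <- random_fourier_features(X, D = 5000, sigma_N = 20)

K_exact <- exp(-as.matrix(dist(X))^2 / (2 * rff$sigma^2))
K_approx <- rff$Z %*% t(rff$Z)
max(abs(K_exact - K_approx))  # small; the error shrinks as D grows
```

With only D = 2 features, as in the example above, such an approximation would be very coarse; larger D tightens it at a higher computational cost.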
When we check the coefficients of our model, we see that makl_train() chose the kernel of the second group for prediction. This is the expected result, since we constructed the response vector y to depend only on the members of the second group in the groups list.
makl_model$model$coefficients
Now let us create a synthetic test dataset df_test and a test response vector y_test to use in makl_test() and check the results.
# 100 test instances, labelled by the same rule as the training data
df_test <- matrix(rnorm(600, 0, 1), nrow = 100)
colnames(df_test) <- c("F1", "F2", "F3", "F4", "F5", "F6")
y_test <- c()
for (i in 1:nrow(df_test)) {
  if ((df_test[i, 2] + df_test[i, 3] + df_test[i, 4]) > 0) {
    y_test[i] <- +1
  } else {
    y_test[i] <- -1
  }
}
result <- makl_test(X = df_test, y = y_test, makl_model = makl_model)
The list result contains two elements:
1) the predictions for the test response vector y_test, and
2) the area under the ROC curve (AUROC) versus the number of selected kernels, given for each element of lambda_set if CV is not applied, or for the best lambda in lambda_set if CV is applied.
result$auroc_kernel_number
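AUROC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. For reference, here is a base R sketch computing it through the rank-sum (Mann-Whitney) identity; the helper auroc() is hypothetical and not necessarily how makl_test() computes the value.

```r
# Hypothetical helper: AUROC from real-valued scores and labels in {-1, +1},
# via the rank-sum identity; tied scores receive average ranks
auroc <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(-1, -1, 1, 1)
auroc(c(0.1, 0.4, 0.35, 0.8), labels)  # 0.75: 3 of 4 positive/negative pairs ordered correctly
auroc(c(0.1, 0.2, 0.3, 0.4), labels)   # 1: perfect ranking
```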