View source: R/variable_selection_algortihms.R
Max-Min Parents and Children variable selection algorithm for continuous responses | R Documentation |
Max-Min Parents and Children variable selection algorithm for continuous responses.
mmpc(y, x, max_k = 3, alpha = 0.05, method = "pearson",
ini = NULL, hash = FALSE, hashobject = NULL, backward = FALSE)
y |
The class variable. Provide a numeric vector. |
x |
The main dataset. Provide a numeric matrix. |
max_k |
The maximum conditioning set to use in the conditional independence test. Provide an integer. The default value set is 3. |
alpha |
Threshold for assessing p-values' significance. Provide a double value, between 0.0 and 1.0. The default value set is 0.05. |
method |
Currently only "pearson" is supported. |
ini |
This argument is used for the avoidance of the univariate associations re-calculations, in the case of them being present. Provide it in the form of a list. |
hash |
Boolean value for the activation of the statistics storage in a hash type object. The default value is false. |
hashobject |
This argument is used for the avoidance of the hash re-calculation, in the case of them being present, similarly to ini argument. Provide it in the form of a hash. Please note that the generated hash object should be used only when the same dataset is re-analyzed, possibly with different values of max_k and alpha. |
backward |
Boolean value for the activation of the backward/symmetry correction phase. This option removes and falsely included variables in the parents and children set of the target variable. It calls the |
The MMPC function implements the MMPC algorithm as presented in "Tsamardinos, Brown and Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm" http://www.dsl-lab.org/supplements/mmhc_paper/paper_online.pdf
The output of the algorithm is an list including:
selected |
The order of the selected variables according to the increasing pvalues. |
hashobject |
The hash object containing the statistics calculated in the current run. |
pvalues |
For each feature included in the dataset, this vector reports the strength of its association with the target in the context of all other variables. Particularly, this vector reports the max p-values found when the association of each variable with the target is tested against different conditional sets. Lower values indicate higher association. |
stats |
The statistics corresponding to the aforementioned pvalues (higher values indicate higher association). |
univ |
This is a list with the univariate associations; the test statistics and their corresponding logged p-values. |
max_k |
The max_k value used in the current execution. |
alpha |
The alpha value used in the current execution. |
n.tests |
If hash = TRUE, the number of tests performed will be returned. If hash != TRUE, the number of univariate associations will be returned. |
runtime |
The time (in seconds) that was needed for the execution of algorithm. |
Marios Dimitriadis.
R implementation and documentation: Marios Dimitriadis <kmdimitriadis@gmail.com>.
Tsagris M. and Tsamardinos I. (2019). Feature selection with the R package MXM. F1000Research 7: 1505
Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets, Lagani V. and Athineou G. and Farcomeni A. and Tsagris M. and Tsamardinos I. (2017). Journal of Statistical Software, 80(7).
Tsamardinos, I., Aliferis, C. F. and Statnikov, A. (2003). Time and sample efficient discovery of Markov blankets and direct causal relations. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 673-678). ACM.
Brown L. E., Tsamardinos, I. and Aliferis C. F. (2004). A novel algorithm for scalable and accurate Bayesian network learning. Medinfo, 711-715.
Tsamardinos, Brown and Aliferis (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine learning, 65(1), 31-78.
mmpc
set.seed(123)
# Dataset with continuous data
ds <- matrix(runif(100 * 500, 1, 100), ncol = 500)
# Class variable
tar <- 3 * ds[, 10] + 2 * ds[, 100] + 3 * ds[, 20] + rnorm(100, 0, 5)
mmpc(tar, ds, max_k = 3, alpha = 0.05, method = "pearson")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.