mmpc: Max-Min Parents and Children variable selection algorithm for...
In Rfast2: A Collection of Efficient and Extremely Fast R Functions II

View source: R/variable_selection_algortihms.R

Max-Min Parents and Children variable selection algorithm for continuous responses

Description

Max-Min Parents and Children variable selection algorithm for continuous responses.

Usage

mmpc(y, x, max_k = 3, alpha = 0.05, method = "pearson", 
ini = NULL, hash = FALSE, hashobject = NULL, backward = FALSE)

Arguments

`y`	The response (target) variable. Provide a numeric vector.
`x`	The dataset of candidate predictor variables. Provide a numeric matrix.
`max_k`	The maximum size of the conditioning set to use in the conditional independence test. Provide an integer. The default value is 3.
`alpha`	Threshold for assessing the significance of p-values. Provide a double value between 0.0 and 1.0. The default value is 0.05.
`method`	Currently only "pearson" is supported.
`ini`	Used to avoid recomputing the univariate associations when they are already available. Provide it in the form of a list.
`hash`	Boolean value that activates storage of the computed statistics in a hash-type object. The default value is FALSE.
`hashobject`	Used to avoid recomputing the stored statistics when a hash object is already available, similarly to the ini argument. Provide it in the form of a hash. Note that a generated hash object should be reused only when the same dataset is re-analyzed, possibly with different values of max_k and alpha.
`backward`	Boolean value that activates the backward/symmetry correction phase. This option removes any falsely included variables from the parents and children set of the target variable. It calls `mmpc_bp` for this purpose. The backward option currently seems dubious; please do not use it at the moment.

Details

The mmpc function implements the MMPC algorithm as presented in Tsamardinos, Brown and Aliferis (2006), "The max-min hill-climbing Bayesian network structure learning algorithm": http://www.dsl-lab.org/supplements/mmhc_paper/paper_online.pdf
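The max-min heuristic behind MMPC can be illustrated with a minimal base-R sketch. This is not the Rfast2 implementation: the helper names `pcor_pvalue` and `mmpc_sketch` are hypothetical, the conditional independence test is assumed to be a Fisher z-test on the partial correlation, and the backward phase, hashing and all speed optimisations are omitted.

```r
# Hypothetical helper: p-value of a Fisher z-test for the partial correlation
# between y and x given the columns of z (z = NULL means no conditioning).
pcor_pvalue <- function(y, x, z = NULL) {
  n <- length(y)
  if (is.null(z)) {
    r <- cor(y, x)
    k <- 0
  } else {
    z <- as.matrix(z)
    # Partial correlation as the correlation of residuals after regressing on z
    r <- cor(resid(lm(y ~ z)), resid(lm(x ~ z)))
    k <- ncol(z)
  }
  stat <- sqrt(n - k - 3) * abs(atanh(r))
  2 * pnorm(-stat)
}

# Hypothetical sketch of the forward (max-min) phase only.
mmpc_sketch <- function(y, x, max_k = 3, alpha = 0.05) {
  # max_p[j]: largest p-value seen so far for variable j over all tests
  max_p <- vapply(seq_len(ncol(x)),
                  function(j) pcor_pvalue(y, x[, j]), numeric(1))
  selected <- integer(0)
  candidates <- which(max_p <= alpha)
  while (length(candidates) > 0) {
    # Max-min step: pick the candidate whose weakest association is strongest,
    # i.e. the one with the smallest maximum p-value
    best <- candidates[which.min(max_p[candidates])]
    others <- selected
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
    # Re-test remaining candidates on conditioning sets that include `best`
    for (j in candidates) {
      for (s in seq_len(min(max_k, length(selected)))) {
        idx_sets <- if (s == 1) {
          list(integer(0))
        } else {
          combn(length(others), s - 1, simplify = FALSE)
        }
        for (idx in idx_sets) {
          cond <- c(best, others[idx])
          pv <- pcor_pvalue(y, x[, j], x[, cond, drop = FALSE])
          max_p[j] <- max(max_p[j], pv)
        }
      }
    }
    # Drop candidates found conditionally independent of the target
    candidates <- candidates[max_p[candidates] <= alpha]
  }
  list(selected = selected[order(max_p[selected])], pvalues = max_p)
}

# Same simulated data as in the Examples section
set.seed(123)
ds <- matrix(runif(100 * 30, 1, 100), ncol = 30)
tar <- 3 * ds[, 10] + 2 * ds[, 30] + 3 * ds[, 20] + rnorm(100, 0, 5)
fit <- mmpc_sketch(tar, ds, max_k = 3, alpha = 0.05)
fit$selected
```

With this simulated data one would expect columns 10, 20 and 30 to be among the selected variables, although spurious extra columns are possible at alpha = 0.05. The sketch keeps the maximum p-value per variable because a variable stays a candidate only while every test performed so far rejects independence.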

Value

The output of the algorithm is a list including:

`selected`	The selected variables, ordered by increasing p-value.
`hashobject`	The hash object containing the statistics calculated in the current run.
`pvalues`	For each feature in the dataset, this vector reports the strength of its association with the target in the context of all other variables. Specifically, it holds the maximum p-value found when the association of each variable with the target is tested against different conditioning sets. Lower values indicate stronger association.
`stats`	The statistics corresponding to the aforementioned p-values (higher values indicate stronger association).
`univ`	A list with the univariate associations: the test statistics and their corresponding logged p-values.
`max_k`	The max_k value used in the current execution.
`alpha`	The alpha value used in the current execution.
`n.tests`	If hash = TRUE, the number of tests performed is returned; otherwise, the number of univariate associations is returned.
`runtime`	The time (in seconds) needed to run the algorithm.
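As a rough illustration of how the returned list might be inspected, the following sketch reruns the call from the Examples section and reads a few of the components listed above. It assumes Rfast2 is installed (the guard makes the snippet a no-op otherwise) and relies only on the component names documented here; the exact contents of each component are not verified.

```r
# Guarded so that nothing runs when Rfast2 is not installed.
if (requireNamespace("Rfast2", quietly = TRUE)) {
  set.seed(123)
  ds <- matrix(runif(100 * 30, 1, 100), ncol = 30)
  tar <- 3 * ds[, 10] + 2 * ds[, 30] + 3 * ds[, 20] + rnorm(100, 0, 5)

  res <- Rfast2::mmpc(tar, ds, max_k = 3, alpha = 0.05, method = "pearson")

  str(res, max.level = 1)  # overview of the components of the returned list
  print(res$selected)      # selected variables
  print(res$n.tests)       # with hash = FALSE: number of univariate associations
  print(res$runtime)       # execution time
}
```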

Author(s)

Marios Dimitriadis.

R implementation and documentation: Marios Dimitriadis <kmdimitriadis@gmail.com>.

References

Tsagris M. and Tsamardinos I. (2019). Feature selection with the R package MXM. F1000Research, 7: 1505.

Lagani V., Athineou G., Farcomeni A., Tsagris M. and Tsamardinos I. (2017). Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets. Journal of Statistical Software, 80(7).

Tsamardinos I., Aliferis C. F. and Statnikov A. (2003). Time and sample efficient discovery of Markov blankets and direct causal relations. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 673-678). ACM.

Brown L. E., Tsamardinos I. and Aliferis C. F. (2004). A novel algorithm for scalable and accurate Bayesian network learning. Medinfo, 711-715.

Tsamardinos I., Brown L. E. and Aliferis C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31-78.

Examples

library(Rfast2)
set.seed(123)
# Dataset with continuous data: 100 observations, 30 variables
ds <- matrix(runif(100 * 30, 1, 100), ncol = 30)
# Target variable: depends on columns 10, 20 and 30, plus Gaussian noise
tar <- 3 * ds[, 10] + 2 * ds[, 30] + 3 * ds[, 20] + rnorm(100, 0, 5)
mmpc(tar, ds, max_k = 3, alpha = 0.05, method = "pearson")

Rfast2 documentation built on June 8, 2025, 11:46 a.m.