knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The NIMAA package [@nimaa] provides a comprehensive set of methods for performing nominal data mining.
It employs bipartite networks to demonstrate how two nominal variables are linked, and then places them in the incidence matrix to proceed with network analysis. NIMAA aids in characterizing the pattern of missing values in a dataset, locating large submatrices with non-missing values, and predicting edges within nominal variable labels. Then, given a submatrix, two unipartite networks are constructed using various network projection methods. NIMAA provides a variety of choices for clustering projected networks and selecting the best one. The best clustering results can also be used as a benchmark for imputation analysis in weighted bipartite networks.
You can install the released version of NIMAA from CRAN with:
install.packages("NIMAA")
And the development version from GitHub with:
# install.packages("devtools") devtools::install_github("jafarilab/NIMAA")
library(NIMAA) ## load the beatAML data beatAML_data <- NIMAA::beatAML # plot the original data beatAML_incidence_matrix <- plotIncMatrix( x = beatAML_data, # original data with 3 columns index_nominal = c(2,1), # the first two columns are nominal data index_numeric = 3, # the third column is numeric data print_skim = FALSE, # if you want to check the skim output, set this as TRUE plot_weight = TRUE, # when plotting the weighted incidence matrix verbose = FALSE # NOT save the figures to local folder )
knitr::include_graphics("vignettes/patient_id-inhibitor.png")
plotBipartite(inc_mat = beatAML_incidence_matrix, vertex.label.display = T)
The extractSubMatrix()
function extracts the submatrices that have non-missing values or have a certain percentage of missing values inside (not for elements-max matrix), depending on the argument's input. The package vignette and help manual contain more details.
sub_matrices <- extractSubMatrix( x = beatAML_incidence_matrix, shape = c("Square", "Rectangular_element_max"), # the selected shapes of submatrices row.vars = "patient_id", col.vars = "inhibitor", plot_weight = TRUE, print_skim = FALSE )
knitr::include_graphics("vignettes/Row_wise_arrangement.png")
knitr::include_graphics("vignettes/Column_wise_arrangement.png")
The findCluster()
function implements seven widely used network clustering algorithms, with the option of preprocessing the input incidence matrix following the projecting of the bipartite network into unipartite networks. Also, internal and external measurements can be used to compare clustering algorithms. Details can be found in the package vignette and help manual.
cls <- findCluster( sub_matrices$Rectangular_element_max, part = 1, method = "all", # all available clustering methods normalization = TRUE, # normalize the input matrix rm_weak_edges = TRUE, # remove the weak edges in the network rm_method = 'delete', # delete the weak edges instead of lowering their weights to 0. threshold = 'median', # Use median of edges' weights as threshold set_remaining_to_1 = TRUE, # set the weights of remaining edges to 1 )
The predictEdge()
function predicts new edges between nominal variables' labels or imputes missing values in the input data matrix using several imputation methods. We can compare the imputation results using the validateEdgePrediction()
function to choose the best method based on a predefined benchmark. The package vignette and help manual contain more details.
imputations <- predictEdge( inc_mat = beatAML_incidence_matrix, method = c('svd','median','als','CA') )
validateEdgePrediction(imputation = imputations, refer_community = cls$fast_greedy, clustering_args = cls$clustering_args)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.