formCssDesign | R Documentation |
Create design matrix of cluster representatives from matrix of raw features
formCssDesign( css_results, weighting = "weighted_avg", cutoff = 0, min_num_clusts = 0, max_num_clusts = NA, newx = NA )
css_results |
An object of class "cssr" (the output of the function css). |
weighting |
Character; determines how to calculate the weights used to combine the features in each selected cluster into a weighted average, called a cluster representative. Must be one of "sparse", "weighted_avg", or "simple_avg". For "sparse", all the weight is put on the most frequently selected individual cluster member (or divided equally among all the cluster members that are tied for the top selection proportion if there is a tie). For "weighted_avg", the weight used for each cluster member is proportional to that feature's individual selection proportion. For "simple_avg", each cluster member gets equal weight regardless of the individual feature selection proportions (that is, the cluster representative is just a simple average of all the cluster members). See Faletto and Bien (2022) for details. Default is "weighted_avg". |
cutoff |
Numeric; only clusters with selection proportions of at least cutoff are used. Must be between 0 and 1. Default is 0 (in which case all clusters are used, in decreasing order of selection proportion). |
min_num_clusts |
Integer or numeric; the minimum number of clusters to use regardless of cutoff. (That is, if the chosen cutoff returns fewer than min_num_clusts clusters, the cutoff will be lowered until at least min_num_clusts clusters are selected.) Default is 0. |
max_num_clusts |
Integer or numeric; the maximum number of clusters to use regardless of cutoff. (That is, if the chosen cutoff returns more than max_num_clusts clusters, the cutoff will be raised until at most max_num_clusts clusters are selected.) Default is NA (in which case max_num_clusts is ignored). |
newx |
A numeric matrix (preferably) or a data.frame (which will be coerced internally to a matrix by the function model.matrix) containing the data that will be used to generate the design matrix of cluster representatives. Must contain the same features (in the same number of columns) as the X matrix provided to css, and if the columns of newx are labeled, the names must match the variable names provided to css. newx may be omitted if train_inds were provided to css to set aside observations for model estimation. In that case, formCssDesign returns a design matrix of cluster representatives formed from the train_inds observations of the matrix X provided to css. (If no train_inds were provided to css, newx must be provided to formCssDesign.) Default is NA. |
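The three weighting options described for the weighting argument can be illustrated with a small base R sketch. This is not the package's internal code: the helper name cluster_weights, its arguments, and the equal-weight fallback for a cluster whose members were never selected are all assumptions made for illustration.

```r
# Hypothetical helper (NOT part of cssr): weights for the members of ONE cluster.
# sel_props: named numeric vector of individual feature selection proportions.
cluster_weights <- function(sel_props, weighting = "weighted_avg") {
  stopifnot(is.numeric(sel_props), length(sel_props) >= 1, all(sel_props >= 0))
  weighting <- match.arg(weighting, c("sparse", "weighted_avg", "simple_avg"))
  n <- length(sel_props)
  if (weighting == "simple_avg" || sum(sel_props) == 0) {
    # Equal weight for every member. Using this as a fallback when no member
    # was ever selected is an assumption of this sketch.
    w <- rep(1 / n, n)
  } else if (weighting == "sparse") {
    # All weight on the most frequently selected member, split equally on ties.
    top <- sel_props == max(sel_props)
    w <- as.numeric(top) / sum(top)
  } else {
    # Weights proportional to the individual selection proportions.
    w <- sel_props / sum(sel_props)
  }
  names(w) <- names(sel_props)
  w
}

# Made-up selection proportions for a three-member cluster.
props <- c(x1 = 0.6, x2 = 0.6, x3 = 0.3)
w_sparse   <- cluster_weights(props, "sparse")        # x1 and x2 share the weight
w_weighted <- cluster_weights(props, "weighted_avg")  # proportional to 0.6, 0.6, 0.3
w_simple   <- cluster_weights(props, "simple_avg")    # equal weights
```

In every scheme the weights within a cluster sum to 1, so the cluster representative stays on the same scale as its member features.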
A design matrix with the same number of rows as newx (or the train_inds provided to css) where the columns are the constructed cluster representatives.
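As a rough illustration of how such a design matrix could be assembled, the base R sketch below first picks clusters by selection proportion (respecting cutoff, min_num_clusts, and max_num_clusts) and then forms each representative as a weighted average of that cluster's columns of newx. The helpers select_clusters and form_design, the example data, and the simple handling of ties are assumptions for illustration, not the cssr implementation.

```r
# Hypothetical helper (NOT part of cssr): choose clusters by selection proportion.
# Assumes min_num_clusts <= max_num_clusts and ignores ties at the boundary.
select_clusters <- function(clus_props, cutoff = 0, min_num_clusts = 0,
                            max_num_clusts = NA) {
  stopifnot(cutoff >= 0, cutoff <= 1)
  ord <- sort(clus_props, decreasing = TRUE)
  k <- sum(ord >= cutoff)
  k <- max(k, min_num_clusts)                              # like lowering the cutoff
  if (!is.na(max_num_clusts)) k <- min(k, max_num_clusts)  # like raising the cutoff
  k <- min(k, length(ord))
  names(ord)[seq_len(k)]
}

# Hypothetical helper (NOT part of cssr): one column per cluster representative.
# clusters: named list of column names; weights: named list of weight vectors.
form_design <- function(newx, clusters, weights) {
  out <- matrix(NA_real_, nrow(newx), length(clusters),
                dimnames = list(NULL, names(clusters)))
  for (nm in names(clusters)) {
    out[, nm] <- newx[, clusters[[nm]], drop = FALSE] %*% weights[[nm]]
  }
  out
}

# Made-up data: four features grouped into three clusters.
newx <- cbind(x1 = c(1, 2, 3), x2 = c(3, 2, 1), x3 = c(0, 1, 0), x4 = c(5, 5, 5))
clusters <- list(c1 = c("x1", "x2"), c2 = "x3", c3 = "x4")
weights <- list(c1 = c(0.75, 0.25), c2 = 1, c3 = 1)
clus_props <- c(c1 = 0.9, c2 = 0.4, c3 = 0.1)

# Only c1 clears the cutoff, but min_num_clusts = 2 also brings in c2.
keep <- select_clusters(clus_props, cutoff = 0.5, min_num_clusts = 2)
design <- form_design(newx, clusters[keep], weights[keep])
```

The result has one row per row of newx and one column per selected cluster, matching the shape described above.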
Gregory Faletto, Jacob Bien
Faletto, G., & Bien, J. (2022). Cluster Stability Selection. arXiv preprint arXiv:2201.00494. https://arxiv.org/abs/2201.00494.