Selects a scatterplot matrix from a data frame that includes the k variables with approximately the highest "relevance". If no custom measure of relevance is defined, the function uses the maximum of the measures "Outlying", "Clumpy", "Sparse", "Striated", "1-Convex" and "Stringy" from the scagnostics package. See Details and Note.
selectscat(data, relmat = NULL, k = 5, r = k, plot = TRUE, criteria = "maxm")
data: A data frame or a list of class "sdfdata". If
relmat:
k: A positive integer. The number of variables to include in the scatterplot matrix.
r: A positive integer (greater than or equal to
plot: Logical. Should the plot be drawn? Default is
criteria:
To keep the selection fast for data sets with a very large number of variables, the function avoids evaluating all possible combinations. The implemented algorithm reorders the variables by optimal leaf ordering: an average linkage clustering is performed on the relevance criterion, which is interpreted as a similarity measure, and the new order of the variables is chosen so that pairs of variables with high values of the criterion are grouped together. This allows the search for the optimal matrix of size k to be restricted to a band around the diagonal of the reordered matrix of all variables. The size of this band is controlled by r. If r = p (the number of numeric variables in the data set), then every possible combination is considered. Otherwise, the optimal matrix is not guaranteed to be found.
A ggpair object (if plot=TRUE) or a character vector containing the names of the variables selected by the function (if plot=FALSE).
When using more than one measure, results can be strongly influenced by differences in the scales of the measures. Make sure that all measures have similar scales.
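One way to bring several measures onto a similar scale before combining them is a min-max rescaling of the off-diagonal entries. The sketch below is only an illustration: the helper rescale01 is hypothetical, the two example measures (absolute correlation and absolute covariance) are chosen purely because their scales differ, and setting the diagonal to 1 is an assumption that mirrors the diagonal of a correlation matrix.

```r
# Hypothetical helper: min-max rescale the off-diagonal entries of a
# symmetric relevance matrix to the interval [0, 1].
rescale01 <- function(m) {
  off <- m[upper.tri(m)]
  rng <- range(off)
  if (rng[1] == rng[2]) {
    scaled <- m * 0                          # constant measure carries no information
  } else {
    scaled <- (m - rng[1]) / (rng[2] - rng[1])
  }
  diag(scaled) <- 1                          # assumption: diagonal as in cor()
  scaled
}

# Two measures on very different scales (illustrative choice only).
m_cor <- abs(cor(mtcars))
m_cov <- abs(cov(mtcars))

# Element-wise maximum of the rescaled measures.
relmat <- pmax(rescale01(m_cor), rescale01(m_cov))

# The combined matrix could then be passed on, for example:
# selectscat(mtcars, relmat = relmat, plot = FALSE)
```

Without the rescaling, the element-wise maximum would be dominated by whichever measure has the larger raw values.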
When using the function defaults, results can strongly depend on the measure "1-Convex".
Katrin Grimm
B. Schloerke et al. (2016) GGally: Extension to ggplot2. https://cran.r-project.org/package=GGally
data(Election2005)
## Not run:
# Use whole data set with default settings
selectscat(Election2005)
# 7 variables and a higher chance of finding optimal matrix
selectscat(Election2005,k=7,r=15)
# Use correlation as the measure of relevance
selectscat(Election2005,criteria="cor")
# boring for the election data
# same result as
election_num <- Election2005[,sapply(Election2005,is.numeric)]
selectscat(election_num,relmat=cor(election_num),plot=FALSE)
# If a list of class "sdfdata" is already calculated
sdfdf <- sdf(Election2005)
# Use only measure "Outlying"
sdfdf_O <- sdfdf
sdfdf_O$sdf <- sdfdf_O$sdf[,c(1,10,11)]
selectscat(sdfdf_O,k=7,r=15)
## End(Not run)