Nothing
######
## VT::12.12.2018
##
##
## roxygen2::roxygenise("C:/users/valen/onedrive/myrepo/R/fsdaR", load_code=roxygen2:::load_installed)
##
#' Computes trimmed clustering with scatter restrictions
#'
#' @description Partitions the points in the n-by-v data matrix
#' \code{Y} into \code{k} clusters. This partition minimizes the trimmed sum,
#' over all clusters, of the within-cluster sums of point-to-cluster-centroid
#' distances. Rows of Y correspond to points, columns correspond to variables.
#' Returns in the output object of class \code{\link{tclustfsda.object}} an n-by-1 vector
#' \code{idx} containing the cluster indices of each point. By default,
#' \code{tclustfsda()} uses (squared), possibly constrained, Mahalanobis distances.
#'
#' @details This iterative algorithm initializes \code{k} clusters randomly and performs
#' concentration steps in order to improve the current cluster assignment. The number of
#' maximum concentration steps to be performed is given by input parameter \code{refsteps}.
#' For approximately obtaining the global optimum, the system is initialized \code{nsamp}
#' times and concentration steps are performed until convergence or \code{refsteps} is
#' reached. When processing more complex data sets higher values of \code{nsamp} and
#' \code{refsteps} have to be specified (obviously implying extra computation time).
#' However, if more then 10 per cent of the iterations do not converge, a warning message
#' is issued, indicating that \code{nsamp} has to be increased.
#' @param x An n x p data matrix (n observations and p variables).
#' Rows of x represent observations, and columns represent variables.
#'
#' Missing values (NA's) and infinite values (Inf's) are allowed,
#' since observations (rows) with missing or infinite values will
#' automatically be excluded from the computations.
#'
#' @param k Number of groups.
#'
#' @param alpha A scalar between 0 and 0.5 or an integer specifying the number of
#' observations which have to be trimmed. If \code{alpha=0}, \code{tclust} reduces to
#' traditional model based or mixture clustering (mclust): see for example the
#' Matlab function \code{gmdistribution}.
#'
#' More in detail, if \code{0 < alpha < 1} clustering is based on \code{h = floor(n * (1-alpha))}
#' observations, else if alpha is an integer greater than 1 clustering is based on \code{h = n - floor(alpha)}.
#' If \code{monitoring=TRUE}, \code{alpha} is a vector which specifies the values of
#' trimming levels which have to be considered - contains decresing elements which
#' lie in the interval 0 and 0.5.
#' For example if \code{alpha=c(0.1, 0.05, 0)}, \code{tclust()} considers these 3 values of trimming level.
#' The default for alpha is vector \code{alpha=c(0.1, 0.05, 0)}. The sequence is
#' forced to be monotonically decreasing.
#' @param restrfactor Positive scalar which constrains the allowed differences among group scatters.
#' Larger values imply larger differences of group scatters. On the other hand
#' a value of 1 specifies the strongest restriction forcing all
#' eigenvalues/determinants to be equal and so the method looks
#' for similarly scattered (respectively spherical) clusters.
#' The default is to apply \code{restrfactor} to eigenvalues. In order to
#' apply \code{restrfactor} to determinants it is necessary to use the
#' optional input argument \code{restrtype}.
#'
#' @param monitoring If \code{monitoring=TRUE} TCLUST is performed for a series of values
#' of the trimming factor \code{alpha} given \code{k} (number of groups) and given
#' \code{c} (restriction factor).
#
#' @param plot If \code{plot=FALSE} (default) or \code{plot=0} no plot is produced.
#' If \code{plot=TRUE} and \code{monitoring=FALSE} a plot with the classification
#' is shown (using the spmplot function). The plot can be:
#' \itemize{
#' \item for \code{p = 1}, a histogram of the univariate data,
#' \item for \code{p = 2}, a bivariate scatterplot,
#' \item for \code{p > 2}, a scatterplot matrix generated by the MATLAB function \code{spmplot()}.
#' }
#' When \code{p >= 2} the following additional features are offered
#' (for \code{p = 1} the behaviour is forced to be as for \code{plots=TRUE}):
#'
#' \itemize{
#' \item \code{plot = 'contourf'} adds in the background of the bivariate scatterplots a
#' filled contour plot. The colormap of the filled contour is based on grey levels as default.
#' This argument may also be inserted in a field named 'type' of a list. In the latter case
#' it is possible to specify the additional field 'cmap', which changes the default colors
#' of the color map used. The field 'cmap' may be a three-column matrix of values in the
#' range [0,1] where each row is an RGB triplet that defines one color. Check the colormap
#' function for additional informations.
#' \item \code{plot = 'contour'} adds in the background of the bivariate scatterplots a contour plot.
#' The colormap of the contour is based on grey levels as default. This argument may also be
#' inserted in a field named \code{type} of a list. In the latter case it is possible to specify
#' the additional field \code{cmap}, which changes the default colors of the color map used. The field
#' \code{cmap} may be a three-column matrix of values in the range [0,1] where each row is an RGB
#' triplet that defines one color. Check the \code{colormap()} (MATLAB) function for additional information.
#' \item \code{plot = 'ellipse'} superimposes confidence ellipses to each group in the bivariate
#' scatterplots. The size of the ellipse is \code{qchisq(0.95, 2)}, i.e. the confidence level used
#' by default is 95 percent. This argument may also be inserted in a field named \code{type} of
#' a list. In the latter case it is possible to specify the additional field \code{conflev},
#' which specifies the confidence level to use and it is a value between 0 and 1.
#' \item \code{plot = 'boxplotb'} superimposes on the bivariate scatterplots the bivariate boxplots
#' for each group, using the boxplotb function. This argument may also be inserted in a field
#' named \code{type} of a list.
#' }
#'
#' The parameter \code{plot} can be also a list and in this case its elements are:
#' \itemize{
#' \item \code{type} - specifies the type of plot as when plot option is a character.
#' Therefore, plots$type can be one of 'contourf', 'contour', 'ellipse' or 'boxplotb'.
#' \item \code{cmap} - used to set a colormap for the plot type (MATLAB style). For example, plot$cmap = 'autumn'.
#' See the MATLAB help of colormap for a list of colormap possiblilites.
#' \item \code{conflev} - this is the confidence level for the confidence ellipses.
#' It must me a scalar between 0 and 1.
#' }
#'
#' If \code{plot=TRUE} and \code{monitoring=TRUE} two plots are shown.
#' The first plot (\emph{monitor plot}) shows three panels with the monitoring between two
#' consecutive values of alpha: (i) the change in classification using ARI index (top panel),
#' (ii) the change in centroids using squared euclidean distances (central panel) and
#' (iii) the change in covariance matrices using squared euclidean distance (bottom panel).
#'
#' The second plot (\emph{gscatter plot}) shows a series of subplots which monitor the classification
#' for each value of \code{alpha}. In order to make sure that consistent labels are used for the
#' groups, between two consecutive values of \code{alpha}, we assign label \code{r} to a group if
#' this group shows the smallest distance with group \code{r} for the previous value of \code{alpha}.
#' The type of plot which is used to monitor the stability of the classification depends on the
#' data dimensionality \code{p}.
#' \itemize{
#' \item for \code{p = 1}, a histogram of the univariate data (the MATLAB function \code{histFS()} is called),
#' \item for \code{p = 2}, a bivariate scatterplot (the MATLAB function \code{gscatter()} is called),
#' \item for \code{p > 2}, a scatterplot of the first two principal components (function \code{gscatter()}
#' is called and we show on the axes titles the percentage of variance explained by the first
#' two principal components).
#' }
#'
#' Also in the case of \code{monitoring=TRUE} the parameter \code{plot} can be a list and its elements are:
#' \itemize{
#' \item name: character vector which enables to specify which plot to display.
#' \code{name = "gscatter"} produces a figure with a series of subplots which show the
#' classification for each value of \code{alpha}. \code{name = "monitor"} shows a figure
#' with three panels which monitor between two consecutive values of alpha the change in
#' classification using ARI index (top panel), the change in centroids using squared
#' euclidean distances (central panel), the change in covariance matrices using squared
#' euclidean distance (bottom panel). If this field is not specified, by default
#' \code{name=c("gscatter", "monitor")} and both figures will be shown.
#' \item alphasel: a numeric vector which specifies for which values of alpha it
#' is possible to see the classification. For example if \code{alphasel = c(0.05, 0.02)},
#' the classification will be shown just for \code{alpha=0.05} and \code{alpha=0.02}.
#' If this field is not specified \code{alphasel=alpha} and therefore the classification
#' is shown for each value of alpha.
#' }
#'
#' @param nsamp If a scalar, it contains the number of subsamples which will be extracted.
#' If \code{nsamp = 0} all subsets will be extracted. Remark - if the number of all possible
#' subset is greater than 300 the default is to extract all subsets, otherwise just 300.
#' If \code{nsamp} is a matrix it contains in the rows the indexes of the subsets which
#' have to be extracted. \code{nsamp} in this case can be conveniently generated by
#' function \code{subsets()}. \code{nsamp} can have \code{k} columns or \code{k * (p + 1)}
#' columns. If \code{nsamp} has \code{k} columns the \code{k} initial centroids each
#' iteration i are given by \code{X[nsamp[i,] ,]} and the covariance matrices are equal
#' to the identity.
#'
#' If \code{nsamp} has \code{k * (p + 1)} columns, the initial centroids and covariance
#' matrices in iteration \code{i} are computed as follows:
#' \itemize{
#' \item X1 <- X[nsamp[i ,] ,]
#' \item mean(X1[1:p + 1, ]) contains the initial centroid for group 1
#' \item cov(X1[1:p + 1, ]) contains the initial cov matrix for group 1
#' \item mean(X1[(p + 2):(2*p + 2), ]) contains the initial centroid for group 2
#' \item cov(X1[(p + 2):(2*p + 2), ]) contains the initial cov matrix for group 2
#' \item ...
#' \item mean(X1[(k-1)*p+1):(k*(p+1), ]) contains the initial centroids for group k
#' \item cov(X1[(k-1)*p+1):(k*(p+1), ]) contains the initial cov matrix for group k.
#' }
#'
#' REMARK: If \code{nsamp} is not a scalar, the option \code{startv1} given below is ignored.
#' More precisely, if \code{nsamp} has \code{k} columns \code{startv1 = 0} else if
#' \code{nsamp} has \code{k*(p+1)} columns option \code{startv1=1}.
#'
#' @param refsteps Number of refining iterations in each subsample. Default is \code{refsteps=15}.
#' \code{refsteps = 0} means "raw-subsampling" without iterations.
#'
#' @param reftol Tolerance of the refining steps. The default value is 1e-14
#'
#' @param equalweights A logical specifying wheather cluster weights in the concentration
#' and assignment steps shall be considered. If \code{equalweights=TRUE} we are (ideally)
#' assuming equally sized groups, else if \code{equalweights = false} (default) we allow for
#' different group weights. Please, check in the given references which functions
#' are maximized in both cases.
#'
#' @param mixt Specifies whether mixture modelling or crisp assignment approach to model
#' based clustering must be used. In the case of mixture modelling parameter mixt also
#' controls which is the criterion to find the untrimmed units in each step of the maximization.
#' If \code{mixt >=1} mixture modelling is assumed else crisp assignment.
#' The default value is \code{mixt=0}, i.e. crisp assignment. Please see
#' for details the provided references.
#' The parameter \code{mixt} also controls the criterion to select the units to trim.
#' If \code{mixt = 2} the \code{h} units are those which give the largest contribution
#' to the likelihood, else if \code{mixt=1} the criterion to select the \code{h} units is
#' exactly the same as the one which is used in crisp assignment.
#'
#' @param msg Controls whether to display or not messages on the screen. If \code{msg==TRUE}
#' messages are displayed on the screen. If \code{msg=2}, detailed messages are displayed,
#' for example the information at iteration level.
#'
#' @param nocheck Check input arguments. If \code{nocheck=TRUE} no check is performed
#' on matrix \code{X}. The default \code{nocheck=FALSE}.
#'
#' @param startv1 How to initialize centroids and covariance matrices. Scalar.
#' If \code{startv1=1} then initial centroids and covariance matrices are based
#' on \code{(p+1)} observations randomly chosen, else each centroid is initialized
#' taking a random row of input data matrix and covariance matrices are initialized
#' with identity matrices. The default value is\code{startv1=1}.
#'
#' Remark 1: in order to start with a routine which is in the required parameter space,
#' eigenvalue restrictions are immediately applied.
#'
#' Remark 2 - option \code{startv1} is used just if \code{nsamp} is a scalar
#' (see for more details the help associated with \code{nsamp}).
#'
#'
#' @param RandNumbForNini pre-extracted random numbers to initialize proportions.
#' Matrix of size k-by-nrow(nsamp) containing the random numbers which
#' are used to initialize the proportions of the groups. This option is effective just if
#' \code{nsamp} is a matrix which contains pre-extracted subsamples. The purpose of this
#' option is to enable to user to replicate the results in case routine \code{tclustreg*()}
#' is called using a parfor instruction (as it happens for example in routine IC, where
#' \code{tclustreg()} is called through a parfor for different values of the restriction factor).
#' The default is that \code{RandNumbForNini} is empty - then uniform random numbers are used.
#'
#' @param restrtype Type of restriction to be applied on the cluster scatter matrices.
#' Valid values are \code{'eigen'} (default), or \code{'deter'}.
#' \code{"eigen"} implies restriction on the eigenvalues while \code{"deter"}
#' implies restriction on the determinants.
#'
#' @param UnitsSameGroup List of the units which must (whenever possible) have
#' a particular label. For example \code{UnitsSameGroup=c(20, 26)}, means that
#' group which contains unit 20 is always labelled with number 1. Similarly,
#' the group which contains unit 26 is always labelled with number 2, (unless
#' it is found that unit 26 already belongs to group 1).
#' In general, group which contains unit \code{UnitsSameGroup(r)} where \code{r=2, ...length(kk)-1}
#' is labelled with number \code{r} (unless it is found that unit \code{UnitsSameGroup(r)}
#' has already been assigned to groups \code{1, 2, ..., r-1}.
#'
#' @param numpool The number of parallel sessions to open. If numpool is not defined,
#' then it is set equal to the number of physical cores in the computer.
#'
#' @param cleanpool Logical, indicating if the open pool must be closed or not.
#' It is useful to leave it open if there are subsequent parallel sessions to execute,
#' so that to save the time required to open a new pool.
#'
#' @param trace Whether to print intermediate results. Default is \code{trace=FALSE}.
#'
#' @param ... potential further arguments passed to lower level functions.
#'
#' @return Depending on the input parameter \code{monitoring}, one of
#' the following objects will be returned:
#'
#' \enumerate{
#' \item \code{\link{tclustfsda.object}}
#' \item \code{\link{tclusteda.object}}
#' }
#' @references
#' Garcia-Escudero, L.A., Gordaliza, A., Matran, C. and Mayo-Iscar, A. (2008).
#' A General Trimming Approach to Robust Cluster Analysis. Annals of Statistics,
#' Vol. 36, 1324-1345. \doi{10.1214/07-AOS515}.
#'
#' @examples
#' \dontrun{
#'
#' data(hbk, package="robustbase")
#' (out <- tclustfsda(hbk[, 1:3], k=2))
#' class(out)
#' summary(out)
#'
#' ## TCLUST of Gayser data with three groups (k=3), 10%% trimming (alpha=0.1)
#' ## and restriction factor (c=10000)
#' data(geyser2)
#' (out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000))
#'
#' ## Use the plot options to produce more complex plots ----------
#'
#' ## Plot with all default options
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000, plot=TRUE)
#'
#' ## Default confidence ellipses.
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000, plot="ellipse")
#'
#' ## Confidence ellipses specified by the user: confidence ellipses set to 0.5
#' plots <- list(type="ellipse", conflev=0.5)
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000, plot=plots)
#'
#' ## Confidence ellipses set to 0.9
#' plots <- list(type="ellipse", conflev=0.9)
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000, plot=plots)
#'
#' ## Contour plots
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000, plot="contour")
#'
#' ## Filled contour plots with additional options: contourf plot with a named colormap.
#' ## Here we define four MATLAB-like colormaps, but the user can define anything else,
#' ## presented by a matrix with three columns which are the RGB triplets.
#'
#' summer <- as.matrix(data.frame(x1=seq(from=0, to=1, length=65),
#' x2=seq(from=0.5, to=1, length=65),
#' x3=rep(0.4, 65)))
#' spring <- as.matrix(data.frame(x1=rep(1, 65),
#' x2=seq(from=0, to=1, length=65),
#' x3=seq(from=1, to=0, length=65)))
#' winter <- as.matrix(data.frame(x1=rep(0, 65),
#' x2=seq(from=0, to=1, length=65),
#' x3=seq(from=1, to=0, length=65)))
#' autumn <- as.matrix(data.frame(x1=rep(1, 65),
#' x2=seq(from=0, to=1, length=65),
#' x3=rep(0, 65)))
#'
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000,
#' plot=list(type="contourf", cmap=autumn))
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000,
#' plot=list(type="contourf", cmap=winter))
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000,
#' plot=list(type="contourf", cmap=spring))
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000,
#' plot=list(type="contourf", cmap=summer))
#'
#'
#' ## We compare the output using three different values of restriction factor
#' ## nsamp is the number of subsamples which will be extracted
#' data(geyser2)
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10000, nsamp=500, plot="ellipse")
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=10, nsamp=500, refsteps=10, plot="ellipse")
#' out <- tclustfsda(geyser2, k=3, alpha=0.1, restrfactor=1, nsamp=500, refsteps=10, plot="ellipse")
#'
#' ## TCLUST applied to M5 data: A bivariate data set obtained from three normal
#' ## bivariate distributions with different scales and proportions 1:2:2. One of the
#' ## components is very overlapped with another one. A 10 per cent background noise is
#' ## added uniformly distributed in a rectangle containing the three normal components
#' ## and not very overlapped with the three mixture components. A precise description
#' ## of the M5 data set can be found in Garcia-Escudero et al. (2008).
#' ##
#'
#' data(M5data)
#' plot(M5data[, 1:2])
#'
#' ## Scatter plot matrix
#' library(rrcov)
#' plot(CovClassic(M5data[,1:2]), which="pairs")
#'
#' out <- tclustfsda(M5data[,1:2], k=3, alpha=0, restrfactor=1000, nsamp=100, plot=TRUE)
#' out <- tclustfsda(M5data[,1:2], k=3, alpha=0, restrfactor=10, nsamp=100, plot=TRUE)
#' out <- tclustfsda(M5data[,1:2], k=3, alpha=0.1, restrfactor=1, nsamp=1000,
#' plot=TRUE, equalweights=TRUE)
#' out <- tclustfsda(M5data[,1:2], k=3, alpha=0.1, restrfactor=1000, nsamp=100, plot=TRUE)
#'
#' ## TCLUST with simulated data: 5 groups and 5 variables
#' ##
#' n1 <- 100
#' n2 <- 80
#' n3 <- 50
#' n4 <- 80
#' n5 <- 70
#' p <- 5
#' Y1 <- matrix(rnorm(n1 * p) + 5, ncol=p)
#' Y2 <- matrix(rnorm(n2 * p) + 3, ncol=p)
#' Y3 <- matrix(rnorm(n3 * p) - 2, ncol=p)
#' Y4 <- matrix(rnorm(n4 * p) + 2, ncol=p)
#' Y5 <- matrix(rnorm(n5 * p), ncol=p)
#'
#' group <- c(rep(1, n1), rep(2, n2), rep(3, n3), rep(4, n4), rep(5, n5))
#' Y <- Y1
#' Y <- rbind(Y, Y2)
#' Y <- rbind(Y, Y3)
#' Y <- rbind(Y, Y4)
#' Y <- rbind(Y, Y5)
#' dim(Y)
#' table(group)
#' out <- tclustfsda(Y, k=5, alpha=0.05, restrfactor=1.3, refsteps=20, plot=TRUE)
#'
#' ## Automatic choice of best number of groups for Geyser data ------------------------
#' ##
#' data(geyser2)
#' maxk <- 6
#' CLACLA <- matrix(0, nrow=maxk, ncol=2)
#' CLACLA[,1] <- 1:maxk
#' MIXCLA <- MIXMIX <- CLACLA
#'
#' for(j in 1:maxk) {
#' out <- tclustfsda(geyser2, k=j, alpha=0.1, restrfactor=5)
#' CLACLA[j, 2] <- out$CLACLA
#' }
#'
#' for(j in 1:maxk) {
#' out <- tclustfsda(geyser2, k=j, alpha=0.1, restrfactor=5, mixt=2)
#' MIXMIX[j ,2] <- out$MIXMIX
#' MIXCLA[j, 2] <- out$MIXCLA
#' }
#'
#' oldpar <- par(mfrow=c(1,3))
#' plot(CLACLA[,1:2], type="l", xlim=c(1, maxk), xlab="Number of groups", ylab="CLACLA")
#' plot(MIXMIX[,1:2], type="l", xlim=c(1, maxk), xlab="Number of groups", ylab="MIXMIX")
#' plot(MIXCLA[,1:2], type="l", xlim=c(1, maxk), xlab="Number of groups", ylab="MIXCLA")
#' par(oldpar)
#'
#'
#' ## Monitoring examples ------------------------------------------
#'
#' ## Monitoring using Geyser data
#'
#' ## Monitoring using Geyser data (all default options)
#' ## alpha and restriction factor are not specified therefore alpha=c(0.10, 0.05, 0)
#' ## is used while the restriction factor is set to c=12
#' out <- tclustfsda(geyser2, k=3, monitoring=TRUE)
#'
#' ## Monitoring using Geyser data with alpha and c specified.
#' out <- tclustfsda(geyser2, k=3, restrfac=100, alpha=seq(0.10, 0, by=-0.01), monitoring=TRUE)
#'
#' ## Monitoring using Geyser data with plot argument specified as a list.
#' ## The trimming levels to consider in this case are 31 values of alpha
#' ##
#' out <- tclustfsda(geyser2, k=3, restrfac=100, alpha=seq(0.30, 0, by=-0.01), monitoring=TRUE,
#' plot=list(alphasel=c(0.2, 0.10, 0.05, 0.01)), trace=TRUE)
#'
#' ## Monitoring using Geyser data with argument UnitsSameGroup
#' ##
#' ## Make sure that group containing unit 10 is in a group which is labelled "group 1"
#' ## and group containing unit 12 is in group which is labelled "group 2"
#' ##
#' ## Mixture model is used (mixt=2)
#' ##
#' out <- tclustfsda(geyser2, k=3, restrfac=100, alpha=seq(0.30, 0, by=-0.01), monitoring=TRUE,
#' mixt=2, UnitsSameGroup=c(10, 12), trace=TRUE)
#'
#' ## Monitoring using M5 data
#' data(M5data)
#'
#' ## alphavec=vector which contains the trimming levels to consider
#' ## in this case 31 values of alpha are considered
#' alphavec <- seq(0.10, 0, by=-0.02)
#' out <- tclustfsda(M5data[, 1:2], 3, alpha=alphavec, restrfac=1000, monitoring=TRUE,
#' nsamp=1000, plots=TRUE)
#' }
#' @export
#' @author FSDA team, \email{valentin.todorov@@chello.at}
tclustfsda <- function(x, k, alpha, restrfactor=12, monitoring=FALSE, plot=FALSE,
nsamp, refsteps=15, reftol=10e-14, equalweights=FALSE, mixt=0, msg=FALSE, nocheck=FALSE,
startv1=1, RandNumbForNini, restrtype= c("eigen", "deter"),
UnitsSameGroup, numpool, cleanpool,
trace=FALSE, ...)
{
if(is.data.frame(x))
x <- data.matrix(x)
else if(!is.matrix(x))
x <- matrix(x, length(x), 1,
dimnames = list(names(x), deparse(substitute(x))))
if(!is.numeric(x)) stop("x is not a numeric")
storage.mode(x) <- "double"
dx <- dim(x)
xn <- (dnx <- dimnames(x))[[2]]
xn <- if (!is.null(xn))
xn
else if (dx[2] > 1)
paste("X", 1:dx[2], sep = "")
else if(dx[2])
"X"
dimnames(x) <- list(dnx[[1]], xn)
n <- nrow(x)
p <- ncol(x)
## We do not need a 'control' argument for now
## if(missing(control))
## control <- list(...)
control <- list(...)
xplots <- 0
if(is.logical(plot))
xplots <- ifelse(plot, 1, 0)
else if(is.numeric(plot) && plot >= 0 && plot <= 2)
xplots <- plot
else if(is.character(plot))
{
cx <- c("contourf", "contour", "ellipse", "boxplotb")
if(plot %in% cx)
xplots <- plot
else
stop("Invalid parameter 'plot'. If it is a character, it should contain one or more of the following: ", paste0(cx, collapse=","))
}else if(is.list(plot))
{
if(!monitoring)
{
cx <- c("type", "cmap", "conflev")
if(all(names(plot) %in% cx))
xplots <- plot
else
stop("Invalid parameter 'plot'. If it is a list, it should contain one or more of the following: ", paste0(cx, collapse=","))
} else {
cx <- c("name", "alphasel")
if(all(names(plot) %in% cx))
xplots <- plot
else
stop("Invalid parameter 'plot'. If it is a list, it should contain one or more of the following: ", paste0(cx, collapse=","))
}
## VT::30.07.2021 - fixed an issue with colormaps - check that the supploed argument plots$cmap is a matrix with three columns
if("cmap" %in% names(xplots))
{
xplots$cmap <- if(is.data.frame(xplots$cmap)) data.matrix(xplots$cmap) else xplots$cmap
if(!is.numeric(xplots$cmap) || !is.matrix(xplots$cmap) || ncol(xplots$cmap) != 3)
stop("'plot$cmap' must be a three-column matrix of values in the range [0,1] where each row is an RGB triplet that defines one color!")
}
}else
stop("Invalid parameter 'plot'. Should be TRUE/FALSE or 0, 1 or character name of a plot or a list with graphical parameters")
control$plots <- xplots
if(!missing(nsamp)){
control$nsamp <- nsamp
if(is.matrix(nsamp) && !missing(RandNumbForNini))
control$RandNumbForNini <- RandNumbForNini
}
control$refsteps <- refsteps
control$reftol <- reftol
control$equalweights <- equalweights
control$mixt <- mixt
if(!monitoring)
control$Ysave <- 1
if(monitoring)
{
if(!missing(UnitsSameGroup))
control$UnitsSameGroup <- UnitsSameGroup
if(!missing(numpool))
control$numpool <- numpool
if(!missing(cleanpool))
control$cleanpool <- ifelse(cleanpool, 1, 0)
}
xmsg <- 0
if(is.logical(msg))
xmsg <- ifelse(msg, 1, 0)
else if(is.numeric(msg) && msg >= 0 && msg <= 2)
xmsg <- msg
else
stop("Invalid parameter 'msg'. Should be TRUE/FALSE or 0, 1, 2.")
control$msg <- xmsg
xtemp <- 0
if(is.logical(nocheck))
xtemp <- ifelse(nocheck, 1, 0)
else if(is.numeric(nocheck) && nocheck >= 0 && nocheck <= 1)
xtemp <- nocheck
else
stop("Invalid parameter 'nocheck'. Should be TRUE/FALSE or 0, 1.")
control$nocheck <- xtemp
control$startv1 <- startv1
control$restrtype <- match.arg(restrtype)
outclass <- if(monitoring) "tclusteda" else "tclustfsda"
if(missing(alpha))
alpha <- if(!monitoring) 0 else c(0.1, 0.05, 0)
## ES 27.06.2018: parameters that are mandatory to the MATLAB function
## cannot be put into the MATLAB function because they have to be supplied
## to the function individually and not in (name, value) pairs
parlist = c(.jarray(x, dispatch=TRUE))
parlist = c(parlist, .jnew("java/lang/Double", as.double(k))) # k
if(length(alpha) == 1)
parlist = c(parlist, .jnew("java/lang/Double", as.double(alpha))) # alpha
else
parlist = c(parlist, .jarray(alpha, dispatch=TRUE))
parlist = c(parlist, .jnew("java/lang/Double", as.double(restrfactor))) # restrfactor
paramNames = names(control)
if(trace)
print(control)
if(length(paramNames) > 0)
{
for (i in 1:length(paramNames)) {
paramName = paramNames[i]
paramValue = control[[i]]
matlabValue = rType2MatlabType(paramName, paramValue)
parlist = c(parlist, .jnew("java/lang/String", paramName), matlabValue)
}
}
if(!monitoring)
{
out <- callFsdaFunction("tclust", "[Ljava/lang/Object;", 1, parlist)
if(is.null(out))
return(NULL)
arr1 = .jcast(out[[1]], "com/mathworks/toolbox/javabuilder/MWStructArray")
arr = .jnew("org/jrc/ipsc/globesec/sitaf/fsda/FsdaMWStructArray", arr1)
if(trace) {
cat("\nReturning from MATLAB tclust(). Fields returned by MATLAB: \n")
print(arr$fieldNames())
}
## we will reuse them for the elements of emp, if it is structure
## (in case of non-convergence)
xcolnames <- if(is.null(dimnames(x)[[2]])) paste0("X", 1:ncol(x)) else dimnames(x)[[2]]
xrownames <- paste0("C", 1:k)
muopt = .jevalArray(arr$get("muopt", as.integer(1)), "[[D", simplify = TRUE)
dimnames(muopt) <- list(xrownames, xcolnames)
muopt <- t(muopt)
sigmaopt = .jevalArray(arr$get("sigmaopt", as.integer(1)), "[[D", simplify = TRUE)
if(k == 1)
dimnames(sigmaopt) <- list(xcolnames, xcolnames)
else
dimnames(sigmaopt) <- list(xcolnames, xcolnames, xrownames)
idx = as.vector(as.matrix(.jevalArray(arr$get("idx", as.integer(1)), "[[D", simplify = TRUE)))
size = as.matrix(.jevalArray(arr$get("siz", as.integer(1)), "[[D", simplify = TRUE))
if(notconverged <- (nrow(size) < k + ifelse(alpha > 0, 1, 0))) {
cat("\nNumber of requested clusters =", k,
"\nNumber of estimated clusters =", nrow(size) - ifelse(alpha > 0, 1, 0), "\n")
warning("The total number of estimated clusters is smaller than the number supplied")
}
size_rows <- paste0("C", size[,1])
size_cols <- c("Cluster", "Size", "Percent")
dimnames(size) <- list(size_rows, size_cols)
postprob <- as.matrix(.jevalArray(arr$get("postprob", as.integer(1)), "[[D", simplify = TRUE))
rows <- if(is.null(dimnames(x)[[1]])) 1:nrow(x) else dimnames(x)[[1]]
cols <- paste0("C", 1:k)
dimnames(postprob) <- list(rows, cols)
MIXMIX <- if(as.integer(arr$hasField("MIXMIX", as.integer(1))) != 1) NULL
else as.vector(as.matrix(.jevalArray(arr$get("MIXMIX", as.integer(1)), "[[D", simplify = TRUE)))
MIXCLA <- if(as.integer(arr$hasField("MIXCLA", as.integer(1))) != 1) NULL
else as.vector(as.matrix(.jevalArray(arr$get("MIXCLA", as.integer(1)), "[[D", simplify = TRUE)))
CLACLA <- if(as.integer(arr$hasField("CLACLA", as.integer(1))) != 1) NULL
else as.vector(as.matrix(.jevalArray(arr$get("CLACLA", as.integer(1)), "[[D", simplify = TRUE)))
emp <- .jevalArray(arr$get("emp", as.integer(1)), "[[D", simplify = TRUE)
## VT::18.08.2022
## When the number of clusters estimated is less than the number
## of clusters requested, emp is not a double (emp=0) but is a structure,
## containing the empirical counterparts of
## idx, muopt, sigmaopt and size
if(typeof(emp) == "double")
emp <- as.vector(as.matrix(emp))
else {
emp <- unwrapComplexNumericCellArray(emp)
## a list with 4 elements:
## 1. idxemp (n-by-1 vector)
## 2. muemp (k-by-p matrix)
## 3. sigmaemp (p-by-p-by-k array)
## 4. sizemp (Matrix of size (k+1)-by-3)
idxemp <- as.vector(emp[[1]])
muemp <- matrix(emp[[2]], ncol=p)
dimnames(muemp) <- list(xrownames, xcolnames)
muemp <- t(muemp)
sigmaemp <- array(emp[[3]], dim=c(p, p, k))
if(k == 1)
dimnames(sigmaemp) <- list(xcolnames, xcolnames)
else
dimnames(sigmaemp) <- list(xcolnames, xcolnames, xrownames)
sizeemp <- matrix(emp[[4]], ncol=3)
size_rows <- paste0("C", 0:k)
size_cols <- c("Cluster", "Size", "Percent")
dimnames(sizeemp) <- list(size_rows, size_cols)
emp <- list(idxemp=idxemp, muemp=muemp, sigmaemp=sigmaemp, sizeemp=sizeemp)
}
notconver <- as.vector(as.matrix(.jevalArray(arr$get("notconver", as.integer(1)), "[[D", simplify = TRUE)))
bs <- as.vector(as.matrix(.jevalArray(arr$get("bs", as.integer(1)), "[[D", simplify = TRUE)))
obj <- as.vector(as.matrix(.jevalArray(arr$get("obj", as.integer(1)), "[[D", simplify = TRUE)))
equalweights <- as.vector(as.matrix(.jevalArray(arr$get("equalweights", as.integer(1)), "[[D", simplify = TRUE)))
h <- as.vector(as.matrix(.jevalArray(arr$get("h", as.integer(1)), "[[D", simplify = TRUE)))
fullsol <- as.vector(as.matrix(.jevalArray(arr$get("fullsol", as.integer(1)), "[[D", simplify = TRUE)))
Y <- if(as.integer(arr$hasField("Y", as.integer(1))) != 1) NULL
else as.matrix(.jevalArray(arr$get("Y", as.integer(1)), "[[D", simplify = TRUE))
ans <- list(call=match.call(), alpha=alpha, k=k, muopt=muopt, sigmaopt=sigmaopt, idx=idx, size=size, postprob=postprob, emp=emp,
MIXMIX=MIXMIX, MIXCLA=MIXCLA, CLACLA=CLACLA,
notconver=notconver, bs=bs, obj=obj, equalweights=equalweights, h=h, fullsol=fullsol, X=Y)
ans$MIXMIX <- MIXMIX
ans$MIXCLA <- MIXCLA
ans$CLACLA <- CLACLA
} else
{
out <- callFsdaFunction("tclusteda", "[Ljava/lang/Object;", 1, parlist)
if(is.null(out))
return(NULL)
arr1 = .jcast(out[[1]], "com/mathworks/toolbox/javabuilder/MWStructArray")
arr = .jnew("org/jrc/ipsc/globesec/sitaf/fsda/FsdaMWStructArray", arr1)
if(trace) {
cat("\nReturning from MATLAB tclusteda(). Fields returned by MATLAB: \n")
print(arr$fieldNames())
}
cols <- if(is.null(dimnames(x)[[2]])) paste0("X", 1:ncol(x)) else dimnames(x)[[2]]
rows <- if(is.null(dimnames(x)[[1]])) 1:nrow(x) else dimnames(x)[[1]]
alphas <- paste0("alpha_", alpha)
MU = .jevalArray(arr$get("MU", as.integer(1)), "[[D", simplify = TRUE)
## VT::27.07.2021 - if alpha is a single number, the array MU will be two-dimensional,
## not three-dimensional as expected by the following setting of dimnames. Change it
## to trhee-dimensional array, with the third dimension having just one element.
if(length(alpha) == 1)
MU <- array(MU, c(dim(MU), 1))
dimnames(MU) <- list(paste0("C", 1:k), cols, alphas)
## Convert a cell array with the length of alpha to a list. Each cell contains
## a 3D-array with dimension p x p x k
SIGMA = cell2list(.jevalArray(arr$get("SIGMA", as.integer(1)), "[[D", simplify = TRUE))
names(SIGMA) <- alphas
for(i in 1:length(SIGMA))
dimnames(SIGMA[[i]]) <- list(cols, cols, paste0("C", 1:k))
IDX = as.matrix(.jevalArray(arr$get("IDX", as.integer(1)), "[[D", simplify = TRUE))
dimnames(IDX) <- list(rows, paste0("alpha_", alpha))
## VT::27.07.2021 - if alpha is a single number, the matrix Amon contains 0 rows (i.e. length(alpha)-1).
## Actually the returned matrix is also with 0 columns. Therefore do not set the dimnames.
Amon = as.matrix(.jevalArray(arr$get("Amon", as.integer(1)), "[[D", simplify = TRUE))
if(nrow(Amon)[1] > 0)
dimnames(Amon) <- list(1:nrow(Amon), c("alpha", "ARI", "dist_centers", "dist_covs"))
Y <- if(as.integer(arr$hasField("Y", as.integer(1))) != 1) NULL
else as.matrix(.jevalArray(arr$get("Y", as.integer(1)), "[[D", simplify = TRUE))
ans = list(call=match.call(), alpha=alpha, k=k, restrfactor=restrfactor, MU=MU, SIGMA=SIGMA, IDX=IDX, Amon=Amon, X=Y)
}
freeMatlabResources(out)
class(ans) <- outclass
return (ans)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.