predkmeansCVest: Cross-validation of Predictive K-means Clustering


View source: R/functions_cv.R

Description

Performs cross-validation of predictive k-means clustering and cluster prediction.

Usage

predkmeansCVest(
  X,
  R,
  K,
  cv.groups = 10,
  sigma2 = 0,
  sigma2fixed = FALSE,
  scale = TRUE,
  covarnames = colnames(R),
  PCA = FALSE,
  PCAcontrol = list(covarnames = colnames(R), ncomps = 5),
  TPRS = FALSE,
  TPRScontrol = list(df = 5, xname = "x", yname = "y"),
  returnAll = FALSE,
  ...
)

predkmeansCVpred(
  object,
  X = object$X,
  R = object$R,
  method = c("ML", "MixExp", "SVM"),
  ...
)

Arguments

X

Outcome data

R

Covariates. Coerced to data frame.

K

Number of clusters

cv.groups

A list providing the cross-validation groups for splitting the data. Alternatively, a single number giving the number of groups into which the data are randomly split. A value of '0' implies leave-one-out. Defaults to 10.

sigma2

Starting value of sigma2. Setting sigma2=0 and sigma2fixed=TRUE results in regular k-means clustering.

sigma2fixed

Logical indicating whether sigma2 should be held fixed. If FALSE, then sigma2 is estimated using Maximum Likelihood.

scale

Should the outcomes be re-scaled within each training group?

covarnames

Names of covariates to be included directly.

PCA

Logical indicator for whether PCA components should be computed from R.

PCAcontrol

Arguments passed to createPCAmodelmatrix. This includes ncomps.

TPRS

Logical indicator for whether thin-plate regression splines should be created and added to covariates.

TPRScontrol

Arguments passed to createTPRSmodelmatrix. This includes df.

returnAll

Logical indicating whether a list containing all nStarts solutions should be included in the output.

...

Additional arguments passed to either predkmeans or the prediction method.

object

A predkmeansCVest object.

method

Character string indicating which prediction method should be used. Options are ML, MixExp, and SVM. See predictML for more information.
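
The list form of cv.groups can be built by hand when the folds need to be reproducible or stratified. The following is a minimal base-R sketch, not the package's own code. It assumes each list element holds the row indices of one held-out group; this page does not say whether indices or row names are expected, so check that against the package before relying on it. The names n, k, fold_id and cv_list are illustrative.

```r
# Sketch only: assumes cv.groups accepts a list whose elements are the
# row indices of each held-out group (not confirmed by this page).
set.seed(1)                      # make the random split reproducible
n <- 200                         # number of observations
k <- 5                           # number of cross-validation groups

# Assign each observation to one of k groups of near-equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

# One list element per group, each a vector of row indices
cv_list <- split(seq_len(n), fold_id)

lengths(cv_list)                 # size of each group
```

A list like cv_list could then be supplied as cv.groups=cv_list in place of a single number, subject to the assumption above.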

Details

These wrappers are designed to simplify cross-validation of a dataset. For models including thin-plate regression splines (TPRS) or principal component analysis (PCA) scores, these functions will re-evaluate the TPRS basis or PCA decomposition on each training set.
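
To illustrate what re-evaluating the PCA decomposition on each training set means, here is a small base-R sketch that uses prcomp rather than the package's createPCAmodelmatrix. It is an analogy under stated assumptions, not the package's implementation: the covariate matrix, the fold construction and the choice of two components are all made up for the example.

```r
# Sketch only: per-fold PCA with base R's prcomp, standing in for the
# package's internal handling of PCA scores.
set.seed(1)
n <- 100
R <- cbind(r1 = rnorm(n), r2 = rnorm(n), r3 = rnorm(n))

# Five held-out groups of row indices
folds <- split(seq_len(n), sample(rep(1:5, length.out = n)))

scores <- lapply(folds, function(test_idx) {
  train <- R[-test_idx, , drop = FALSE]
  test  <- R[test_idx, , drop = FALSE]
  # Fit the decomposition on the training rows only
  pca <- prcomp(train, center = TRUE, scale. = TRUE)
  list(
    train = pca$x[, 1:2, drop = FALSE],
    # Project held-out rows using the training centering, scaling and rotation
    test  = predict(pca, newdata = test)[, 1:2, drop = FALSE]
  )
})

dim(scores[[1]]$test)            # held-out rows by 2 components
```

Fitting the decomposition inside the loop keeps information from the held-out rows out of the component definitions, which is the point of recomputing it for every training set.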

Author(s)

Joshua Keller

See Also

predkmeans, createPCAmodelmatrix, createTPRSmodelmatrix

Examples

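library(predkmeans)

# Simulate two covariates and a latent cluster label that depends on them
n <- 200
r1 <- rnorm(n)
r2 <- rnorm(n)
u1 <- rbinom(n, size=1, prob=0)  # prob=0, so u1 is always 0 and cluster "A" is never drawn
cluster <- ifelse(r1<0, ifelse(u1, "A", "B"), ifelse(r2<0, "C", "D"))
# Cluster-specific means for the two outcome variables
mu1 <- c(A=2, B=2, C=-2, D=-2)
mu2 <- c(A=1, B=-1, C=-1, D=-1)
x1 <- rnorm(n, mu1[cluster], 4)
x2 <- rnorm(n, mu2[cluster], 4)
R <- model.matrix(~r1 + r2)  # covariate matrix: intercept, r1, r2
X <- cbind(x1, x2)           # outcome matrix
# 5-fold cross-validation with K=4 clusters; nStarts is passed on through ...
pkmcv <- predkmeansCVest(X=X,
                         R=R, K=4, nStarts=4, cv.groups=5,
                         TPRS=FALSE, PCA=FALSE, covarnames=colnames(R))
pkmcv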

Example output

Cross-validation fit for predictive k-means object with
     4 Clusters
     5 CV Groups
Model has:
     3 Covariates

predkmeans documentation built on Jan. 11, 2020, 9:29 a.m.