cd.cluster: Cluster analysis for cognitive diagnosis based on the...

View source: R/cd.cluster.R

cd.cluster    R Documentation

Cluster analysis for cognitive diagnosis based on the Asymptotic Classification Theory

Description

cd.cluster classifies examinees into unlabeled clusters by cluster analysis. Two algorithms are available: K-means and Hierarchical Agglomerative Cluster Analysis (HACA) with a choice of linkage methods.

Usage

cd.cluster(Y, Q, method = c("HACA", "Kmeans"), Kmeans.centers = NULL,
           Kmeans.itermax = 10, Kmeans.nstart = 1,
           HACA.link = c("complete", "ward", "single", "average",
                         "mcquitty", "median", "centroid"),
           HACA.cut = NULL)

Arguments

Y

A required N \times J response matrix with binary elements (1=correct, 0=incorrect), where N is the number of examinees and J is the number of items.

Q

A required J \times K binary item-by-attribute association matrix (Q-matrix), where K is the number of attributes. The j^{th} row of the matrix is an indicator vector: 1 indicates that the attribute is required to master item j, and 0 indicates that it is not.

method

The clustering algorithm used to classify the data: either "HACA" (the default) or "Kmeans".

Kmeans.centers

The number of clusters when method = "Kmeans". It must be at least 2 and at most 2^K, where K is the number of attributes. The default (NULL) uses 2^K clusters.

Kmeans.itermax

The maximum number of iterations allowed when method = "Kmeans". The default is 10.

Kmeans.nstart

The number of random sets of initial cluster centers to try when method = "Kmeans". The default is 1.

HACA.link

The linkage method used when method = "HACA". It must be one of "complete", "ward", "single", "average", "mcquitty", "median" or "centroid". The default is "complete".

HACA.cut

The number of clusters when method = "HACA". It must be at least 2 and at most 2^K, where K is the number of attributes. The default (NULL) uses 2^K clusters.
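The sketch below illustrates what the algorithm-related arguments control, using base R on a hypothetical W-like matrix. It assumes, without confirmation from the package source, that Kmeans.centers, Kmeans.itermax and Kmeans.nstart are passed to the centers, iter.max and nstart arguments of stats::kmeans, and that HACA applies stats::hclust with HACA.link as the linkage method to Euclidean distances and then cuts the tree with stats::cutree(k = HACA.cut).

```r
set.seed(1)
# Hypothetical W-like matrix: three well-separated groups of 5 examinees, K = 2
W <- rbind(matrix(rnorm(10, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(10, mean = 4, sd = 0.1), ncol = 2),
           matrix(rnorm(10, mean = 8, sd = 0.1), ncol = 2))
M <- 3  # plays the role of Kmeans.centers / HACA.cut

# method = "Kmeans" (assumed mapping): Kmeans.centers -> centers,
# Kmeans.itermax -> iter.max, Kmeans.nstart -> nstart
km <- stats::kmeans(W, centers = M, iter.max = 10, nstart = 25)

# method = "HACA" (assumed mapping): hclust() on Euclidean distances with
# HACA.link as the linkage method, then cutree() with k = HACA.cut
links <- c("complete", "single", "average")
haca <- sapply(links, function(link) {
  tree <- stats::hclust(stats::dist(W), method = link)
  stats::cutree(tree, k = M)
})

km$size                # K-means cluster sizes
apply(haca, 2, table)  # HACA cluster sizes for each linkage
```

Because the three groups are well separated, every linkage should recover the same partition here; on real W matrices the choice of link can change the clusters noticeably. Note also that current versions of hclust() accept "ward" but rename it to "ward.D" with a message.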

Details

Following the Asymptotic Classification Theory (Chiu, Douglas & Li, 2009), a sample statistic \bm{W} (see ACTCD) is computed from the user-supplied response matrix and Q-matrix and is then used as the input to the cluster analysis (K-means or HACA).
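In Chiu, Douglas & Li (2009), the statistic for examinee i is the vector W_i = (W_{i1}, ..., W_{iK}) with W_{ik} = \sum_j Y_{ij} q_{jk}, that is, the number of correctly answered items that require attribute k; in matrix form W = YQ. The following is a minimal base-R sketch of the whole pipeline on hypothetical toy data, assuming cd.cluster uses this definition and clusters the rows of W with stats::kmeans, or with stats::hclust on Euclidean distances followed by stats::cutree.

```r
# Hypothetical toy data: 6 examinees, 4 items, K = 2 attributes
Y <- matrix(c(1, 1, 0, 0,
              1, 1, 0, 0,
              0, 0, 1, 1,
              0, 0, 1, 1,
              1, 1, 1, 1,
              1, 1, 1, 1),
            nrow = 6, byrow = TRUE)
# Items 1-2 require attribute 1; items 3-4 require attribute 2
Q <- matrix(c(1, 0,
              1, 0,
              0, 1,
              0, 1),
            nrow = 4, byrow = TRUE)

# W[i, k] = number of items requiring attribute k that examinee i answered correctly
W <- Y %*% Q
# rows of W: (2, 0), (2, 0), (0, 2), (0, 2), (2, 2), (2, 2)

M <- 3  # number of latent clusters

# HACA sketch: Euclidean distances, complete linkage, tree cut into M clusters
hc <- stats::hclust(stats::dist(W), method = "complete")
class_haca <- stats::cutree(hc, k = M)

# K-means sketch on the same W
set.seed(123)
km <- stats::kmeans(W, centers = M, nstart = 5)
class_km <- km$cluster

# Cross-tabulate the two partitions (cluster labels are arbitrary)
table(class_haca, class_km)
```

Since cluster labels are arbitrary, the two methods agree when each row of the cross-tabulation has a single nonzero cell, which is the case for this toy W.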

Users can specify the number of latent clusters in Kmeans.centers or HACA.cut; it must be at least 2 and at most 2^K, where K is the number of attributes. Note that if the number of latent clusters is smaller than the default value (2^K), the clusters cannot be labeled by labeling with method = "1" or method = "3". See labeling for more information.
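The constraint on the number of clusters can be expressed as a small check. The helper resolve_n_clusters below is hypothetical and not part of ACTCD; it only restates the documented rules: NULL means 2^K, values outside [2, 2^K] are rejected, and fewer than 2^K clusters triggers a reminder about labeling.

```r
# Hypothetical helper, not part of ACTCD: resolves and checks the number of
# latent clusters as described for Kmeans.centers and HACA.cut
resolve_n_clusters <- function(n_clusters, K) {
  max_clusters <- 2^K  # number of distinct binary attribute profiles
  if (is.null(n_clusters)) n_clusters <- max_clusters  # documented default
  if (n_clusters < 2 || n_clusters > max_clusters) {
    stop("the number of clusters must be between 2 and 2^K = ", max_clusters)
  }
  if (n_clusters < max_clusters) {
    message("fewer than 2^K clusters: labeling() with method \"1\" or \"3\" cannot label them")
  }
  n_clusters
}

resolve_n_clusters(NULL, K = 3)  # 8 (the default 2^K)
resolve_n_clusters(5, K = 3)     # 5, plus a message about labeling
```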

Value

W

The N \times K matrix of the sample statistic \bm{W} used as input to the clustering algorithm. See Details for more information.

size

A vector of integers giving the size of each latent cluster.

mean.w

A matrix of cluster centers, i.e., the mean of \bm{W} within each latent cluster.

wss.w

The vector of within-cluster sums of squares of \bm{W}.

sqmwss.w

The vector of square roots of the mean within-cluster sums of squares of \bm{W}.

mean.y

The vector of mean sum scores of the examinees in each cluster.

class

The vector of estimated cluster memberships of the examinees.
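The components above can be related to \bm{W}, the response matrix and the memberships as in the minimal base-R sketch below. It uses hypothetical toy data, assumes W = YQ as in Chiu, Douglas & Li (2009), and reads sqmwss.w as the square root of each cluster's within-cluster sum of squares divided by its size; the package's exact definitions may differ.

```r
# Hypothetical toy inputs: 4 examinees, 4 items, K = 2 attributes
Y <- matrix(c(1, 1, 0, 0,
              1, 0, 0, 0,
              0, 0, 1, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE)
Q <- matrix(c(1, 0,
              1, 0,
              0, 1,
              0, 1),
            nrow = 4, byrow = TRUE)
W <- Y %*% Q                 # assumed definition of W (see Details)
membership <- c(1, 1, 2, 2)  # hypothetical memberships (the "class" component)

size <- as.vector(table(membership))    # cluster sizes
mean.w <- rowsum(W, membership) / size  # cluster centers of W (one row per cluster)
wss.w <- sapply(seq_along(size), function(m) {
  Wm <- W[membership == m, , drop = FALSE]
  sum(sweep(Wm, 2, mean.w[m, ])^2)      # squared deviations from the cluster center
})
sqmwss.w <- sqrt(wss.w / size)          # assumed: sqrt of the per-examinee mean WSS
mean.y <- as.vector(tapply(rowSums(Y), membership, mean))  # mean sum score per cluster

mean.w    # rows (1.5, 0) and (0, 1.5)
wss.w     # 0.5 0.5
sqmwss.w  # 0.5 0.5
mean.y    # 1.5 1.5
```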

References

Chiu, C. Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: theory and applications. Psychometrika, 74(4), 633-665.

See Also

print.cd.cluster, labeling, npar.CDM, ACTCD

Examples

library(ACTCD)

# Classification based on the simulated data and Q-matrix
data(sim.dat)
data(sim.Q)

# Information about the dataset
N <- nrow(sim.dat)  # number of examinees
J <- nrow(sim.Q)    # number of items
K <- ncol(sim.Q)    # number of attributes

# The default number of latent clusters is 2^K
cluster.obj <- cd.cluster(sim.dat, sim.Q)
sizeofc <- cluster.obj$size  # cluster sizes
W <- cluster.obj$W           # W statistics

# User-specified number of latent clusters with HACA
M <- 5  # the number of clusters is fixed to 5
cluster.obj <- cd.cluster(sim.dat, sim.Q, method = "HACA", HACA.cut = M)
sizeofc <- cluster.obj$size  # cluster sizes
W <- cluster.obj$W           # W statistics

# User-specified number of latent clusters with K-means
M <- 5  # the number of clusters is fixed to 5
cluster.obj <- cd.cluster(sim.dat, sim.Q, method = "Kmeans", Kmeans.centers = M)
sizeofc <- cluster.obj$size  # cluster sizes
W <- cluster.obj$W           # W statistics

ACTCD documentation built on Nov. 10, 2023, 1:12 a.m.