kmod: K-Means clustering with simultaneous Outlier Detection


Description

K-Means clustering with simultaneous Outlier Detection

Usage

kmod(X, k = 5, l = 0, i_max = 100, conv_method = "delta_C",
  conv_error = 0, allow_empty_c = FALSE)

Arguments

X

matrix of numeric data or an object that can be coerced to such a matrix (such as a data frame with numeric columns only).

k

the number of clusters (default = 5)

l

the number of outliers (default = 0)

i_max

the maximum number of iterations permissible (default = 100)

conv_method

character: the method used to assess if kmod has converged (default = "delta_C")

conv_error

numeric: the tolerance permissible when assessing convergence (default = 0)

allow_empty_c

logical: set whether empty clusters are permissible (default = FALSE)
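As a small illustration of the coercion described for X, a data frame with numeric columns only can be turned into a numeric matrix with base R before it is passed to kmod. The data frame df below is made up for the example, and which coercion function kmod applies internally is not stated on this page.

```r
# Hypothetical data frame with numeric columns only
df <- data.frame(x = c(0.1, 0.9, 1.1), y = c(-0.2, 1.0, 0.8))

# as.matrix() keeps the column names and yields a numeric matrix
X <- as.matrix(df)

is.matrix(X)    # TRUE
is.numeric(X)   # TRUE
colnames(X)     # "x" "y"
```

A data frame containing a character or factor column would not give a numeric matrix this way, so such columns would need to be dropped or encoded first.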

Value

kmod returns a list comprising the following components:

k the number of clusters specified

l the number of outliers specified

C the set of cluster centroids

C_sizes cluster sizes

C_ss the sum of squares for each cluster

L the set of outliers

L_dist_sqr the squared distance from each outlier to C

L_index the index of each outlier in the supplied dataset

XC_dist_sqr_assign the squared distance and cluster assignment of each point in the supplied dataset

within_ss the within cluster sum of squares (excludes outliers)

between_ss the between cluster sum of squares

tot_ss the total sum of squares

iterations the number of iterations taken to converge
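The components above are easier to read with the underlying procedure in mind. The printed output names the algorithm as k-means--, in which each iteration assigns points to their nearest centroid, sets aside the l points farthest from their assigned centroid as outliers, and recomputes the centroids from the remaining points. The following is a minimal base-R sketch of that idea, not kmodR's implementation: the function kmeans_mm_sketch, its tol argument, the random initialisation, and the centroid-change stopping rule (one plausible reading of "delta_C") are all assumptions made for illustration.

```r
# Illustrative k-means-- style loop (hypothetical helper, not kmodR's code)
kmeans_mm_sketch <- function(X, k, l = 0, i_max = 100, tol = 0) {
  X <- as.matrix(X)
  n <- nrow(X)
  C <- X[sample.int(n, k), , drop = FALSE]   # random initial centroids
  for (iter in seq_len(i_max)) {
    # squared Euclidean distance from every point to every centroid (n x k)
    d2 <- vapply(seq_len(k),
                 function(j) rowSums(sweep(X, 2, C[j, ])^2),
                 numeric(n))
    cluster  <- max.col(-d2, ties.method = "first")  # nearest centroid
    dist_sqr <- d2[cbind(seq_len(n), cluster)]       # distance to it
    # the l points farthest from their nearest centroid are set aside
    L_index <- order(dist_sqr, decreasing = TRUE)[seq_len(l)]
    keep    <- setdiff(seq_len(n), L_index)
    # recompute centroids from the non-outlier points only
    C_new <- C
    for (j in seq_len(k)) {
      members <- keep[cluster[keep] == j]
      if (length(members) > 0) {
        C_new[j, ] <- colMeans(X[members, , drop = FALSE])
      }
    }
    converged <- max(abs(C_new - C)) <= tol  # assumed stopping rule
    C <- C_new
    if (converged) break
  }
  list(C = C, L_index = L_index, cluster = cluster,
       dist_sqr = dist_sqr, iterations = iter)
}

set.seed(42)
X <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
fit <- kmeans_mm_sketch(X, k = 2, l = 5)
fit$C         # centroids computed without the outliers
fit$L_index   # row indices of the 5 points set aside
```

In this sketch an empty cluster simply keeps its previous centroid; how kmod itself treats that case is governed by allow_empty_c and is not reproduced here.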

Examples

# a 2-dimensional example with 2 clusters and 5 outliers
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmod(x, k = 2, l = 5))

# cluster the same dataset with 8 clusters and 0 outliers;
# the result is stored in cl8 so that the data in x is not overwritten
cl8 <- kmod(x, k = 8)

Example output

Beginning k-means-- clustering
....
Clustering complete in  4  iterations.

k = 2   l = 5   Cluster sizes: 46 49 
(between cluster sum sqr / total sum sqr     = 82.78591 %)

Centroids: 
               x           y
[1,] -0.05609248 -0.09723774
[2,]  0.98343371  1.00405008

Outliers: 
              x          y
[1,]  0.8752664 -0.6946801
[2,]  1.8993725  1.0270935
[3,] -0.9282763 -0.2343806
[4,] -0.5193634 -0.7329309
[5,] -0.5618900 -0.4584473

Available components:
 [1] "k"                  "l"                  "C"                 
 [4] "C_sizes"            "C_ss"               "L"                 
 [7] "L_dist_sqr"         "L_index"            "XC_dist_sqr_assign"
[10] "within_ss"          "between_ss"         "tot_ss"            
[13] "iterations"        

$k
[1] 2

$l
[1] 5

$C
               x           y
[1,] -0.05609248 -0.09723774
[2,]  0.98343371  1.00405008

$C_sizes
[1] 46 49

$C_ss
[1] 5.953355 7.559446

$L
              x          y
[1,]  0.8752664 -0.6946801
[2,]  1.8993725  1.0270935
[3,] -0.9282763 -0.2343806
[4,] -0.5193634 -0.7329309
[5,] -0.5618900 -0.4584473

$L_dist_sqr
[1] 1.2243667 0.8394748 0.7795128 0.6187257 0.3863034

$L_index
[1] 22 60 48  4  2

$XC_dist_sqr_assign
           dist_sqr c
  [1,] 0.0174619713 1
  [2,] 0.3863034450 1
  [3,] 0.0510350771 1
  [4,] 0.6187256943 1
  [5,] 0.2123496343 1
  [6,] 0.0631150794 1
  [7,] 0.2711658005 1
  [8,] 0.1398713483 1
  [9,] 0.0063746995 1
 [10,] 0.1726652038 1
 [11,] 0.0627665781 1
 [12,] 0.0452525447 1
 [13,] 0.1115477920 1
 [14,] 0.0524501351 1
 [15,] 0.2423258178 1
 [16,] 0.2872651875 1
 [17,] 0.0314149554 1
 [18,] 0.0626799119 1
 [19,] 0.1992839631 1
 [20,] 0.2494919986 1
 [21,] 0.1024351733 1
 [22,] 1.2243667336 1
 [23,] 0.0009704824 1
 [24,] 0.1596808612 1
 [25,] 0.1213791804 1
 [26,] 0.3379599264 1
 [27,] 0.0220388909 1
 [28,] 0.1981850072 1
 [29,] 0.1406024493 1
 [30,] 0.0724570132 1
 [31,] 0.0430012272 1
 [32,] 0.3167515983 1
 [33,] 0.2625754978 1
 [34,] 0.1860232913 1
 [35,] 0.0039285297 1
 [36,] 0.0043924112 1
 [37,] 0.0429519414 1
 [38,] 0.0307407198 1
 [39,] 0.1584085947 1
 [40,] 0.1160153811 1
 [41,] 0.3444640387 1
 [42,] 0.0077451682 1
 [43,] 0.3241080281 1
 [44,] 0.1024131031 1
 [45,] 0.1947419446 1
 [46,] 0.0679943149 1
 [47,] 0.0432766290 1
 [48,] 0.7795127734 1
 [49,] 0.0756450708 1
 [50,] 0.1939510651 1
 [51,] 0.0568885441 2
 [52,] 0.0853768682 2
 [53,] 0.2876662505 2
 [54,] 0.1321487902 2
 [55,] 0.2179490768 2
 [56,] 0.2624273074 2
 [57,] 0.2641427537 2
 [58,] 0.0191193581 2
 [59,] 0.0478363451 2
 [60,] 0.8394748015 2
 [61,] 0.1174663943 2
 [62,] 0.3424735466 2
 [63,] 0.1835386630 2
 [64,] 0.1183812401 2
 [65,] 0.1179587203 2
 [66,] 0.1217068426 2
 [67,] 0.1229475470 2
 [68,] 0.0531727084 2
 [69,] 0.1662093667 2
 [70,] 0.1147906138 2
 [71,] 0.0896204252 2
 [72,] 0.0002095325 2
 [73,] 0.0288752792 2
 [74,] 0.0979613111 2
 [75,] 0.0568388391 2
 [76,] 0.2194717814 2
 [77,] 0.1340127911 2
 [78,] 0.3121382119 2
 [79,] 0.0417715276 2
 [80,] 0.0748737917 2
 [81,] 0.2954850683 2
 [82,] 0.0088967881 2
 [83,] 0.2183087136 2
 [84,] 0.1544490163 2
 [85,] 0.3610014627 2
 [86,] 0.1205724726 2
 [87,] 0.0276009657 2
 [88,] 0.0400814827 2
 [89,] 0.0065238653 2
 [90,] 0.2831030049 2
 [91,] 0.3389441769 2
 [92,] 0.2229551160 2
 [93,] 0.3445385967 2
 [94,] 0.0450438423 2
 [95,] 0.2016789157 2
 [96,] 0.3137012567 2
 [97,] 0.0118786208 2
 [98,] 0.0233546046 2
 [99,] 0.3049155623 2
[100,] 0.3484377049 2

$within_ss
[1] 13.5128

$between_ss
[1] 64.98568

$tot_ss
[1] 78.49848

$iterations
[1] 4

Beginning k-means-- clustering
................
Clustering complete in  16  iterations.

k = 8   l = 0   Cluster sizes: 12 12 7 26 11 5 11 16 
(between cluster sum sqr / total sum sqr     = 92.77911 %)

Centroids: 
               x           y
[1,]  1.34905862  1.15025557
[2,]  0.82763230  1.36734000
[3,]  0.02997621  0.34782039
[4,]  0.92182051  0.76978464
[5,]  0.24180201 -0.34624192
[6,]  0.13718732  0.10372941
[7,] -0.40465030 -0.44696438
[8,] -0.17618476 -0.05134345

Outliers: 
     x y

Available components:
 [1] "k"                  "l"                  "C"                 
 [4] "C_sizes"            "C_ss"               "L"                 
 [7] "L_dist_sqr"         "L_index"            "XC_dist_sqr_assign"
[10] "within_ss"          "between_ss"         "tot_ss"            
[13] "iterations"        
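The figures in the first example output above can be cross-checked against one another, which helps clarify how the sum-of-squares components relate. The snippet below only re-uses numbers copied from that output; the relationships it tests are read off those numbers rather than taken from the package source.

```r
# Values copied from the first example output (k = 2, l = 5)
C_ss       <- c(5.953355, 7.559446)
within_ss  <- 13.5128
between_ss <- 64.98568
tot_ss     <- 78.49848
C_sizes    <- c(46, 49)
l          <- 5

# within_ss matches the sum of the per-cluster sums of squares
all.equal(sum(C_ss), within_ss, tolerance = 1e-6)

# within_ss and between_ss add up to tot_ss
all.equal(within_ss + between_ss, tot_ss, tolerance = 1e-6)

# ratio reported in the printed summary, as a percentage
pct <- 100 * between_ss / tot_ss
pct

# cluster sizes plus outliers account for all 100 rows of x
sum(C_sizes) + l
```

The same output also shows that each value in L_dist_sqr equals the dist_sqr entry of XC_dist_sqr_assign at the corresponding L_index row (for example, row 22 holds 1.2243667), so the outliers are the five points with the largest squared distances.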

kmodR documentation built on May 2, 2019, 2:50 p.m.