kparts: K-Partitions Clustering


View source: R/kparts.R

Description

Unsupervised vector partitioning.

Usage

kparts(x, y, parts, maxiter = 50, trials = 3, 
       nblind = FALSE, trialprint = TRUE, iterprint = FALSE)

Arguments

x

The numeric vector to be partitioned.

y

The numeric response variable vector used to partition vector x.

parts

The desired number of partitions.

maxiter

The maximum number of iterations allowed for each trial. If convergence does not occur, the trial will stop after the specified number of iterations is reached. The default is 50 iterations.

trials

The number of times the algorithm is run with new, randomly assigned partitions. The default number of trials is 3.

nblind

If TRUE, the algorithm will ignore the sum of squares within each unique value of x. The default is FALSE.

trialprint

If TRUE, the trial number and the sum of squares will print while the algorithm is running. The default is TRUE.

iterprint

If TRUE, the iteration number and sum of squares will print while the algorithm is running. The default is FALSE.

Details

kparts finds the best contiguous partitions for x by minimizing the sum of squares of y.

The sum of squares for a unique value of x cannot be partitioned, which has the effect of weighting unique values of x by the number of observations at those values. Using nblind = TRUE causes kparts to ignore the number of observations and treat all x values as equally weighted.
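The quantity being minimized can be illustrated with a short sketch. This is not the package's implementation: partition_ss and its breaks argument are hypothetical names introduced here, and the handling of nblind (collapsing y to one mean per unique x before scoring) is an assumption based on the description above.

```r
# Hypothetical helper, not part of phalen: score one candidate set of cut points.
# `breaks` holds the upper bound of every partition except the last.
partition_ss <- function(x, y, breaks, nblind = FALSE) {
  if (nblind) {
    # Assumed behaviour: collapse to one mean per unique x,
    # so every x value carries equal weight.
    m <- tapply(y, x, mean)
    x <- as.numeric(names(m))
    y <- as.vector(m)
  }
  # Assign each observation to a contiguous partition of x.
  part <- findInterval(x, breaks, left.open = TRUE) + 1
  # Sum of squared deviations of y from its partition mean.
  sum(tapply(y, part, function(v) sum((v - mean(v))^2)))
}

# Toy data: three observations at x = 1, one at x = 2, two at x = 3.
x <- c(1, 1, 1, 2, 3, 3)
y <- c(0, 0, 1, 1, 1, 1)
partition_ss(x, y, breaks = 2)                 # weighted by observation count
partition_ss(x, y, breaks = 2, nblind = TRUE)  # one point per unique x
```

With nblind = FALSE, the spread of y among the three observations at x = 1 contributes to the score wherever the cut points fall, so heavily populated x values dominate. Under the nblind = TRUE assumption, that within-value spread drops out.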

kparts can take a long time to process datasets with large numbers of unique x values. To gain efficiency, pre-processing vector x by binning is recommended.
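One possible way to bin x before calling kparts is sketched below. The data are simulated for illustration only, and the 5-unit bin width is an arbitrary choice, not a recommendation from the package.

```r
# Simulated data for illustration only.
set.seed(1)
x <- runif(1000, 0, 100)
y <- rbinom(1000, 1, x / 200)

# Round x down to the start of its 5-unit bin,
# which reduces the number of unique values.
x_binned <- floor(x / 5) * 5

length(unique(x))         # unique values before binning
length(unique(x_binned))  # at most 20 after binning

# The binned vector would then be passed in place of x (requires phalen):
# kp <- kparts(x = x_binned, y = y, parts = 4)
```

Wider bins reduce the number of unique x values further, at the cost of coarser partition boundaries.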

Value

partitions

A data frame naming the index of the partition and the range of x over which the partition extends.

data

A data frame containing the partition index (parts), the unique values of x, the average of y and the range of the partition.
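A sketch of how the partitions data frame might be used to look up the partition for new x values follows. The stand-in data frame is copied from the example output on this page. The parsing assumes that range is always formatted as "lower-upper" with non-negative bounds, which is an inference from that output rather than documented behavior.

```r
# Stand-in for kp$partitions, copied from the example output on this page.
partitions <- data.frame(
  parts = 1:5,
  range = c("0-16", "17-40", "41-60", "61-64", "65-75"),
  stringsAsFactors = FALSE
)

# Split "lower-upper" into bounds (assumes non-negative x values).
bounds <- do.call(rbind, strsplit(as.character(partitions$range), "-", fixed = TRUE))
lower <- as.numeric(bounds[, 1])

# Look up the partition index for new ages.
new_age <- c(5, 17, 62, 75)
new_parts <- partitions$parts[findInterval(new_age, lower)]
new_parts
```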

Note

In later versions, kparts will be updated to allow for a matrix of data as y input.

Author(s)

Robert P. Bronaugh

Examples

  library(phalen)

  # plot readmission rates against age
  data(ipadmits)
  # average readmission rate and average cost at each age
  ipadmits.summary = data.frame("AvgReadmission" = tapply(ipadmits$isReadmission
                                                          ,ipadmits$Age
                                                          ,mean)
                                ,"AvgCost" = tapply(ipadmits$cost
                                                    ,ipadmits$Age
                                                    ,mean))
  plot(ipadmits.summary$AvgReadmission,xlab = "Age",ylab = "AvgReadmission")

  # find the best partitions of age against readmission rate
  # run kparts with 5 partitions and 4 trials
  kp = kparts(x = ipadmits$Age,y = ipadmits$isReadmission,parts = 5,trials = 4)
  # list the value range for each partition
  kp$partitions
  plot(kp)

  # run with 7 partitions and 5 trials, ignoring the number of
  # samples per age when computing the error
  kp = kparts(x = ipadmits$Age,y = ipadmits$isReadmission,parts = 7,trials = 5,nblind = TRUE)
  kp$partitions
  plot(kp)

Example output

[1] "initial 0 31.8622643033014"
[1] "trial 1 23.4846119038102"
[1] "trial 2 26.3612561258832"
[1] "trial 3 25.6846006000232"
[1] "trial 4 26.3612561258832"
  parts range
1     1  0-16
2     2 17-40
3     3 41-60
4     4 61-64
5     5 65-75
[1] "initial 0 0.244634537844779"
[1] "trial 1 0.107224438599815"
[1] "trial 2 0.101127860027367"
[1] "trial 3 0.108074647778636"
[1] "trial 4 0.107224438599815"
[1] "trial 5 0.106360552694685"
  parts range
1     1  0-16
2     2 17-40
3     3 41-60
4     4 61-64
5     5 65-67
6     6 68-68
7     7 69-75

phalen documentation built on May 29, 2017, 4:22 p.m.