kparts: K-Partitions Clustering
In phalen: Phalen Algorithms and Functions


View source: R/kparts.R

Unsupervised vector partitioning.

kparts(x, y, parts, maxiter = 50, trials = 3, nblind = FALSE, trialprint = TRUE, iterprint = FALSE)

`x`	The numeric vector to be partitioned.
`y`	The numeric response variable vector used to partition vector `x`.
`parts`	The desired number of partitions.
`maxiter`	The maximum number of iterations allowed for each `trial`. If convergence does not occur, the `trial` stops once the specified number of iterations is reached. The default is `50` iterations.
`trials`	The number of times the algorithm is run with new, randomly assigned partitions. The default number of `trials` is `3`.
`nblind`	If `TRUE`, the algorithm will ignore the sum of squares within each unique value of `x`. The default is `FALSE`.
`trialprint`	If `TRUE`, the `trial` number and the sum of squares will print while the algorithm is running. The default is `TRUE`.
`iterprint`	If `TRUE`, the iteration number and sum of squares will print while the algorithm is running. The default is `FALSE`.

kparts searches for the contiguous partitions of x that minimize the sum of squares of y within the partitions.
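
As an illustration of the objective described above, the following minimal sketch computes the total within-partition sum of squares of y for a given set of contiguous partitions of x. The helper `partition_ss` and its `breaks` argument are hypothetical names introduced here for illustration; they are not part of phalen, and this is not how kparts itself is implemented.

```r
# Hypothetical helper (not part of phalen): total within-partition sum of
# squares of y, for contiguous partitions of x whose inclusive upper bounds
# are given in `breaks`.
partition_ss <- function(x, y, breaks) {
  # left.open = TRUE makes each break the inclusive upper bound of a partition
  part <- findInterval(x, breaks, left.open = TRUE) + 1
  # sum of squared deviations from the partition mean, added over partitions
  sum(tapply(y, part, function(v) sum((v - mean(v))^2)))
}

x <- c(1, 1, 2, 3, 4, 4, 5, 6)
y <- c(0, 1, 0, 0, 1, 1, 1, 1)

ss_split_at_3 <- partition_ss(x, y, breaks = 3)  # partitions: x <= 3, x > 3
ss_split_at_2 <- partition_ss(x, y, breaks = 2)  # partitions: x <= 2, x > 2
```

On this toy data the split at 3 gives the smaller sum of squares, so it would be the preferred two-partition solution under this objective.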

The sum of squares for a unique value of x cannot be partitioned, which has the effect of weighting unique values of x by the number of observations at those values. Using nblind = TRUE causes kparts to ignore the number of observations and treat all x values as equally weighted.
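
The effect of weighting by observation count can be seen on a small toy example. The sketch below is one interpretation of the paragraph above, not the package's code: it compares a sum of squares in which each unique x counts once per observation with one in which each unique x counts exactly once. All variable names are illustrative.

```r
# Toy data: x = 1 has four observations, x = 2 and x = 3 have one each
x <- c(1, 1, 1, 1, 2, 3)
y <- c(0, 0, 0, 0, 1, 1)

ybar <- tapply(y, x, mean)    # mean of y at each unique x
n    <- tapply(y, x, length)  # number of observations at each unique x

# Sum of squares of the per-x means around the mean of a single partition
# covering all of x
ss_weighted   <- sum(n * (ybar - weighted.mean(ybar, n))^2)  # counts matter
ss_unweighted <- sum((ybar - mean(ybar))^2)                  # counts ignored
```

The heavily sampled value x = 1 pulls the weighted mean toward 0, so the two quantities differ; with equal counts at every x they would be proportional.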

kparts can take a long time to process datasets with large numbers of unique x values. To gain efficiency, pre-processing vector x by binning is recommended.
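
One possible way to bin x before partitioning is to round each value down to a fixed bin width, as sketched below on simulated data. The bin width of 5 and the variable names are assumptions chosen for illustration; a suitable width depends on the data.

```r
set.seed(1)
x <- runif(1000, min = 0, max = 75)  # simulated continuous predictor

# Round each value down to the lower edge of its 5-unit bin
bin_width <- 5
x_binned  <- floor(x / bin_width) * bin_width

n_unique_raw    <- length(unique(x))         # close to one per observation
n_unique_binned <- length(unique(x_binned))  # at most 15 distinct values
```

The binned vector could then be passed to kparts in place of the original x, at the cost of restricting partition boundaries to bin edges.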

`partitions`	A data frame giving the index of each partition and the range of `x` over which the partition extends.
`data`	A data frame containing the partition index (parts), the unique values of `x`, the average of `y` and the range of the partition.

In later versions, kparts will be updated to allow for a matrix of data as y input.

Robert P. Bronaugh

  library(phalen)

  # plot readmission rates against age.
  data(ipadmits)
  ipadmits.summary <- data.frame("AvgReadmission" = tapply(ipadmits$isReadmission
                                                           ,ipadmits$Age
                                                           ,mean)
                                 ,"AvgCost" = tapply(ipadmits$cost
                                                     ,ipadmits$Age
                                                     ,mean))
  plot(ipadmits.summary$AvgReadmission,xlab = "Age",ylab = "AvgReadmission")


  # find the best partitions of age against readmission rate.
  # run kparts with 4 trials with 5 partitions
  kp <- kparts(x = ipadmits$Age,y = ipadmits$isReadmission,parts = 5,trials = 4)
  # list value range for each partition
  kp$partitions
  plot(kp)
  # run with 7 partitions and ignore number of samples per age
  # when computing error
  kp <- kparts(ipadmits$Age,ipadmits$isReadmission,parts = 7,trials = 5,nblind = TRUE)
  kp$partitions
  plot(kp)

[1] "initial 0 31.8622643033014"
[1] "trial 1 23.4846119038102"
[1] "trial 2 26.3612561258832"
[1] "trial 3 25.6846006000232"
[1] "trial 4 26.3612561258832"
  parts range
1     1  0-16
2     2 17-40
3     3 41-60
4     4 61-64
5     5 65-75
[1] "initial 0 0.244634537844779"
[1] "trial 1 0.107224438599815"
[1] "trial 2 0.101127860027367"
[1] "trial 3 0.108074647778636"
[1] "trial 4 0.107224438599815"
[1] "trial 5 0.106360552694685"
  parts range
1     1  0-16
2     2 17-40
3     3 41-60
4     4 61-64
5     5 65-67
6     6 68-68
7     7 69-75