View source: R/computeKmeans.R
K-means clustering algorithm runs in-database, returns an object compatible with kmeans, and includes arbitrary aggregate metrics computed on the resulting clusters.
computeKmeans(channel, tableName, centers, threshold = 0.0395, iterMax = 10,
tableInfo, id, include = NULL, except = NULL,
aggregates = "COUNT(*) cnt", scale = TRUE, persist = FALSE,
idAlias = gsub("[^0-9a-zA-Z]+", "_", id), where = NULL,
scaledTableName = NULL, centroidTableName = NULL,
clusteredTableName = NULL, tempTableName = NULL, schema = NULL,
test = FALSE, version = "6.21")
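
The default for idAlias in the usage above is derived from id with gsub. As a minimal sketch of what that default produces, assume an illustrative id that is a SQL expression concatenating several columns (the expression below is only an example input, not a required form):

```r
# Illustrative id: a SQL expression that concatenates four columns.
id <- "playerid || '-' || stint || '-' || teamid || '-' || yearid"

# Same pattern as the idAlias default: every run of characters that are
# not ASCII letters or digits collapses into a single underscore.
id_alias <- gsub("[^0-9a-zA-Z]+", "_", id)

print(id_alias)
# [1] "playerid_stint_teamid_yearid"
```

A plain column name such as "playerid" contains no such characters, so it passes through this pattern unchanged.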

channel 
connection object as returned by 
tableName 
Aster table name. This argument is ignored if 
centers 
either the number of clusters, say 
threshold 
the convergence threshold. When the centroids move by less than this amount, the algorithm has converged. 
iterMax 
the maximum number of iterations the algorithm will run before quitting if the convergence threshold has not been met. 
tableInfo 
pre-built summary of data to use (required when 
id 
column name or SQL expression containing unique table key. This argument is ignored if 
include 
a vector of column names with variables (must be numeric). Model never contains variables other than in the list.
This argument is ignored if 
except 
a vector of column names to exclude from variables. Model never contains variables from the list.
This argument is ignored if 
aggregates 
vector of SQL aggregates that define arbitrary aggregate metrics to be computed on each cluster
after running kmeans. Aggregates may have optional aliases like in 
scale 
logical: if TRUE then scale each variable in-database before clustering. Scaling results in 0 mean and unit
standard deviation for each input variable. when 
persist 
logical if TRUE then function saves clustered data in the table 
idAlias 
SQL alias for table id. This is required when SQL expression is given for 
where 
specifies criteria that the table rows must satisfy before the
computation is applied. The criteria are expressed in the form of SQL predicates (inside

scaledTableName 
the name of the Aster table with results of scaling. This argument is ignored if 
centroidTableName 
the name of the Aster table with centroids found by kmeans. 
clusteredTableName 
the name of the Aster table in which to store the clustered output. If omitted
and argument 
tempTableName 
name of the temporary Aster table to use to store intermediate results. This table always gets dropped when function executes successfully. 
schema 
name of Aster schema that tables 
test 
logical: if TRUE show what would be done, only (similar to parameter 
version 
version of Aster Analytics Foundation functions applicable when 
The function first scales non-null data (if scale=TRUE) or, without scaling, just removes data with NULLs. The resulting data (table tableName, optionally filtered with where) are then clustered by kmeans in Aster. Finally, all standard metrics of the kmeans clusters, plus the additional aggregates provided with aggregates, are calculated, again in-database.
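
The three steps above can be sketched locally with base R. This is an in-memory analogue under stated assumptions, not the Aster implementation: the data frame df and its values are made up, stats::kmeans stands in for the in-database kmeans, and tapply stands in for SQL aggregates such as "COUNT(*) cnt" and "AVG(g) avg_g".

```r
# Hypothetical data with some missing values (stand-in for an Aster table).
set.seed(42)
df <- data.frame(
  g = c(10, 120, 95, NA, 30, 150, 60, 15),
  r = c(1, 60, 40, 20, NA, 90, 25, 3),
  h = c(5, 130, 100, 50, 20, 170, 70, 8)
)

# Step 1: drop rows containing missing values, then scale each variable
# to zero mean and unit standard deviation (the scale = TRUE case).
complete <- df[complete.cases(df), ]
scaled <- scale(complete)

# Step 2: cluster the scaled data (local stand-in for kmeans in Aster).
fit <- kmeans(scaled, centers = 2, iter.max = 10)

# Step 3: per-cluster aggregates on the unscaled data, analogous to
# aggregates = c("COUNT(*) cnt", "AVG(g) avg_g").
aggs <- data.frame(
  cnt   = as.vector(table(fit$cluster)),
  avg_g = as.vector(tapply(complete$g, fit$cluster, mean))
)
print(aggs)
```

The aggregates are computed on the original values rather than the scaled ones here so that they stay interpretable; whether computeKmeans does the same is not stated in this page.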
computeKmeans returns an object of class "toakmeans" (compatible with class "kmeans").
It is a list with at least the following components:
cluster
A vector of integers (from 0:K-1) indicating the cluster to which each point is allocated. computeKmeans leaves this component empty. Use function computeClusterSample to set this component.
centers
A matrix of cluster centres.
totss
The total sum of squares.
withinss
Vector of within-cluster sum of squares, one component per cluster.
tot.withinss
Total within-cluster sum of squares, i.e. sum(withinss).
betweenss
The between-cluster sum of squares, i.e. totss - tot.withinss.
size
The number of points in each cluster. This includes all points in the specified Aster table that satisfy the optional where condition.
iter
The number of (outer) iterations.
ifault
integer: indicator of a possible algorithm problem (always 0).
scale
logical: indicates if variable scaling was performed before clustering.
persist
logical: indicates if clustered data was saved in the table.
aggregates
Vectors (data frame) of aggregates computed on each cluster.
tableName
Aster table name containing data for clustering.
columns
Vector of column names with variables used for clustering.
scaledTableName
Aster table containing scaled data for clustering.
centroidTableName
Aster table containing cluster centroids.
clusteredTableName
Aster table containing clustered output.
id
Column name or SQL expression containing unique table key.
idAlias
SQL alias for table id.
whereClause
SQL WHERE clause expression used (if any).
time
An object of class proc_time with user, system, and total elapsed times for the computeKmeans function call.
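
The sum-of-squares components listed above carry the same names as in the base R "kmeans" class. A minimal sketch of how they relate, using stats::kmeans on made-up data rather than a "toakmeans" object (which would need an Aster connection):

```r
# Made-up, scaled two-column data; stats::kmeans is a local stand-in.
set.seed(1)
x <- scale(matrix(rnorm(200), ncol = 2))
fit <- kmeans(x, centers = 3)

# tot.withinss is the sum of the per-cluster withinss values.
isTRUE(all.equal(fit$tot.withinss, sum(fit$withinss)))

# betweenss is what remains of totss after removing tot.withinss.
isTRUE(all.equal(fit$betweenss, fit$totss - fit$tot.withinss))

# stats::kmeans labels clusters 1..K, while the cluster component
# described above uses 0:K-1; subtracting 1 gives zero-based labels
# when comparing the two.
zero_based <- fit$cluster - 1L
range(zero_based)
```

Both comparisons use all.equal rather than == because the quantities are floating-point sums.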
computeClusterSample, computeSilhouette, computeCanopy
if(interactive()){
  library(toaster)  # also loads RODBC, which provides odbcDriverConnect

  # initialize connection to Lahman baseball database in Aster
  conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
    server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

  # cluster batting records after 2000 on games, runs and hits;
  # the id is a SQL expression that concatenates the key columns
  km = computeKmeans(conn, "batting", centers=5, iterMax = 25,
    aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
    id="playerid || '-' || stint || '-' || teamid || '-' || yearid",
    include=c('g','r','h'), scaledTableName='kmeans_test_scaled',
    centroidTableName='kmeans_test_centroids',
    where="yearid > 2000")
  km
  createCentroidPlot(km)
  createClusterPlot(km)

  # persist clustered data
  kmc = computeKmeans(conn, "batting", centers=5, iterMax = 250,
    aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
    id="playerid || '-' || stint || '-' || teamid || '-' || yearid",
    include=c('g','r','h'),
    persist = TRUE,
    scaledTableName='kmeans_test_scaled',
    centroidTableName='kmeans_test_centroids',
    clusteredTableName = 'kmeans_test_clustered',
    tempTableName = 'kmeans_test_temp',
    where="yearid > 2000")
  createCentroidPlot(kmc)
  createCentroidPlot(kmc, format="bar_dodge")
  createCentroidPlot(kmc, format="heatmap", coordFlip=TRUE)
  createClusterPlot(kmc)

  # sample 1% of the clustered data to fill the cluster component
  kmc = computeClusterSample(conn, kmc, 0.01)
  createClusterPairsPlot(kmc, title="Batters Clustered by G, H, R", ticks=FALSE)

  # compute silhouette values and plot their per-cluster profiles
  kmc = computeSilhouette(conn, kmc)
  createSilhouetteProfile(kmc, title="Cluster Silhouette Histograms (Profiles)")
}
