computeClusterSample: Random sample of clustered data

Description

Random sample of clustered data

Usage

computeClusterSample(channel, km, sampleFraction, sampleSize, scaled = FALSE,
  includeId = TRUE, test = FALSE)

Arguments

channel

connection object as returned by odbcConnect.

km

an object of class "toakmeans" obtained with computeKmeans.

sampleFraction

vector with one or more sample fractions to use when sampling the data. With multiple fractions, each one applies to one cluster of the kmeans object km, and the vector length must equal the number of clusters.

sampleSize

vector with one or more sample sizes (used only when sampleFraction is missing). With multiple sizes, each one applies to one cluster of the kmeans object km, and the vector length must equal the number of clusters.

scaled

logical: indicates whether the original (default) or the scaled data is returned.

includeId

logical: indicates whether the sample should include the key attribute identifying each data point.

test

logical: if TRUE, only show what would be done (similar to the parameter test in the RODBC functions sqlQuery and sqlSave).
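
The length rule for sampleFraction and sampleSize can be illustrated with a small local sketch. This is not how computeClusterSample works internally (it samples in the database through channel); it is an in-memory analogue under stated assumptions. The helper sample_by_cluster is hypothetical, the mapping of the i-th value to the i-th cluster is an assumption, and so is the choice to recycle a single value across all clusters.

```r
# Illustrative only: in-memory analogue of per-cluster sampling.
# sample_by_cluster is a hypothetical helper and NOT part of toaster.
sample_by_cluster <- function(data, cluster, sampleFraction, sampleSize) {
  # row indices of 'data' grouped by cluster label
  groups <- split(seq_len(nrow(data)), cluster)
  k <- length(groups)

  # assumption of this sketch: a single value is recycled to all clusters;
  # otherwise exactly one value per cluster is required
  expand <- function(x, name) {
    if (length(x) == 1) return(rep(x, k))
    if (length(x) != k)
      stop(name, " must have length 1 or ", k, " (the number of clusters)")
    x
  }

  if (!missing(sampleFraction)) {
    # fractions take precedence, mirroring "applies only when sampleFraction is missing"
    n <- round(expand(sampleFraction, "sampleFraction") * lengths(groups))
  } else {
    # never request more rows than a cluster holds
    n <- pmin(expand(sampleSize, "sampleSize"), lengths(groups))
  }

  # draw n[i] rows without replacement from the i-th cluster
  rows <- unlist(Map(function(idx, size) idx[sample.int(length(idx), size)],
                     groups, n), use.names = FALSE)
  data[sort(rows), , drop = FALSE]
}

set.seed(1)
df <- data.frame(x = rnorm(300), y = rnorm(300))
cl <- kmeans(df, centers = 3)$cluster

# one fraction per cluster (3 clusters, so 3 values)
by_fraction <- sample_by_cluster(df, cl, sampleFraction = c(0.1, 0.2, 0.3))
# one size per cluster, used because sampleFraction is not supplied
by_size <- sample_by_cluster(df, cl, sampleSize = c(10, 20, 30))
```

Passing a vector whose length is neither 1 nor the number of clusters, for example sampleFraction = c(0.1, 0.2) with three clusters, stops with an error in this sketch.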

Value

computeClusterSample returns an object of class "toakmeans" (compatible with class "kmeans").

See Also

computeKmeans

Examples

if(interactive()){
# initialize connection to Lahman baseball database in Aster
# (paste0 keeps the connection string free of embedded newlines)
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

# cluster batting records after 2000 into 5 groups by games (g), runs (r) and hits (h)
km = computeKmeans(conn, "batting", centers=5, iterMax = 25,
                   aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
                   id="playerid || '-' || stint || '-' || teamid || '-' || yearid",
                   include=c('g','r','h'), scaledTableName='kmeans_test_scaled',
                   centroidTableName='kmeans_test_centroids',
                   where="yearid > 2000")

# sample the clustered data with a single sample fraction of 1%
km = computeClusterSample(conn, km, 0.01)
km
createClusterPairsPlot(km, title="Batters Clustered by G, H, R", ticks=FALSE)

# per cluster sample fractions (one value per cluster, so 5 values for centers=5)
km = computeClusterSample(conn, km, c(0.01, 0.02, 0.03, 0.02, 0.01))
}

teradata-aster-field/toaster documentation built on May 31, 2019, 8:36 a.m.