computeClusterSample: Random sample of clustered data
In teradata-aster-field/toaster: Big Data in-Database Analytics that Scales with Teradata Aster Distributed Platform


Random sample of clustered data

computeClusterSample(channel, km, sampleFraction, sampleSize, scaled = FALSE, includeId = TRUE, test = FALSE)

`channel`	connection object as returned by `odbcConnect`.
`km`	an object of class `"toakmeans"` obtained with `computeKmeans`.
`sampleFraction`	numeric vector with one or more sample fractions to use when sampling the data. A single fraction applies to every cluster; multiple fractions define the sampling for each cluster in the kmeans `km` object, in which case the vector length must equal the number of clusters.
`sampleSize`	numeric vector with sample sizes (applies only when `sampleFraction` is missing). A single size applies to every cluster; multiple sizes define the sampling for each cluster in the kmeans `km` object, in which case the vector length must equal the number of clusters.
`scaled`	logical: indicates whether the original (default) or the scaled data is returned.
`includeId`	logical: indicates whether the sample should include the key attribute identifying each data point.
`test`	logical: if TRUE, only show what would be done without executing it (similar to the parameter `test` in the RODBC functions `sqlQuery` and `sqlSave`).
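
The per-cluster semantics of `sampleFraction` can be illustrated locally, without a database. The sketch below is not how toaster samples inside Aster; `expand_per_cluster` and `sample_clusters` are hypothetical helpers written for this illustration, and the exact per-cluster counts are a property of the sketch, not a guarantee about the in-database sample.

```r
# Hypothetical helper (not part of toaster): expand a single value to one
# value per cluster, or check that a vector has one entry per cluster.
expand_per_cluster <- function(x, k) {
  if (length(x) == 1L) return(rep(x, k))
  if (length(x) != k) {
    stop("expected 1 or ", k, " values, got ", length(x))
  }
  x
}

# Local stand-in for in-database sampling: draw a fraction of rows
# from each cluster of an in-memory data frame.
sample_clusters <- function(data, cluster, fractions) {
  ids <- sort(unique(cluster))
  fractions <- expand_per_cluster(fractions, length(ids))
  keep <- unlist(lapply(seq_along(ids), function(i) {
    rows <- which(cluster == ids[i])
    n <- round(length(rows) * fractions[i])
    rows[sample.int(length(rows), n)]
  }))
  data[keep, , drop = FALSE]
}

set.seed(42)
toy <- data.frame(x = rnorm(300), cluster = rep(1:3, each = 100))

# One fraction per cluster: 10%, 20% and 30% of 100 rows each.
s <- sample_clusters(toy, toy$cluster, c(0.1, 0.2, 0.3))
table(s$cluster)
```

Passing a vector whose length is neither 1 nor the number of clusters stops with an error in this sketch, which mirrors the length requirement stated for `sampleFraction` and `sampleSize`.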

computeClusterSample returns an object of class "toakmeans" (compatible with class "kmeans").
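
As a rough illustration of what compatibility with class "kmeans" can mean in practice, the sketch below builds a toy object locally with `stats::kmeans` and prepends a "toakmeans" class. The class layout shown is an assumption made for illustration; a real object comes from `computeKmeans` and carries additional components.

```r
# Toy stand-in fitted locally; NOT a real toaster result.
set.seed(1)
fit <- kmeans(matrix(rnorm(40), ncol = 2), centers = 2)

# Assumed class layout: "toakmeans" placed in front of "kmeans".
class(fit) <- c("toakmeans", class(fit))

# Code written for "kmeans" objects still recognizes the object
# and can read the standard components.
inherits(fit, "kmeans")
fit$centers
```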

computeKmeans

if(interactive()){
library(RODBC)    # provides odbcDriverConnect
library(toaster)  # provides computeKmeans, computeClusterSample, createClusterPairsPlot

# initialize connection to Lahman baseball database in Aster
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

# cluster batting records after 2000 on games, runs and hits
km = computeKmeans(conn, "batting", centers=5, iterMax = 25,
                   aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
                   id="playerid || '-' || stint || '-' || teamid || '-' || yearid", 
                   include=c('g','r','h'), scaledTableName='kmeans_test_scaled', 
                   centroidTableName='kmeans_test_centroids',
                   where="yearid > 2000")

# single sample fraction: 1% of the data
km = computeClusterSample(conn, km, 0.01)
km
createClusterPairsPlot(km, title="Batters Clustered by G, H, R", ticks=FALSE)

# per cluster sample fractions (one value for each of the 5 clusters)
km = computeClusterSample(conn, km, c(0.01, 0.02, 0.03, 0.02, 0.01))
}

teradata-aster-field/toaster documentation built on May 31, 2019, 8:36 a.m.
