Description Usage Arguments Details Value See Also Examples
For table compute column statistics in Aster and augment data frame structure
obtained with sqlColumns
with columns containing computed statistics.
1 2 3 |
channel |
object as returned by |
tableName |
name of the table in Aster. |
include |
a vector of column names to include. Output never contains attributes other than in the list. |
except |
a vector of column names to exclude. Output never contains attributes from the list. |
modeValue |
logical indicates if mode values should be computed. Default is FALSE. |
percentiles |
list of percentiles (integers between 0 and 100) to collect (always collects 25th and 75th for IQR calculation). There is no penalty in specifying more percentiles as they get calculated in a single call for each column - no matter how many different values are requested. When FALSE then percentiles calculations are skipped and result wil have no percentile and IQR columns. |
where |
SQL WHERE clause limiting data from the table (use SQL as if in WHERE clause but omit keyword WHERE). |
mock |
logical: if TRUE returns pre-computed table statistics for tables |
parallel |
logical: enable parallel calls to Aster database. This option requires parallel
backend enabled and registered (see in examples). Parallel execution requires ODBC |
Computes columns statistics for all or specified table columns and adds them
to the data frame with basic ODBC table metadata obtained with sqlColumns
.
Computed statistics include counts of all, non-null, distinct values; statistical
summaries of maximum, minimum, mean, standard deviation, median (50th percentile), mode
(optional), interquartile range, and desired percentiles. Each computed statistic adds
a column to ODBC metadata data frame.
data frame returned by sqlColumns
with additional columns:
total row count - the same for each table column
distinct values count
not null count
minimum value (numerical data types only)
maximum value (numerical data types only)
mean (numerical data types only)
standard deviation (numerical data types only)
defaults: 0,5,10,25,50,75,90,95,100. Always adds percentiles 25, 50 (median), 75
interquartile range is the 1st Quartile subtracted from the 3rd Quartile
minimum string value (character data types only)
maximum string value (character data types only)
mode value (optional)
mode count (optional)
sqlColumns
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | if(interactive()){
# initialize connection to Lahman baseball database in Aster
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")
pitchingInfo = getTableSummary(channel=conn, 'pitching_enh')
# list all table columns
pitchingInfo$COLUMN_NAME
# compute statistics on subset of baseball data after 1999
battingInfo = getTableSummary(channel=conn, 'batting_enh',
where='yearid between 2000 and 2013')
# compute statistics for certain columns including each percentile from 1 to 99
pitchingInfo = getTableSummary(channel=conn, 'pitching_enh',
include=c('h', 'er', 'hr', 'bb', 'so'),
percentiles=seq(1,99))
# list data frame column names to see all computed statistics
names(pitchingInfo)
# compute statitics on all numeric columns except certain columns
teamInfo = getTableSummary(channel=conn, 'teams_enh',
include=getNumericColumns(sqlColumns(conn, 'teams_enh')),
except=c('lgid', 'teamid', 'playerid', 'yearid', 'decadeid'))
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.