Generates R and SQL scripts that can be called as user-defined aggregation functions (UDAFs) in Hive.
write_udaf_scripts(f, cluster_by, input_table, input_cols, input_classes,
  output_table, output_cols, output_classes, base_name = "udaf",
  include_script = NULL, overwrite_script = FALSE,
  overwrite_table = FALSE, rows_per_chunk = 1000000L, sep = "'\\t'",
  verbose = FALSE, try = FALSE, tmptable = "tmp")
f |
function which accepts a grouped data frame and returns a data frame |
cluster_by |
character name of column to |
input_table |
character name of table to be transformed, i.e.
|
input_cols |
input column names. See |
input_classes |
character vector of classes for columns. See
|
output_table |
character name of table to |
output_cols |
character vector of columns that f will output |
base_name |
character base name of the scripts to write, e.g. foo.R and foo.sql |
include_script |
character name of an R script to include in the generated script. This may contain supporting functions, for example. |
overwrite_script |
logical write over any existing scripts with
|
overwrite_table |
first call |
rows_per_chunk |
integer number of rows to process in each chunk. If this is too small, say 10, then the generated script will be slow. If this is too large, say 1 billion, then the R process may fail because it uses excessive memory. |
sep |
character field separator string |
verbose |
logical log messages to |
try |
logical If |
tmptable |
character name of temporary table in SQL query |
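The rows_per_chunk and sep arguments suggest that the generated R script reads delimited rows in chunks. The following is a minimal sketch of how such a loop could work, not the code this package actually generates. The helper name process_stream is hypothetical, the plain tab separator is an assumption, and the sketch assumes rows arrive already clustered so that each group is contiguous. Because a group can straddle a chunk boundary, the last group of each chunk is held back until the next chunk arrives.

```r
# Hypothetical helper, not part of the documented package.
# Assumes input rows are already clustered by `cluster_by`.
process_stream <- function(con, f, cluster_by, col_names, col_classes,
                           rows_per_chunk = 1000L, sep = "\t") {
  results <- list()
  carry <- NULL  # rows of the last, possibly incomplete, group
  repeat {
    lines <- readLines(con, n = rows_per_chunk)
    if (length(lines) == 0L) break
    chunk <- read.table(text = lines, sep = sep, col.names = col_names,
                        colClasses = col_classes, quote = "",
                        comment.char = "", stringsAsFactors = FALSE)
    chunk <- rbind(carry, chunk)
    # Hold back the final group: more of its rows may be in the next chunk.
    last_key <- chunk[[cluster_by]][nrow(chunk)]
    is_last <- chunk[[cluster_by]] == last_key
    carry <- chunk[is_last, , drop = FALSE]
    complete <- chunk[!is_last, , drop = FALSE]
    if (nrow(complete) > 0L) {
      groups <- split(complete, complete[[cluster_by]])
      results <- c(results, unname(lapply(groups, f)))
    }
  }
  # The held-back group is complete once the input is exhausted.
  if (!is.null(carry) && nrow(carry) > 0L) {
    results <- c(results, list(f(carry)))
  }
  do.call(rbind, results)
}

# Illustrative data; the column names are assumptions.
input <- c("a\t1", "a\t2", "b\t3", "b\t4", "b\t5", "c\t6")
con <- textConnection(input)
out <- process_stream(
  con,
  f = function(d) data.frame(key = d$key[1], total = sum(d$value)),
  cluster_by = "key",
  col_names = c("key", "value"),
  col_classes = c("character", "numeric"),
  rows_per_chunk = 2L
)
close(con)
```

A very small rows_per_chunk, as here, means many calls to read.table, which is why tiny chunks are slow. A very large one holds more rows in memory at once.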
This approach splits the data based on the value of the column cluster_by. Each group of split data must be small enough to fit in the memory of the R process that runs it.
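The split-then-apply idea can be tried locally in base R before involving Hive. This is only a sketch of the concept: the data, the column names station and flow, and the body of f are made up for illustration.

```r
# Illustrative input; column names are assumptions, not from the package.
input <- data.frame(
  station = c("a", "a", "b", "b", "b"),
  flow = c(10, 20, 5, 15, 40),
  stringsAsFactors = FALSE
)

# f accepts the rows of one group as a data frame and returns a data frame.
f <- function(group) {
  data.frame(station = group$station[1], mean_flow = mean(group$flow))
}

# Split on the clustering column, apply f to each group, then combine.
groups <- split(input, input$station)
result <- do.call(rbind, unname(lapply(groups, f)))
```

Each element of groups is held in memory in full while f runs, which is why every group has to fit in the memory of a single R process.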
This function is relatively low level. It provides the foundation for something more advanced that knows and uses the schema of the database. Defaults were chosen to do the least destructive things possible, so they don't overwrite existing files and data.
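As an illustration of a non-destructive default, a script writer might refuse to replace an existing file unless explicitly told to. The helper write_script below is hypothetical and is not taken from the package source. It only sketches the behaviour that an argument like overwrite_script = FALSE implies.

```r
# Hypothetical helper illustrating a "do not overwrite by default" policy.
write_script <- function(lines, path, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    stop("File already exists: ", path,
         ". Set overwrite = TRUE to replace it.")
  }
  writeLines(lines, path)
  invisible(path)
}

path <- tempfile(fileext = ".R")
write_script("x <- 1", path)

# A second write without overwrite = TRUE is refused, so the file is intact.
second <- tryCatch(write_script("x <- 2", path),
                   error = function(e) "refused")
after_refusal <- readLines(path)

# An explicit overwrite replaces the contents.
write_script("x <- 2", path, overwrite = TRUE)
after_overwrite <- readLines(path)
```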
Feedback:
Should I aim for consistency with similar functions and packages? E.g. the DBI package uses statement, lapply uses FUN.
Alternatively, I could use capitals to denote SQL things, e.g. CLUSTER_BY.
scripts character vector containing generated scripts
#write_udaf_scripts(...)