Description
This function performs an operation similar to a SQL GROUP BY. Note that it does not behave identically to a typical ANSI SQL statement: the summed values are added as a new column on the returned data rather than the result being restricted to the columns named in the call, so you need to perform an additional select afterwards. Also, only one single sum-by column can be used.
Usage

sdf_sum_col(sc, data, group_by_cols, sum_col_name)
Arguments

sc             A spark_connection.
data           A jobj: the Spark DataFrame to operate on.
group_by_cols  c(String). A vector of columns to group by.
sum_col_name   String. The column to sum by.
Value

Returns a jobj.
Examples

## Not run:
# Set up a spark connection
sc <- spark_connect(master = "local", version = "2.2.0")
# Extract some data
lag_data <- spark_read_json(
sc,
"lag_data",
path = system.file(
"data_raw/lag_data.json",
package = "sparkts"
)
) %>%
spark_dataframe()
# Call the method
p <- sdf_sum_col(
  sc = sc, data = lag_data, group_by_cols = "id", sum_col_name = "v"
)
# Return the data to R
p %>% dplyr::collect()
spark_disconnect(sc = sc)
## End(Not run)
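Because the sum comes back as an extra column on the full data rather than as a reduced result set, a follow-up select is usually needed. A minimal sketch of that pattern, assuming the function appends a column named `v_sum` (the output column name here is an illustrative assumption, as is reuse of the `sc` and `lag_data` objects from the example above):

```r
## Not run:
# Sum `v` within each `id` group; the returned data keeps all original
# columns plus the new sum column (`v_sum` is an assumed name).
summed <- sdf_sum_col(
  sc = sc, data = lag_data, group_by_cols = "id", sum_col_name = "v"
)
# Collect locally, then keep only the grouping column and the sum,
# de-duplicating the repeated per-group rows.
summed %>%
  dplyr::collect() %>%
  dplyr::select(id, v_sum) %>%
  dplyr::distinct()
## End(Not run)
```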