sdf_sum_col: Sum_col method


Description

This function performs an aggregation similar to a SQL GROUP BY. Note, however, that it does not behave identically to a typical ANSI SQL statement: the sum is appended as a new column to the returned data rather than the output being reduced to the columns parameterised in the call, so an additional select is required to obtain a conventional grouped result. Also, only a single sum-by column can be used.
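Because the sum is appended as an extra column, a follow-up select is needed to trim the result down to the grouping and aggregate columns. A minimal sketch of that pattern using sparklyr's `invoke()` on the returned jobj (the name of the appended column is not documented here, so `"sum(v)"` below is an assumption; inspect the actual names with `invoke(res, "columns")` first):

```r
library(sparklyr)
library(sparkts)

sc <- spark_connect(master = "local")

# A small example DataFrame as a jobj (columns "id" and "v" are illustrative)
df <- sdf_copy_to(sc, data.frame(id = c(1, 1, 2), v = c(10, 20, 5))) %>%
  spark_dataframe()

res <- sdf_sum_col(sc, df, group_by_cols = "id", sum_col_name = "v")

# Check what the appended sum column is actually called
invoke(res, "columns")

# Perform the additional select to keep only the grouping column and
# the appended sum (column name assumed, see above)
invoke(res, "select", "id", list("sum(v)"))
```

The `invoke(res, "select", "id", list(...))` form mirrors the Scala `Dataset.select(col: String, cols: String*)` signature, with the trailing varargs passed as a list.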

Usage

sdf_sum_col(sc, data, group_by_cols, sum_col_name)

Arguments

sc

A spark_connection.

data

A jobj: the Spark DataFrame on which to perform the function.

group_by_cols

c(String). A vector of column names to group by.

sum_col_name

String. A column to sum by.

Value

Returns a jobj.

Examples

## Not run: 
# Set up a spark connection
sc <- spark_connect(master = "local", version = "2.2.0")

# Extract some data
lag_data <- spark_read_json(
  sc,
  "lag_data",
  path = system.file(
    "data_raw/lag_data.json",
    package = "sparkts"
  )
) %>%
  spark_dataframe()

# Call the method (column names follow the example data above)
p <- sdf_sum_col(
  sc = sc, data = lag_data, group_by_cols = "id", sum_col_name = "v"
)

# Return the data to R
p %>% dplyr::collect()

spark_disconnect(sc = sc)

## End(Not run)

nathaneastwood/sparkts documentation built on May 25, 2019, 10:34 p.m.