sdf_lag: Calculate lag

Description Usage Arguments Value Examples

Description

Calculate lag

Usage

1
sdf_lag(sc, data, partition_cols, order_cols, target_col, lag_num)

Arguments

sc

A spark_connection.

data

A jobj: the Spark DataFrame on which to perform the function.

partition_cols

c(String). A vector of column(s) to partition on.

order_cols

c(String). A vector of column(s) to order on.

target_col

String. Column name to create a window over.

lag_num

Integer. The number of rows back from the current row from which to obtain a value.

Value

Returns a jobj

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
## Not run: 
# Set up a spark connection
sc <- spark_connect(master = "local", version = "2.2.0")

# Extract some data
lag_data <- spark_read_json(
  sc,
  "lag_data",
  path = system.file(
    "data_raw/lag_data.json",
    package = "sparkts"
  )
) %>%
  spark_dataframe()

# Call the method
p <- sdf_lag(
  sc = sc, data = lag_data, partition_cols = "id", order_cols = "t",
  target_col = "v", lag_num = 2L
)

# Return the data to R
p %>% dplyr::collect()

spark_disconnect(sc = sc)

## End(Not run)

nathaneastwood/sparkts documentation built on May 25, 2019, 10:34 p.m.