sdf_duplicate_marker: Flag duplicate records in a Spark DataFrame


Description

This method adds a column of duplicate markers to a Spark DataFrame.

Usage

sdf_duplicate_marker(sc, data, part_col, ord_col,
  new_column_name = "duplicate")

Arguments

sc

A spark_connection.

data

A jobj: the Spark DataFrame on which to perform the function.

part_col

String(s). A vector of the column(s) to check for duplicates within.

ord_col

String(s). A list of the column(s) to order by.

new_column_name

A string. The name given to the duplicate marker column; defaults to "duplicate".

Value

Returns a jobj. The duplicate marker column contains:

* 0 = Duplicate
* 1 = Not a duplicate
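
As a minimal sketch of how the marker column might be consumed once the result has been collected back to R: the data frame below is hypothetical (the column names mimic those used in the Examples section; the actual output depends on your input data), but the 0/1 encoding matches the Value description above.

```r
# Hypothetical collected output, mimicking the shape returned by
# sdf_duplicate_marker() after dplyr::collect()
marked <- data.frame(
  order     = c(1, 1, 2),
  marker    = c("a", "a", "b"),
  duplicate = c(0, 0, 1)  # 0 = duplicate, 1 = not a duplicate
)

# Keep only the rows flagged as not duplicated
unique_rows <- marked[marked$duplicate == 1, ]
```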

Examples

## Not run: 
# Load the required packages; dplyr provides the %>% pipe
library(sparklyr)
library(sparkts)
library(dplyr)

# Set up a spark connection
sc <- spark_connect(master = "local", version = "2.2.0")

# Extract some data
dup_data <- spark_read_json(
  sc,
  "std_data",
  path = system.file(
    "data_raw/DuplicateDataIn.json",
    package = "sparkts"
  )
) %>%
  spark_dataframe()

# Call the method
p <- sdf_duplicate_marker(
  sc, dup_data, part_col = "order", ord_col = "marker"
)

# Return the data to R
p %>% dplyr::collect()

spark_disconnect(sc = sc)

## End(Not run)

nathaneastwood/sparkts documentation built on May 25, 2019, 10:34 p.m.