Description

This method adds a column of duplicate markers to a Spark DataFrame.
Usage

sdf_duplicate_marker(sc, data, part_col, ord_col,
  new_column_name = "duplicate")
Arguments

sc: A spark_connection.
data: A jobj: the Spark DataFrame on which to perform the function.
part_col: String(s). A vector of the column(s) within which to check for duplicates.
ord_col: String(s). A list of the column(s) to order by.
new_column_name: A string giving the name of the duplicate-marker column. Defaults to "duplicate".
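Value

Returns a jobj referencing a Spark DataFrame: the input data with one additional column, named by new_column_name, whose values are: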
* 0 = Duplicate
* 1 = Not a Duplicate
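The following minimal base-R sketch on a local data.frame shows what the 0/1 marker means. This page does not state the exact rule. The sketch assumes that rows are grouped by part_col and sorted ascending by ord_col. Under that assumption, the first row in each group gets 1 (not a duplicate) and every later row in the same group gets 0 (duplicate). The helper mark_duplicates_local() is hypothetical and is not part of sparkts.

```r
# Hypothetical helper, NOT part of sparkts: a local illustration of the
# assumed marker rule (first row per group = 1, later rows = 0).
mark_duplicates_local <- function(df, part_col, ord_col,
                                  new_column_name = "duplicate") {
  # Sort by the partition column(s) first, then by the ordering column(s)
  sort_keys <- unname(as.list(df[c(part_col, ord_col)]))
  sorted <- df[do.call(order, sort_keys), , drop = FALSE]
  # A row is a duplicate if its partition key already appeared above it
  is_dup <- duplicated(sorted[part_col])
  sorted[[new_column_name]] <- as.integer(!is_dup)
  rownames(sorted) <- NULL
  sorted
}

# Example: two groups in "order", each sorted by "marker"
df <- data.frame(order = c(1, 1, 2, 2, 2), marker = c(3, 1, 5, 4, 6))
res <- mark_duplicates_local(df, part_col = "order", ord_col = "marker")
print(res)
```

On Spark, a rule like this would usually be written as a window function such as ROW_NUMBER() over a partition. This page does not confirm whether sdf_duplicate_marker is implemented that way.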
Examples

## Not run:
# Load the packages used below (sparklyr also provides %>%)
library(sparklyr)
library(sparkts)
# Set up a Spark connection
sc <- spark_connect(master = "local", version = "2.2.0")
# Extract some data
dup_data <- spark_read_json(
  sc,
  "std_data",
  path = system.file(
    "data_raw/DuplicateDataIn.json",
    package = "sparkts"
  )
) %>%
  spark_dataframe()
# Call the method
p <- sdf_duplicate_marker(
  sc, dup_data, part_col = "order", ord_col = "marker"
)
# Return the data to R
p %>% dplyr::collect()
spark_disconnect(sc = sc)
## End(Not run)