Description
A CROSS JOIN returns every combination of the rows of x and y: the number of rows in the result is the number of rows in x multiplied by the number of rows in y. This kind of result is called the Cartesian product.
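To make the row-count arithmetic concrete, the sketch below uses base R's merge() with by = NULL, which also returns the Cartesian product. It is a stand-in for illustration, not this package's cross_join() implementation, and the data frames x and y are hypothetical.

```r
# Hypothetical example data: 3 rows and 2 rows.
x <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(colour = c("red", "blue"), stringsAsFactors = FALSE)

# Base R's merge() returns the Cartesian product when `by` is NULL.
xy <- merge(x, y, by = NULL)

nrow(xy)  # 3 * 2 = 6 rows
```

Each of the 3 ids appears once with each of the 2 colours, so the result has 6 rows and the columns id and colour.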
Usage

```r
cross_join(
  x,
  y,
  copy = FALSE,
  suffix = c("_x", "_y"),
  ...,
  na_matches = c("never", "na")
)
## S3 method for class 'tbl_lazy'
cross_join(
  x,
  y,
  copy = FALSE,
  suffix = c("_x", "_y"),
  ...,
  na_matches = c("never", "na")
)
## S3 method for class 'data.frame'
cross_join(
  x,
  y,
  copy = FALSE,
  suffix = c("_x", "_y"),
  ...,
  na_matches = c("na", "never")
)
```
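Arguments

x, y: A pair of tables to join; methods are provided for data frames and lazy tables (tbl_lazy).
copy: If x and y are not from the same data source and copy is TRUE, y is copied into the same source as x. This allows you to join tables across sources, but it is a potentially expensive operation, so you must opt into it.
suffix: If there are non-joined duplicate variables in x and y, these suffixes are appended to their names in the output to disambiguate them.
...: Other parameters passed on to methods.
na_matches: Should NA (NULL) values match one another? The default, "never", is how databases usually work; "na" makes NA values match one another, as base R's merge() does. Note that the data.frame method defaults to "na".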
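The suffix argument matters whenever x and y share column names, as in a self cross join. As a rough base-R analogue (not this package's code), merge() with by = NULL and suffixes = c("_x", "_y") disambiguates the shared columns in the same way:

```r
x <- data.frame(
  id = c("id1", "id2", "id3", "id4", "id5"),
  val = c(2, 7, 11, 13, 17),
  stringsAsFactors = FALSE
)

# Self cross join: every column name is shared, so every column gets a suffix.
xx <- merge(x, x, by = NULL, suffixes = c("_x", "_y"))

names(xx)  # "id_x" "val_x" "id_y" "val_y"
nrow(xx)   # 5 * 5 = 25
```

Without distinct suffixes the output would contain two columns named id and two named val, which is why the suffix vector must have length 2.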
Details

From Spark 2.1, using a cross join requires spark.sql.crossJoin.enabled to be set to true; otherwise an exception is thrown. Cartesian products are very slow, because the output grows as the product of the two input row counts. More importantly, they can consume a lot of memory and trigger an out-of-memory (OOM) error. If the join type is not inner, Spark SQL may use a broadcast nested loop join even when neither side of the join is small enough to broadcast. Because the broadcast side is copied to every executor, this can also cause a lot of unwanted network traffic.
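For illustration only, the sketch below assumes the Spark connection is created with sparklyr (this page does not say how the connection is made) and enables the setting when connecting. It needs a local Spark installation for master = "local". In Spark 3.0 and later spark.sql.crossJoin.enabled already defaults to true, so this matters mainly for Spark 2.x.

```r
library(sparklyr)

# Assumption: the connection is made with sparklyr; adjust `master` for your cluster.
conf <- spark_config()

# Allow Cartesian products in Spark SQL (required from Spark 2.1 until the
# default became true in Spark 3.0).
conf[["spark.sql.crossJoin.enabled"]] <- "true"

sc <- spark_connect(master = "local", config = conf)
```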
Examples

```r
x <- data.frame(
  id = c("id1", "id2", "id3", "id4", "id5"),
  val = c(2, 7, 11, 13, 17),
  stringsAsFactors = FALSE
)
# Cross join x with itself: 5 rows x 5 rows = 25 rows
cross_join(x, x)
```