cross_join: Cross Join
In nathaneastwood/sparkplugs: Common Wrappers and Missing Functionality from 'sparklyr' and 'dplyr'


View source: R/cross_join.R

A cross join returns every combination of the rows of x and y, so the result has as many rows as the number of rows in the first dataset multiplied by the number of rows in the second dataset. This kind of result is called the Cartesian product.
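The row-count rule can be checked with base R alone. The following is a minimal sketch that uses `merge()` with `by = NULL` (base R's documented way to get a Cartesian product) rather than `cross_join()` itself; the data frames `a` and `b` are made up for illustration.

```r
# Two small, hypothetical data frames with no columns in common.
a <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)
b <- data.frame(val = c(1, 2))

# by = NULL asks merge() for the Cartesian product of the two inputs.
cp <- merge(a, b, by = NULL)

# Every row of `a` is paired with every row of `b`.
nrow(cp) == nrow(a) * nrow(b)
```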

cross_join(
  x,
  y,
  copy = FALSE,
  suffix = c("_x", "_y"),
  ...,
  na_matches = c("never", "na")
)

## S3 method for class 'tbl_lazy'
cross_join(
  x,
  y,
  copy = FALSE,
  suffix = c("_x", "_y"),
  ...,
  na_matches = c("never", "na")
)

## S3 method for class 'data.frame'
cross_join(
  x,
  y,
  copy = FALSE,
  suffix = c("_x", "_y"),
  ...,
  na_matches = c("na", "never")
)

`x, y`	A pair of `tbl_spark`s or `data.frame`s.
`copy`	If `x` and `y` are not from the same data source, and `copy` is `TRUE`, then `y` will be copied into a temporary table in the same database as `x`. `*_join()` will automatically run `ANALYZE` on the created table in the hope that this will make your queries as efficient as possible by giving more data to the query planner. This allows you to join tables across srcs, but it is a potentially expensive operation, so you must opt into it.
`suffix`	If there are non-joined duplicate variables in `x` and `y`, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
`...`	Other parameters passed onto methods.
`na_matches`	Should NA (NULL) values match one another? `"never"` is how databases usually work and is the default for the generic and the `tbl_lazy` method. `"na"` makes the joins behave like the dplyr join functions, `merge()`, `match()`, and `%in%`, and is the default for the `data.frame` method.
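To illustrate what `suffix` is for, here is a small sketch in base R. It assumes nothing about how `cross_join()` is implemented; it only shows the analogous `suffixes` argument of `merge()` disambiguating column names when a data frame is crossed with itself. The data frame `d` is hypothetical.

```r
# A hypothetical data frame crossed with itself: every column name clashes.
d <- data.frame(
  id = c("id1", "id2"),
  val = c(2, 7),
  stringsAsFactors = FALSE
)

# merge() appends the suffixes to the clashing names from each side.
out <- merge(d, d, by = NULL, suffixes = c("_x", "_y"))
names(out)
```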

From Spark 2.1, spark.sql.crossJoin.enabled must be set to true before a cross join can be used; otherwise an exception will be thrown. Cartesian products are very slow. More importantly, they can consume a lot of memory and trigger an out-of-memory (OOM) error. If the join type is not inner, Spark SQL may use a Broadcast Nested Loop Join even when neither table is small enough to broadcast cheaply, which can also cause a lot of unwanted network traffic.
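One way to set this property from R is through the connection configuration. The fragment below is a sketch, assuming the sparklyr package and a local Spark installation are available; the `master = "local"` value is an assumption for illustration, not something this package prescribes.

```r
library(sparklyr)

# Start from sparklyr's default configuration and enable cross joins.
config <- spark_config()
config[["spark.sql.crossJoin.enabled"]] <- "true"

# Assumption: Spark is installed locally; adjust `master` for a cluster.
sc <- spark_connect(master = "local", config = config)
```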

x <- data.frame(
  id = c("id1", "id2", "id3", "id4", "id5"),
  val = c(2, 7, 11, 13, 17),
  stringsAsFactors = FALSE
)
cross_join(x, x)

nathaneastwood/sparkplugs documentation built on Feb. 28, 2021, 4:57 p.m.
