spark_query_data: Query a Spark DataFrame
In nathaneastwood/sparklio: Utility Functions to Extend 'sparklyr's IO

View source: R/spark_query_data.R

Query a Spark DataFrame and optionally return the results to Spark memory or to R's memory.

spark_query_data(sc, qry, name, type = c("lazy", "compute", "collect"))

`sc`	A `spark_connection`.
`qry`	A SQL query.
`name`	`character(1)`. If not `NULL`, the resulting object will be registered within the Spark context under this name.
`type`	`character(1)`. One of "lazy" (the default), "compute" or "collect". See Details.
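
The `type = c("lazy", "compute", "collect")` default in the usage line follows a common base R idiom in which the first element of the vector acts as the default. The sketch below illustrates that idiom with `match.arg()`; `resolve_type()` is a hypothetical helper written for this illustration, and whether `spark_query_data()` resolves `type` this way internally is an assumption.

```r
# Hypothetical helper: NOT part of sparklio, shown only to illustrate
# how a vector-valued default is usually resolved in base R.
resolve_type <- function(type = c("lazy", "compute", "collect")) {
  # match.arg() returns the first choice when `type` is not supplied,
  # and raises an error for values outside the allowed set.
  match.arg(type)
}

resolve_type()           # "lazy"
resolve_type("collect")  # "collect"
```

Under this idiom an unrecognised value such as `"eager"` produces an error instead of silently falling back to the default.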

What the function does depends on the value of `type`. There are three scenarios:

The default, "lazy", defers execution: the query is only evaluated when its results are needed, for example when the user collects the data (see sparklyr::collect()).
"compute" ensures that the query is executed and the resulting data are stored within Spark's memory.
"collect" executes the query and returns the resulting data to R's memory.

One of the following:

A tbl_spark reference to a Spark DataFrame if type is "compute" or "lazy".
A tibble if type is "collect".
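
The three modes and their return values can be approximated with standard dplyr verbs applied to a Spark connection. The following is a minimal sketch, not the package's source: `query_modes_sketch()` is a hypothetical name, and the mapping of each `type` to `dplyr::tbl()`, `dplyr::compute()` and `dplyr::collect()` is an assumption about how such a function could be written. The sketch only uses `name` in the "compute" branch.

```r
# Hypothetical stand-in for spark_query_data(); assumes `sc` is an open
# sparklyr connection and that the sparklyr and dplyr packages are installed.
query_modes_sketch <- function(sc, qry, name = NULL, type = "lazy") {
  if (!type %in% c("lazy", "compute", "collect")) {
    stop("`type` must be one of \"lazy\", \"compute\" or \"collect\".")
  }

  # Wrap the SQL in a lazy table reference; the result rows are not
  # materialised at this point.
  lazy_tbl <- dplyr::tbl(sc, dplyr::sql(qry))

  switch(
    type,
    # Return the lazy reference unchanged (a tbl_spark).
    lazy = lazy_tbl,
    # Run the query and keep the result in Spark (still a tbl_spark).
    compute = if (is.null(name)) {
      dplyr::compute(lazy_tbl)
    } else {
      dplyr::compute(lazy_tbl, name = name)
    },
    # Run the query and bring the rows into R as a tibble.
    collect = dplyr::collect(lazy_tbl)
  )
}
```

With a live connection, `query_modes_sketch(sc, "select mpg from mtcars", type = "collect")` would be expected to return a tibble, while the other two modes return a reference that can be passed to further dplyr verbs.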

## Not run: 
sc <- sparklyr::spark_connect(master = "local")
mtcars_spark <- sparklyr::copy_to(dest = sc, df = mtcars)

# By default, queries are executed lazily
spark_query_data(sc = sc, qry = "select mpg from mtcars")

# But we can cache the results
cache <- spark_query_data(
  sc = sc,
  qry = "select mpg from mtcars",
  name = "mpg_mtcars",
  type = "compute"
)
# And gather the results
spark_collect_data(x = "mpg_mtcars", sc = sc)

# Or we can collect the data instantly
spark_query_data(
  sc = sc,
  qry = "select disp from mtcars",
  type = "collect"
)

## End(Not run)