spark_query_data: Query a Spark DataFrame


View source: R/spark_query_data.R

Description

Query a Spark DataFrame with SQL and optionally cache the results in Spark's memory or return them to R's memory.

Usage

spark_query_data(sc, qry, name, type = c("lazy", "compute", "collect"))

Arguments

sc

A spark_connection.

qry

A SQL query.

name

character(1). If not NULL, the resulting object will be registered within the Spark context under this name.

type

character(1). One of "lazy", "compute" or "collect". See Details.

Details

The behaviour of this function depends on the type given by the user. There are three scenarios:

  1. The default, "lazy", defers evaluation until the result is needed, for example when the user collects the data (see sparklyr::collect()).

  2. "compute" ensures that the query is executed and the resulting data are stored within Spark's memory.

  3. "collect" executes the query and returns the resulting data to R's memory.
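The three scenarios above can be approximated with plain dplyr verbs on a sparklyr connection. The following is a minimal sketch only: query_sketch() is a hypothetical helper written for illustration, not the package's implementation, and it assumes that dplyr::tbl() with dplyr::sql() gives a lazy reference, dplyr::compute() caches the result in Spark, and dplyr::collect() pulls it into R.

```r
# Hypothetical helper illustrating the three evaluation modes.
# This is NOT the source of spark_query_data(); it is an assumption-based sketch.
query_sketch <- function(sc, qry, name = NULL,
                         type = c("lazy", "compute", "collect")) {
  type <- match.arg(type)

  # Build a lazy reference to the query; nothing is executed yet
  res <- dplyr::tbl(sc, dplyr::sql(qry))

  switch(
    type,
    # "lazy": return the unevaluated reference as-is
    lazy = res,
    # "compute": run the query and keep the result in Spark,
    # under the given name when one is supplied
    compute = if (is.null(name)) {
      dplyr::compute(res)
    } else {
      dplyr::compute(res, name = name)
    },
    # "collect": run the query and return the rows to R as a tibble
    collect = dplyr::collect(res)
  )
}

## Not run:
# sc <- sparklyr::spark_connect(master = "local")
# sparklyr::copy_to(dest = sc, df = mtcars)
# query_sketch(sc, "select mpg from mtcars", type = "collect")
## End(Not run)
```

Unlike spark_query_data(), this sketch ignores name when type is "lazy" or "collect"; how the package registers a named result in those cases is not shown here.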

Value

One of the following:

  1. A tbl_spark reference to a Spark DataFrame if type is "compute" or "lazy".

  2. A tibble if type is "collect".

Examples

## Not run: 
sc <- sparklyr::spark_connect(master = "local")
mtcars_spark <- sparklyr::copy_to(dest = sc, df = mtcars)

# By default, queries are executed lazily
spark_query_data(sc = sc, qry = "select mpg from mtcars")

# But we can cache the results
cache <- spark_query_data(
  sc = sc,
  qry = "select mpg from mtcars",
  name = "mpg_mtcars",
  type = "compute"
)
# And later collect the cached table by its registered name
spark_collect_data(x = "mpg_mtcars", sc = sc)

# Or we can collect the data instantly
spark_query_data(
  sc = sc,
  qry = "select disp from mtcars",
  type = "collect"
)

# Close the connection when finished
sparklyr::spark_disconnect(sc)

## End(Not run)

nathaneastwood/sparklio documentation built on March 16, 2021, 7:51 p.m.