spark_session: Get or create a SparkSession


View source: R/spark_new.R

Description

SparkSession is the entry point into Spark. spark_session gets the existing SparkSession or initializes a new one. Additional Spark properties can be set in ..., and these named parameters take precedence over the values supplied via master, app_name, and the named list spark_config.
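For example (a minimal sketch; the property values are illustrative), a property passed in ... overrides the same key supplied via spark_config:

spark_session(
  master = "local[2]",
  spark_config = list(spark.executor.memory = "2g"),
  spark.executor.memory = "4g"  # named property wins over spark_config
)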

Usage

spark_session_reset(
  master = "",
  app_name = "SparkR",
  spark_home = Sys.getenv("SPARK_HOME"),
  spark_config = list(),
  spark_jars = "",
  spark_packages = "",
  enable_hive_support = TRUE,
  ...
)

spark_session(
  master = "",
  app_name = "tidyspark",
  spark_home = Sys.getenv("SPARK_HOME"),
  spark_config = list(),
  spark_jars = "",
  spark_packages = "",
  enable_hive_support = TRUE,
  verbose = FALSE,
  ...
)

Arguments

master

string, the Spark master URL.

app_name

string, application name to register with cluster manager.

spark_home

string, Spark Home directory.

spark_config

named list of Spark configuration to set on worker nodes.

spark_jars

string vector of jar files to pass to the worker nodes.

spark_packages

string vector of Maven package coordinates (for example, "com.databricks:spark-avro_2.11:2.0.1").

enable_hive_support

logical, whether to enable Hive support, falling back to a session without it if Spark was not built with Hive support. Once set, this cannot be turned off on an existing session.

...

named Spark properties passed to the method.

verbose

boolean, whether to display startup messages. Default FALSE.

Details

spark_session_reset will first stop the existing session and then run spark_session.
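For example (a sketch; the master URL and property value are illustrative), to restart with a different configuration:

# stops the current session, then starts a new one with the given settings
spark_session_reset(master = "local[2]", spark.sql.shuffle.partitions = "8")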

When called in an interactive session, this function checks for a Spark installation; if one is not found, Spark will be downloaded and cached automatically. Alternatively, install.spark can be called manually.
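For example (assuming install.spark is available and behaves as in SparkR):

install.spark()   # download and cache Spark before starting a session
spark_session()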

For details on how to initialize and use Spark, refer to the SparkR programming guide at http://spark.apache.org/docs/latest/sparkr.html#starting-up-sparksession.

Examples

## Not run: 
spark_session()
df <- spark_read_json(path)

spark_session("local[2]", "SparkR", "/home/spark")
spark_session("yarn-client", "SparkR", "/home/spark",
               list(spark.executor.memory="4g"),
               c("one.jar", "two.jar", "three.jar"),
               c("com.databricks:spark-avro_2.11:2.0.1"))
spark_session(spark.master = "yarn-client", spark.executor.memory = "4g")


## End(Not run)
