SparkSession: The 'SparkSession' Class


Description

This class is a thin wrapper around Spark's SparkSession. It is initialized when spark_submit is called. Note that calling sc$stop will end your session. For information on methods and type requirements, refer to the Javadoc: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/SparkSession.html

Details

Not all methods are implemented, because of compatibility issues and conflicts with tidyspark best practices. To use a method that is not included, try calling it with call_method(sc$jobj, <yourMethod>).

Public fields

jobj

SparkSession java object

conf

the RuntimeConfig for this session

sparkContext

the SparkContext associated with the session

Methods

Public methods


Method new()

Create a new SparkSession

Usage
SparkSession$new(session_jobj)
Arguments
session_jobj

the session's jobj


Method print()

Print the SparkSession.

Usage
SparkSession$print()

Method close()

Stop the underlying SparkContext.

Usage
SparkSession$close()

Method emptyDataFrame()

Returns a DataFrame with no rows or columns.

Usage
SparkSession$emptyDataFrame()

Method range()

Creates a Dataset with a single LongType column named id, containing the values from start (inclusive) to end (exclusive), incremented by step, split into the specified number of partitions.

Usage
SparkSession$range(start = 0, end, step = NULL, numPartitions = NULL)
Arguments
start

integer, the starting value (inclusive)

end

integer, the end value (exclusive)

step

integer, the increment between consecutive values

numPartitions

integer, the target number of partitions

Returns

a spark_tbl
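As a rough illustration of the start/end/step semantics described above, the following base-R sketch computes the values the id column should contain. range_ids() is a hypothetical helper, not part of tidyspark or Spark; it ignores numPartitions and assumes a step of 1 when none is given.

```r
# Hypothetical helper (not part of tidyspark): computes in plain R the values
# that SparkSession$range(start, end, step) is documented to put in its `id`
# column, from `start` (inclusive) to `end` (exclusive) in increments of `step`.
# The default step of 1 is an assumption; numPartitions is not modelled.
range_ids <- function(start = 0, end, step = 1) {
  stopifnot(step != 0)
  # an empty range when the step points away from `end`, or start == end
  if ((step > 0 && start >= end) || (step < 0 && start <= end)) {
    return(numeric(0))
  }
  ids <- seq(start, end, by = step)
  # drop `end` itself, because the upper bound is exclusive
  ids[ids != end]
}
```

For instance, range_ids(1, 10) returns 1 through 9, the same ids that spark$range(1, 10) in the Examples section should produce, and range_ids(0, 9, 3) returns 0, 3 and 6.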


Method sql()

Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.

Usage
SparkSession$sql(sqlText)
Arguments
sqlText

string, a SQL query



Method table()

Returns the specified table/view as a DataFrame.

Usage
SparkSession$table(tableName)
Arguments
tableName

either a qualified or unqualified name that designates a table or view. If a database is specified, the table or view is looked up in that database. Otherwise, Spark first looks for a temporary view with the given name and then for a table or view in the current database. Note that the global temporary view database is also valid here.

Returns

a spark_tbl
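The lookup order described for tableName can be modelled in plain R. The sketch below is a simplified, hypothetical model: resolve_table() and its list-based catalogs are illustrative only and are not Spark's catalog API. Qualified names are looked up only in the named database, with global_temp (Spark's default name for the global temporary view database) mapped to global temporary views; unqualified names check session temporary views first and then the current database. It returns NULL where Spark would instead raise an error.

```r
# Hypothetical, simplified model of the lookup order described for
# SparkSession$table(); this is NOT Spark's implementation.
#   temp_views   - named list of session-scoped temporary views
#   global_views - named list of global temporary views ("global_temp")
#   databases    - named list of databases, each a named list of tables/views
resolve_table <- function(table_name, temp_views, global_views, databases,
                          current_db = "default") {
  parts <- strsplit(table_name, ".", fixed = TRUE)[[1]]
  if (length(parts) == 2) {
    db <- parts[1]
    name <- parts[2]
    # qualified name: look only in the named database
    if (db == "global_temp") return(global_views[[name]])
    return(databases[[db]][[name]])
  }
  if (length(parts) != 1) stop("expected `table` or `database.table`")
  # unqualified name: temporary views take precedence ...
  if (!is.null(temp_views[[table_name]])) return(temp_views[[table_name]])
  # ... then the current database; NULL if nothing matches
  databases[[current_db]][[table_name]]
}
```

With a temporary view named sales and a table default.sales both present, resolve_table("sales", ...) picks the temporary view, while resolve_table("default.sales", ...) reaches the table.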



Method version()

The version of Spark on which this application is running.

Usage
SparkSession$version()
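Because this class is documented against the Spark 2.3.0 Javadoc, code may want to check the running version before relying on a particular method. A minimal sketch, assuming version() returns a character string such as "2.4.5"; spark_version_at_least() is a hypothetical helper built on base R's utils::compareVersion().

```r
# Hypothetical helper (not part of tidyspark): TRUE if a Spark version string,
# such as the one SparkSession$version() is assumed to return, is at least
# `minimum`. Suffixes like "-SNAPSHOT" are stripped first, because
# utils::compareVersion() expects purely numeric components.
spark_version_at_least <- function(version, minimum = "2.3.0") {
  core <- sub("-.*$", "", version)
  utils::compareVersion(core, minimum) >= 0
}
```

In a live session this might be called as spark_version_at_least(spark$version()).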

Method clone()

The objects of this class are cloneable with this method.

Usage
SparkSession$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

## Not run: 

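spark <- spark_session()
# a spark_tbl with a single column, id, holding 1 to 9 (end is exclusive)
rdd <- spark$range(1, 10)
rdd$collect()

# stop the session when done
spark_session_stop()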

## End(Not run)

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.