spark_read_jdbc: Create spark_tbl from JDBC connection


View source: R/read-write.R

Description

Create spark_tbl from JDBC connection

Usage

spark_read_jdbc(
  url,
  table,
  partition_col = NULL,
  lower_bound = NULL,
  upper_bound = NULL,
  num_partitions = 0L,
  predicates = list(),
  ...
)

Arguments

url

string, JDBC database URL of the form jdbc:subprotocol:subname

table

string, the name of the table in the external database

partition_col

string, the name of a column of numeric, date, or timestamp type that will be used for partitioning.

lower_bound

the minimum value of partition_col, used to decide the partition stride

upper_bound

the maximum value of partition_col, used to decide the partition stride

num_partitions

integer, the number of partitions. This, together with lower_bound (inclusive) and upper_bound (exclusive), forms the partition strides for the generated WHERE clause expressions used to split the column partition_col evenly (see the sketch in Details). Defaults to SparkContext.defaultParallelism when unset.

predicates

list of strings, conditions for the WHERE clause; each condition defines one partition and should be in the form of a SQL query string (see Examples).

...

additional named JDBC connection properties, e.g. user, password, or driver.

Details

For specifying partitioning, the following rules apply: set only one of partition_col or predicates. With partition_col (together with lower_bound, upper_bound, and num_partitions), the table is read in parallel using evenly spaced strides on that column; with predicates, one partition is created per condition. Note that lower_bound and upper_bound are only used to decide the partition stride, not to filter rows, so all rows of the table are returned.
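
As an illustrative sketch of the stride arithmetic (not tidyspark API; the column name "index", the bounds, and the partition count are assumptions for the example), suppose partition_col = "index", lower_bound = 0, upper_bound = 10000, and num_partitions = 4:

stride <- (10000 - 0) / 4   # 2500 units of "index" per partition
# Spark then generates roughly one WHERE clause per partition:
#   index <  2500                   (the first stride also picks up NULLs)
#   index >= 2500 AND index < 5000
#   index >= 5000 AND index < 7500
#   index >= 7500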

Value

a spark_tbl

Examples

## Not run: 
spark_session(sparkPackages=c("mysql:mysql-connector-java:5.1.48"))

url <- "jdbc:mysql://localhost:3306/databasename"
# one partition per predicate condition
df <- spark_read_jdbc(url, "table", predicates = list("field <= 123"),
                      user = "username")

# parallel read by stride on the numeric column "index"
df2 <- spark_read_jdbc(url, "table2", partition_col = "index", lower_bound = 0,
                       upper_bound = 10000, user = "username", password = "password")
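
# a sketch with multiple predicates (hypothetical split of the same
# "field" column): each condition string becomes its own partition
df3 <- spark_read_jdbc(url, "table",
                       predicates = list("field < 500", "field >= 500"),
                       user = "username", password = "password")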

spark_session_stop()

# postgres example

spark_session(sparkPackages=c("org.postgresql:postgresql:42.2.12"))

iris_jdbc <- spark_read_jdbc(url = "jdbc:postgresql://localhost/databasename",
                             table = "table",
                             driver = "org.postgresql.Driver")

spark_session_stop()


## End(Not run)
