Create spark_tbl from JDBC connection

Description

Create a spark_tbl by reading a table from an external database over a JDBC connection.
Usage

spark_read_jdbc(
  url,
  table,
  partition_by = NULL,
  lower_bound = NULL,
  upper_bound = NULL,
  num_partitions = NULL,
  predicates = list(),
  ...
)
Arguments

url
    string, JDBC database url of the form jdbc:subprotocol:subname

table
    string, the name of the table in the external database

partition_by
    string, the name of a column of numeric, date, or timestamp type that will be used for partitioning

lower_bound
    the minimum value of partition_by used to decide partition stride

upper_bound
    the maximum value of partition_by used to decide partition stride

num_partitions
    integer, the number of partitions. This, along with lower_bound (inclusive) and upper_bound (exclusive), forms the partition strides for the generated WHERE clause expressions used to split the column partition_by evenly. Defaults to SparkContext.defaultParallelism when unset.

predicates
    list, conditions in the WHERE clause; each one defines one partition and should be in the form of a SQL query string, see Examples

...
    additional JDBC database connection named properties
Details

For specifying partitioning, the following rules apply:

partition_by, lower_bound, and upper_bound must all be specified if any of them is specified. In addition, num_partitions must be specified.

These values describe how to partition the table when reading in parallel from multiple workers. partition_by must be a single numeric column from the table in question.

lower_bound and upper_bound are used only to decide the partition stride, not to filter rows in the table, so all rows in the table will be partitioned and returned. To filter out rows before reading, use the predicates argument.
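As an illustration of the stride rule, the sketch below shows how per-partition WHERE clauses can be derived from a partition column, lower_bound, upper_bound, and num_partitions. It is not part of this package; partition_clauses is a hypothetical helper that mirrors the inclusive/exclusive bound behaviour described above, and it assumes num_partitions >= 2.

# Illustration only (not a package function): how lower_bound (inclusive)
# and upper_bound (exclusive) turn into partition strides.
# Assumes num_partitions >= 2.
partition_clauses <- function(column, lower_bound, upper_bound, num_partitions) {
  stride <- (upper_bound - lower_bound) / num_partitions
  # interior cut points between partitions
  bounds <- lower_bound + stride * seq_len(num_partitions - 1)
  c(
    # first partition also catches NULLs and values below lower_bound
    sprintf("%s < %s OR %s IS NULL", column, bounds[1], column),
    # interior partitions: lower edge inclusive, upper edge exclusive
    sprintf("%s >= %s AND %s < %s",
            column, head(bounds, -1), column, tail(bounds, -1)),
    # last partition catches values at or above the final cut point
    sprintf("%s >= %s", column, bounds[num_partitions - 1])
  )
}

partition_clauses("index", 0, 10000, 4)
#> [1] "index < 2500 OR index IS NULL"
#> [2] "index >= 2500 AND index < 5000"
#> [3] "index >= 5000 AND index < 7500"
#> [4] "index >= 7500"

The first and last clauses are open-ended, which is why the bounds never filter out rows.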
Value

a spark_tbl
Examples

## Not run:
spark_session(sparkPackages = c("mysql:mysql-connector-java:5.1.48"))
url <- "jdbc:mysql://localhost:3306/databasename"
df <- spark_read_jdbc(url, "table", predicates = list("field <= 123"),
                      user = "username")
df2 <- spark_read_jdbc(url, "table2", partition_by = "index",
                       lower_bound = 0, upper_bound = 10000,
                       num_partitions = 4,
                       user = "username", password = "password")
spark_session_stop()

# postgres example
spark_session(sparkPackages = c("org.postgresql:postgresql:42.2.12"))
iris_jdbc <- spark_read_jdbc(url = "jdbc:postgresql://localhost/databasename",
                             table = "table",
                             driver = "org.postgresql.Driver")
spark_session_stop()

## End(Not run)
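As a further sketch of the predicates behaviour described above, each string in the list becomes the WHERE clause of exactly one partition, so a list of three predicates yields a three-partition spark_tbl. The table name "sales" and column "region" below are hypothetical:

# Each predicate defines one partition: this read produces three partitions.
# 'sales' and 'region' are hypothetical names; url is as in the example above.
df3 <- spark_read_jdbc(url, "sales",
                       predicates = list("region = 'NORTH'",
                                         "region = 'SOUTH'",
                                         "region = 'WEST'"),
                       user = "username", password = "password")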