Create spark_tbl from JDBC connection

Description

Create a spark_tbl by reading a table from an external database over a JDBC connection.
Usage

spark_read_jdbc(
  url,
  table,
  partition_by = NULL,
  lower_bound = NULL,
  upper_bound = NULL,
  num_partitions = NULL,
  predicates = list(),
  ...
)
Arguments

url
    string, JDBC database url of the form jdbc:subprotocol:subname

table
    string, the name of the table in the external database

partition_by
    string, the name of a column of numeric, date, or timestamp type that will be used for partitioning

lower_bound
    the minimum value of partition_by used to decide partition stride

upper_bound
    the maximum value of partition_by used to decide partition stride

num_partitions
    integer, the number of partitions. This, along with lower_bound (inclusive) and upper_bound (exclusive), forms the partition strides for the generated WHERE clause expressions used to split the column partition_by evenly. Defaults to SparkContext.defaultParallelism when unset.

predicates
    list, conditions in the WHERE clause; each one defines one partition and should be in the form of a SQL query string, see Examples

...
    additional JDBC database connection named properties
Details

For specifying partitioning, the following rules apply:

partition_by, lower_bound, and upper_bound must all be specified if any of them is specified. In addition, num_partitions must be specified.

These values describe how to partition the table when reading in parallel from multiple workers. partition_by must be a single numeric column from the table in question.

lower_bound and upper_bound are used only to decide the partition stride, not to filter rows in the table, so all rows in the table will be partitioned and returned. To filter out rows before reading, use the predicates argument.
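As an illustration of the stride rule, the sketch below shows how per-partition WHERE clauses can be derived from a partition column, lower_bound, upper_bound, and num_partitions. It is not part of this package; partition_clauses is a hypothetical helper that mirrors the inclusive/exclusive bound behaviour described above, and it assumes num_partitions >= 2.

# Illustration only (not a package function): how lower_bound (inclusive)
# and upper_bound (exclusive) turn into partition strides.
# Assumes num_partitions >= 2.
partition_clauses <- function(column, lower_bound, upper_bound, num_partitions) {
  stride <- (upper_bound - lower_bound) / num_partitions
  # interior cut points between partitions
  bounds <- lower_bound + stride * seq_len(num_partitions - 1)
  c(
    # first partition also catches NULLs and values below lower_bound
    sprintf("%s < %s OR %s IS NULL", column, bounds[1], column),
    # interior partitions: lower edge inclusive, upper edge exclusive
    sprintf("%s >= %s AND %s < %s",
            column, head(bounds, -1), column, tail(bounds, -1)),
    # last partition catches values at or above the final cut point
    sprintf("%s >= %s", column, bounds[num_partitions - 1])
  )
}

partition_clauses("index", 0, 10000, 4)
#> [1] "index < 2500 OR index IS NULL"
#> [2] "index >= 2500 AND index < 5000"
#> [3] "index >= 5000 AND index < 7500"
#> [4] "index >= 7500"

The first and last clauses are open-ended, which is why the bounds never filter out rows.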
Value

a spark_tbl
Examples

## Not run:
spark_session(sparkPackages = c("mysql:mysql-connector-java:5.1.48"))
url <- "jdbc:mysql://localhost:3306/databasename"
df <- spark_read_jdbc(url, "table", predicates = list("field <= 123"),
                      user = "username")
df2 <- spark_read_jdbc(url, "table2", partition_by = "index",
                       lower_bound = 0, upper_bound = 10000,
                       num_partitions = 4,
                       user = "username", password = "password")
spark_session_stop()

# postgres example
spark_session(sparkPackages = c("org.postgresql:postgresql:42.2.12"))
iris_jdbc <- spark_read_jdbc(url = "jdbc:postgresql://localhost/databasename",
                             table = "table",
                             driver = "org.postgresql.Driver")
spark_session_stop()

## End(Not run)
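As a further sketch of the predicates behaviour described above, each string in the list becomes the WHERE clause of exactly one partition, so a list of three predicates yields a three-partition spark_tbl. The table name "sales" and column "region" below are hypothetical:

# Each predicate defines one partition: this read produces three partitions.
# 'sales' and 'region' are hypothetical names; url is as in the example above.
df3 <- spark_read_jdbc(url, "sales",
                       predicates = list("region = 'NORTH'",
                                         "region = 'SOUTH'",
                                         "region = 'WEST'"),
                       user = "username", password = "password")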