spark_auto_broadcast_join_threshold: Retrieves or sets the auto broadcast join threshold

View source: R/spark_context_config.R

spark_auto_broadcast_join_threshold    R Documentation

Retrieves or sets the auto broadcast join threshold

Description

Configures the maximum size, in bytes, of a table that will be broadcast to all worker nodes when performing a join. Setting this value to -1 disables broadcasting. Note that statistics are currently only supported for Hive Metastore tables on which the command 'ANALYZE TABLE <tableName> COMPUTE STATISTICS noscan' has been run, and for file-based data source tables, whose statistics are computed directly from the data files.
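
As a rough sketch of the Hive Metastore case described above, the statistics command could be issued from R through the DBI interface that sparklyr connections support. The table name sales_dim and the master URL are hypothetical; the snippet assumes a cluster with Hive support where that table already exists.

```r
library(sparklyr)
library(DBI)

# Assumption: a cluster with Hive support is reachable through this master URL.
sc <- spark_connect(master = "yarn")

# `sales_dim` is a hypothetical Hive Metastore table used only for illustration.
# With noscan, Spark records the table's size in bytes without scanning its rows.
dbGetQuery(sc, "ANALYZE TABLE sales_dim COMPUTE STATISTICS noscan")

spark_disconnect(sc)
```

Once size statistics are available, Spark can compare the table's size against the threshold when planning a join.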

Usage

spark_auto_broadcast_join_threshold(sc, threshold = NULL)

Arguments

sc

A spark_connection.

threshold

Maximum size, in bytes, of a table that will be broadcast to all worker nodes when performing a join. Defaults to NULL, in which case the current configuration value is retrieved rather than set.
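
A minimal usage sketch of the two modes described by the arguments, assuming a local Spark installation is available (for example, one installed with spark_install()). The 50 MB value is an arbitrary illustration, not a recommendation.

```r
library(sparklyr)

# Assumption: Spark is installed locally and can be started in local mode.
sc <- spark_connect(master = "local")

# threshold = NULL (the default): retrieve the current setting.
current <- spark_auto_broadcast_join_threshold(sc)
print(current)

# Set the threshold to 50 MB, expressed in bytes as an integer.
spark_auto_broadcast_join_threshold(sc, threshold = 50L * 1024L * 1024L)

# Disable broadcast joins altogether.
spark_auto_broadcast_join_threshold(sc, threshold = -1L)

spark_disconnect(sc)
```

The threshold is passed as a whole number of bytes, so sizes given in megabytes need to be converted first, as in the multiplication above.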

See Also

Other Spark runtime configuration: spark_adaptive_query_execution(), spark_advisory_shuffle_partition_size(), spark_coalesce_initial_num_partitions(), spark_coalesce_min_num_partitions(), spark_coalesce_shuffle_partitions(), spark_session_config()


rstudio/sparklyr documentation built on Sept. 18, 2024, 6:10 a.m.