db_cluster_create: Create a Cluster
In brickster: R Toolkit for 'Databricks'

db_cluster_create

R Documentation

Create a Cluster

Description

Create a new Apache Spark cluster in a Databricks workspace.

Usage

db_cluster_create(
  name,
  spark_version,
  node_type_id,
  num_workers = NULL,
  autoscale = NULL,
  spark_conf = list(),
  cloud_attrs = aws_attributes(),
  driver_node_type_id = NULL,
  custom_tags = list(),
  init_scripts = list(),
  spark_env_vars = list(),
  autotermination_minutes = 120,
  log_conf = NULL,
  ssh_public_keys = NULL,
  driver_instance_pool_id = NULL,
  instance_pool_id = NULL,
  idempotency_token = NULL,
  enable_elastic_disk = TRUE,
  apply_policy_default_values = TRUE,
  enable_local_disk_encryption = TRUE,
  docker_image = NULL,
  policy_id = NULL,
  host = db_host(),
  token = db_token(),
  perform_request = TRUE
)

Arguments

`name`	Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
`spark_version`	The runtime version of the cluster. You can retrieve a list of available runtime versions by using `db_cluster_runtime_versions()`.
`node_type_id`	The node type for the worker nodes. `db_cluster_list_node_types()` can be used to see available node types.
`num_workers`	Number of worker nodes that this cluster should have. A cluster has one Spark driver and `num_workers` executors for a total of `num_workers` + 1 Spark nodes.
`autoscale`	Instance of `cluster_autoscale()`.
`spark_conf`	Named list. An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` respectively. E.g. `list("spark.speculation" = TRUE, "spark.streaming.ui.retainedBatches" = 5)`.
`cloud_attrs`	Attributes related to clusters running on a specific cloud provider. Defaults to `aws_attributes()`. Must be one of `aws_attributes()`, `azure_attributes()`, or `gcp_attributes()`.
`driver_node_type_id`	The node type of the Spark driver. This field is optional; if unset, the driver node type defaults to the value of `node_type_id`. `db_cluster_list_node_types()` can be used to see available node types.
`custom_tags`	Named list. An object containing a set of tags for cluster resources. Databricks tags all cluster resources with these tags in addition to `default_tags`. Databricks allows at most 45 custom tags.
`init_scripts`	Instance of `init_script_info()`.
`spark_env_vars`	Named list. User-specified environment variable key-value pairs. To specify an additional set of `SPARK_DAEMON_JAVA_OPTS`, we recommend appending them to `$SPARK_DAEMON_JAVA_OPTS` as shown in the following example. This ensures that all default Databricks-managed environment variables are included as well. E.g. `list(SPARK_DAEMON_JAVA_OPTS = "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true")`.
`autotermination_minutes`	Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 120.
`log_conf`	Instance of `cluster_log_conf()`.
`ssh_public_keys`	List. SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name `ubuntu` on port 2200. Up to 10 keys can be specified.
`driver_instance_pool_id`	ID of the instance pool to use for the driver node. You must also specify `instance_pool_id`. Optional.
`instance_pool_id`	ID of the instance pool to use for cluster nodes. If `driver_instance_pool_id` is present, `instance_pool_id` is used for worker nodes only. Otherwise, it is used for both the driver and worker nodes. Optional.
`idempotency_token`	An optional token that can be used to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the ID of the existing cluster instead. The existence of a cluster with the same token is not checked against terminated clusters. If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one cluster will be launched with that idempotency token. This token should have at most 64 characters.
`enable_elastic_disk`	When enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space.
`apply_policy_default_values`	Boolean (Default: `TRUE`), whether to use policy default values for missing cluster attributes.
`enable_local_disk_encryption`	Boolean (Default: `TRUE`), whether encryption of disks locally attached to the cluster is enabled.
`docker_image`	Instance of `docker_image()`.
`policy_id`	String, ID of a cluster policy.
`host`	Databricks workspace URL, defaults to calling `db_host()`.
`token`	Databricks workspace token, defaults to calling `db_token()`.
`perform_request`	If `TRUE` (default), the request is performed. If `FALSE`, the httr2 request is returned without being performed.
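As a minimal sketch, the named-list arguments described above (`spark_conf`, `spark_env_vars`, `custom_tags`) could be built in base R like this. The Spark and environment settings are the ones quoted in the argument descriptions; the tag names and values are hypothetical and only there for illustration.

```r
# Spark configuration: a named list; logical values use R's TRUE/FALSE.
spark_conf <- list(
  "spark.speculation" = TRUE,
  "spark.streaming.ui.retainedBatches" = 5
)

# Environment variables: append to the existing $SPARK_DAEMON_JAVA_OPTS
# so the Databricks-managed defaults are kept.
spark_env_vars <- list(
  SPARK_DAEMON_JAVA_OPTS =
    "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"
)

# Custom tags: hypothetical keys and values.
custom_tags <- list(team = "data-eng", project = "demo")

# The argument description allows at most 45 custom tags.
stopifnot(length(custom_tags) <= 45)
```

These objects would then be passed through the arguments of the same names.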

Details

Create a new Apache Spark cluster. This method acquires new instances from the cloud provider if necessary. This method is asynchronous; the returned `cluster_id` can be used to poll the cluster state (`db_cluster_get()`). When this method returns, the cluster is in a `PENDING` state. The cluster is usable once it enters a `RUNNING` state.
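Because creation is asynchronous, callers typically poll until the cluster reaches `RUNNING`. The following is a rough sketch of such a loop, not part of brickster: `wait_for_state()` and its `get_state` argument are hypothetical, and the set of failure states is an assumption about the state names the API reports.

```r
# Poll a state-returning function until the target state is reached.
# `get_state` is a hypothetical zero-argument function returning a
# cluster state as a string (e.g. "PENDING", "RUNNING").
wait_for_state <- function(get_state,
                           target = "RUNNING",
                           failed = c("TERMINATING", "TERMINATED", "ERROR", "UNKNOWN"),
                           interval = 10,
                           max_polls = 60) {
  for (i in seq_len(max_polls)) {
    state <- get_state()
    if (identical(state, target)) {
      return(invisible(state))
    }
    if (state %in% failed) {
      stop("Cluster entered state: ", state)
    }
    Sys.sleep(interval)  # wait before polling again
  }
  stop("Timed out waiting for state: ", target)
}

# Possible use with brickster (not run; assumes the create call returns a
# list with `cluster_id` and db_cluster_get() returns a list with `state`):
# cluster <- db_cluster_create(...)
# wait_for_state(function() db_cluster_get(cluster$cluster_id)$state)
```

Passing the state lookup as a function keeps the loop independent of how the state is fetched, which also makes it easy to exercise without a workspace.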

Databricks may not be able to acquire some of the requested nodes, due to cloud provider limitations or transient network issues. If Databricks acquires at least 85% of the requested on-demand nodes, cluster creation will succeed. Otherwise the cluster will terminate with an informative error message.
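To make the 85% rule concrete, here is a small illustrative calculation. `meets_threshold()` is a hypothetical helper, and how Databricks itself rounds or counts nodes is not stated here, so treat this only as arithmetic on the stated threshold.

```r
# TRUE when at least 85% of the requested on-demand nodes were acquired.
# Comparing scaled whole numbers avoids fractional comparisons.
meets_threshold <- function(acquired, requested) {
  stopifnot(requested > 0, acquired >= 0, acquired <= requested)
  acquired * 100 >= requested * 85
}

meets_threshold(17, 20)  # 17 of 20 is exactly 85%: TRUE
meets_threshold(16, 20)  # 16 of 20 is 80%: FALSE
```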

`autoscale` and `num_workers` cannot both be specified; choose one.
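The sizing rule can be sketched as a small pre-flight check. `check_cluster_size()` is a hypothetical helper written for illustration; brickster's own validation may behave differently, and the argument names of `cluster_autoscale()` in the commented lines are assumed.

```r
# Hypothetical helper mirroring the rule: supply one of `num_workers`
# and `autoscale`, never both.
check_cluster_size <- function(num_workers = NULL, autoscale = NULL) {
  if (!is.null(num_workers) && !is.null(autoscale)) {
    stop("Specify either `num_workers` or `autoscale`, not both.")
  }
  if (is.null(num_workers) && is.null(autoscale)) {
    stop("Specify one of `num_workers` or `autoscale`.")
  }
  invisible(TRUE)
}

# Fixed-size cluster.
check_cluster_size(num_workers = 2)

# Autoscaling cluster; the list is a stand-in for a cluster_autoscale() object.
check_cluster_size(autoscale = list(min_workers = 2, max_workers = 8))

# With brickster, the two valid forms would look roughly like (not run):
# db_cluster_create(name = "demo", spark_version = "<runtime>",
#                   node_type_id = "<node type>", num_workers = 2)
# db_cluster_create(name = "demo", spark_version = "<runtime>",
#                   node_type_id = "<node type>",
#                   autoscale = cluster_autoscale(min_workers = 2, max_workers = 8))
```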
