SparkJarProcessor: SparkJarProcessor Class


SparkJarProcessor Class

Description

Handles Amazon SageMaker processing tasks for jobs using Spark with Java or Scala Jars.

Super classes

sagemaker.common::Processor -> sagemaker.common::ScriptProcessor -> sagemaker.mlframework::.SparkProcessorBase -> SparkJarProcessor

Methods

Public methods


Method new()

Initialize a 'SparkJarProcessor' instance. The 'SparkJarProcessor' handles Amazon SageMaker processing tasks for jobs using SageMaker Spark with Java or Scala Jars.

Usage
SparkJarProcessor$new(
  role,
  instance_type,
  instance_count,
  framework_version = NULL,
  py_version = NULL,
  container_version = NULL,
  image_uri = NULL,
  volume_size_in_gb = 30,
  volume_kms_key = NULL,
  output_kms_key = NULL,
  max_runtime_in_seconds = NULL,
  base_job_name = NULL,
  sagemaker_session = NULL,
  env = NULL,
  tags = NULL,
  network_config = NULL
)
Arguments
role

(str): An AWS IAM role name or ARN. The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role, if it needs to access an AWS resource.

instance_type

(str): Type of EC2 instance to use for processing, for example, 'ml.c4.xlarge'.

instance_count

(int): The number of instances to run the Processing job with. Defaults to 1.

framework_version

(str): The version of SageMaker PySpark.

py_version

(str): The version of Python.

container_version

(str): The version of the Spark container.

image_uri

(str): The container image to use for the processing job.

volume_size_in_gb

(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).

volume_kms_key

(str): A KMS key for the processing volume.

output_kms_key

(str): The KMS key id for all ProcessingOutputs.

max_runtime_in_seconds

(int): Timeout in seconds. After this amount of time Amazon SageMaker terminates the job regardless of its current status.

base_job_name

(str): Prefix for the processing job name. If not specified, the processor generates a default job name based on the training image name and current timestamp.

sagemaker_session

(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.

env

(dict): Environment variables to be passed to the processing job.

tags

([dict]): List of tags to be passed to the processing job.

network_config

(sagemaker.network.NetworkConfig): A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
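
For illustration, a minimal constructor call might look like the sketch below. The role ARN, framework version, and instance settings are placeholder values, and the example assumes the class is exported by the 'sagemaker.mlframework' package, as the inheritance chain above suggests.

library(sagemaker.mlframework)

# Placeholder role ARN and settings; adjust for your account and region.
spark_processor <- SparkJarProcessor$new(
  base_job_name = "sm-spark-java",
  role = "arn:aws:iam::111122223333:role/MySageMakerRole",
  framework_version = "3.1",
  instance_count = 2,
  instance_type = "ml.c5.xlarge",
  max_runtime_in_seconds = 1200
)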


Method get_run_args()

Returns a 'RunArgs' object containing the normalized inputs, outputs, and arguments needed when using a 'SparkJarProcessor' in a :class:'~sagemaker.workflow.steps.ProcessingStep'.

Usage
SparkJarProcessor$get_run_args(
  submit_app,
  submit_class = NULL,
  submit_jars = NULL,
  submit_files = NULL,
  inputs = NULL,
  outputs = NULL,
  arguments = NULL,
  job_name = NULL,
  configuration = NULL,
  spark_event_logs_s3_uri = NULL
)
Arguments
submit_app

(str): Path (local or S3) to Jar file to submit to Spark as the primary application. This is translated to the 'code' property on the returned 'RunArgs' object.

submit_class

(str): Java class reference to submit to Spark as the primary application

submit_jars

(list[str]): List of paths (local or S3) to provide for 'spark-submit --jars' option

submit_files

(list[str]): List of paths (local or S3) to provide for 'spark-submit --files' option

inputs

(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: None).

outputs

(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: None).

arguments

(list[str]): A list of string arguments to be passed to a processing job (default: None).

job_name

(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.

configuration

(list[dict] or dict): Configuration for Hadoop, Spark, or Hive. List or dictionary of EMR-style classifications. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

spark_event_logs_s3_uri

(str): S3 path where Spark application events will be published.

Returns

Returns a RunArgs object.
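
As a sketch of how the returned 'RunArgs' might feed a pipeline ProcessingStep, the call below uses a hypothetical jar path and main class, and assumes 'spark_processor' is an instance created as in the constructor example above.

# Placeholder S3 jar path and main class.
run_args <- spark_processor$get_run_args(
  submit_app = "s3://my-bucket/jars/my-spark-app.jar",
  submit_class = "com.example.MySparkApp",
  arguments = list("--input", "s3://my-bucket/input", "--output", "s3://my-bucket/output")
)

# run_args carries the normalized code, inputs, outputs and arguments
# that a ProcessingStep expects.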


Method run()

Runs a processing job.

Usage
SparkJarProcessor$run(
  submit_app,
  submit_class = NULL,
  submit_jars = NULL,
  submit_files = NULL,
  inputs = NULL,
  outputs = NULL,
  arguments = NULL,
  wait = TRUE,
  logs = TRUE,
  job_name = NULL,
  experiment_config = NULL,
  configuration = NULL,
  spark_event_logs_s3_uri = NULL,
  kms_key = NULL
)
Arguments
submit_app

(str): Path (local or S3) to Jar file to submit to Spark as the primary application

submit_class

(str): Java class reference to submit to Spark as the primary application

submit_jars

(list[str]): List of paths (local or S3) to provide for 'spark-submit --jars' option

submit_files

(list[str]): List of paths (local or S3) to provide for 'spark-submit --files' option

inputs

(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: None).

outputs

(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: None).

arguments

(list[str]): A list of string arguments to be passed to a processing job (default: None).

wait

(bool): Whether the call should wait until the job completes (default: True).

logs

(bool): Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).

job_name

(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.

experiment_config

(dict[str, str]): Experiment management configuration. The dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.

configuration

(list[dict] or dict): Configuration for Hadoop, Spark, or Hive. List or dictionary of EMR-style classifications. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

spark_event_logs_s3_uri

(str): S3 path where Spark application events will be published.

kms_key

(str): The ARN of the KMS key that is used to encrypt the user code file (default: None).
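
A hedged end-to-end sketch of 'run()' follows. The jar path, main class, event-log location, and Spark properties are placeholders; the 'configuration' value follows the EMR-style application-configuration format linked above (keys 'Classification' and 'Properties').

spark_processor$run(
  submit_app = "s3://my-bucket/jars/my-spark-app.jar",   # placeholder jar location
  submit_class = "com.example.MySparkApp",               # placeholder main class
  arguments = list("--input", "s3://my-bucket/input"),
  configuration = list(
    list(
      Classification = "spark-defaults",
      Properties = list("spark.executor.memory" = "2g")
    )
  ),
  spark_event_logs_s3_uri = "s3://my-bucket/spark-events",  # placeholder S3 path
  wait = TRUE,
  logs = TRUE
)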


Method clone()

The objects of this class are cloneable with this method.

Usage
SparkJarProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

