glue: AWS Glue

View source: R/paws.R

glue    R Documentation

AWS Glue

Description

Glue

Defines the public endpoint for the Glue service.

Usage

glue(config = list(), credentials = list(), endpoint = NULL, region = NULL)

Arguments

config

Optional configuration of credentials, endpoint, and/or region.

  • credentials:

    • creds:

      • access_key_id: AWS access key ID

      • secret_access_key: AWS secret access key

      • session_token: AWS temporary session token

    • profile: The name of a profile to use. If not given, then the default profile is used.

    • anonymous: Set anonymous credentials.

  • endpoint: The complete URL to use for the constructed client.

  • region: The AWS Region used in instantiating the client.

  • close_connection: Immediately close all HTTP connections.

  • timeout: The time in seconds until a timeout exception is thrown when attempting to make a connection. The default is 60 seconds.

  • s3_force_path_style: Set this to true to force the request to use path-style addressing, i.e. http://s3.amazonaws.com/BUCKET/KEY.

  • sts_regional_endpoint: Set the STS regional endpoint resolver to regional or legacy. See https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html
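As a minimal sketch, the nested structure described above can be written as an ordinary R list. The profile name, region, and timeout below are placeholder values chosen for illustration, not values taken from this documentation. The client is only created when the paws package is installed.

```r
# Build the `config` argument as a plain nested list.
# Every value here is a placeholder for illustration.
glue_config <- list(
  credentials = list(
    profile = "my-profile"    # hypothetical profile name
  ),
  region = "us-east-1",       # hypothetical region
  timeout = 30,               # seconds; the documented default is 60
  close_connection = FALSE
)

# Create the client only if paws is available.
if (requireNamespace("paws", quietly = TRUE)) {
  svc <- paws::glue(config = glue_config)
}
```

Note that endpoint and region sit directly under config, next to credentials, rather than inside it.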

credentials

Optional credentials shorthand for the config parameter.

  • creds:

    • access_key_id: AWS access key ID

    • secret_access_key: AWS secret access key

    • session_token: AWS temporary session token

  • profile: The name of a profile to use. If not given, then the default profile is used.

  • anonymous: Set anonymous credentials.

endpoint

Optional shorthand for complete URL to use for the constructed client.

region

Optional shorthand for AWS Region used in instantiating the client.
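The shorthand arguments can be compared with the config form side by side. The sketch below assumes, as the descriptions above state, that credentials and region are shorthands for the matching entries of config. The profile name and region are hypothetical.

```r
# Shorthand form: settings passed as top-level arguments (placeholder values).
shorthand_args <- list(
  credentials = list(profile = "my-profile"),
  region = "eu-west-1"
)

# Equivalent settings nested under `config`.
config_args <- list(
  config = list(
    credentials = list(profile = "my-profile"),
    region = "eu-west-1"
  )
)

# Both argument lists carry the same settings.
same_settings <- identical(shorthand_args$credentials, config_args$config$credentials) &&
  identical(shorthand_args$region, config_args$config$region)

# Create the clients only if paws is available.
if (requireNamespace("paws", quietly = TRUE)) {
  svc_shorthand <- do.call(paws::glue, shorthand_args)
  svc_config <- do.call(paws::glue, config_args)
}
```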

Value

A client for the service. You can call the service's operations using syntax like svc$operation(...), where svc is the name you've assigned to the client. The available operations are listed in the Operations section.
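To illustrate the svc$operation(...) syntax, here is a sketch of a helper that collects database names across all pages returned by get_databases. The helper name list_database_names and the stand-in client fake_svc are hypothetical. The stand-in mimics the DatabaseList and NextToken response fields so the snippet runs without AWS access; with a real client you would pass the object returned by glue().

```r
# Collect database names across all pages of get_databases().
# `svc` is any object exposing get_databases(), such as a Glue client.
list_database_names <- function(svc) {
  names_found <- character(0)
  token <- NULL
  repeat {
    page <- if (is.null(token)) {
      svc$get_databases()
    } else {
      svc$get_databases(NextToken = token)
    }
    page_names <- vapply(page$DatabaseList, function(db) db$Name, character(1))
    names_found <- c(names_found, page_names)
    token <- page$NextToken
    # Stop when no further token is returned.
    if (length(token) == 0 || !nzchar(token)) break
  }
  names_found
}

# Hypothetical stand-in client that returns two pages of results.
fake_svc <- list(
  get_databases = function(NextToken = NULL) {
    if (is.null(NextToken)) {
      list(
        DatabaseList = list(list(Name = "sales"), list(Name = "logs")),
        NextToken = "page2"
      )
    } else {
      list(DatabaseList = list(list(Name = "archive")), NextToken = character(0))
    }
  }
)

database_names <- list_database_names(fake_svc)
```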

Service syntax

svc <- glue(
  config = list(
    credentials = list(
      creds = list(
        access_key_id = "string",
        secret_access_key = "string",
        session_token = "string"
      ),
      profile = "string",
      anonymous = "logical"
    ),
    endpoint = "string",
    region = "string",
    close_connection = "logical",
    timeout = "numeric",
    s3_force_path_style = "logical",
    sts_regional_endpoint = "string"
  ),
  credentials = list(
    creds = list(
      access_key_id = "string",
      secret_access_key = "string",
      session_token = "string"
    ),
    profile = "string",
    anonymous = "logical"
  ),
  endpoint = "string",
  region = "string"
)

Operations

batch_create_partition Creates one or more partitions in a batch operation
batch_delete_connection Deletes a list of connection definitions from the Data Catalog
batch_delete_partition Deletes one or more partitions in a batch operation
batch_delete_table Deletes multiple tables at once
batch_delete_table_version Deletes a specified batch of versions of a table
batch_get_blueprints Retrieves information about a list of blueprints
batch_get_crawlers Returns a list of resource metadata for a given list of crawler names
batch_get_custom_entity_types Retrieves the details for the custom patterns specified by a list of names
batch_get_data_quality_result Retrieves a list of data quality results for the specified result IDs
batch_get_dev_endpoints Returns a list of resource metadata for a given list of development endpoint names
batch_get_jobs Returns a list of resource metadata for a given list of job names
batch_get_partition Retrieves partitions in a batch request
batch_get_triggers Returns a list of resource metadata for a given list of trigger names
batch_get_workflows Returns a list of resource metadata for a given list of workflow names
batch_stop_job_run Stops one or more job runs for a specified job definition
batch_update_partition Updates one or more partitions in a batch operation
cancel_data_quality_rule_recommendation_run Cancels the specified recommendation run that was being used to generate rules
cancel_data_quality_ruleset_evaluation_run Cancels a run where a ruleset is being evaluated against a data source
cancel_ml_task_run Cancels (stops) a task run
cancel_statement Cancels the statement
check_schema_version_validity Validates the supplied schema
create_blueprint Registers a blueprint with Glue
create_classifier Creates a classifier in the user's account
create_connection Creates a connection definition in the Data Catalog
create_crawler Creates a new crawler with specified targets, role, configuration, and optional schedule
create_custom_entity_type Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data
create_database Creates a new database in a Data Catalog
create_data_quality_ruleset Creates a data quality ruleset with DQDL rules applied to a specified Glue table
create_dev_endpoint Creates a new development endpoint
create_job Creates a new job definition
create_ml_transform Creates a Glue machine learning transform
create_partition Creates a new partition
create_partition_index Creates a specified partition index in an existing table
create_registry Creates a new registry which may be used to hold a collection of schemas
create_schema Creates a new schema set and registers the schema definition
create_script Transforms a directed acyclic graph (DAG) into code
create_security_configuration Creates a new security configuration
create_session Creates a new session
create_table Creates a new table definition in the Data Catalog
create_trigger Creates a new trigger
create_user_defined_function Creates a new function definition in the Data Catalog
create_workflow Creates a new workflow
delete_blueprint Deletes an existing blueprint
delete_classifier Removes a classifier from the Data Catalog
delete_column_statistics_for_partition Deletes the partition column statistics of a column
delete_column_statistics_for_table Deletes table statistics of columns
delete_connection Deletes a connection from the Data Catalog
delete_crawler Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING
delete_custom_entity_type Deletes a custom pattern by specifying its name
delete_database Removes a specified database from a Data Catalog
delete_data_quality_ruleset Deletes a data quality ruleset
delete_dev_endpoint Deletes a specified development endpoint
delete_job Deletes a specified job definition
delete_ml_transform Deletes a Glue machine learning transform
delete_partition Deletes a specified partition
delete_partition_index Deletes a specified partition index from an existing table
delete_registry Deletes the entire registry, including its schemas and all of their versions
delete_resource_policy Deletes a specified policy
delete_schema Deletes the entire schema set, including all of its versions
delete_schema_versions Removes versions from the specified schema
delete_security_configuration Deletes a specified security configuration
delete_session Deletes the session
delete_table Removes a table definition from the Data Catalog
delete_table_version Deletes a specified version of a table
delete_trigger Deletes a specified trigger
delete_user_defined_function Deletes an existing function definition from the Data Catalog
delete_workflow Deletes a workflow
get_blueprint Retrieves the details of a blueprint
get_blueprint_run Retrieves the details of a blueprint run
get_blueprint_runs Retrieves the details of blueprint runs for a specified blueprint
get_catalog_import_status Retrieves the status of a migration operation
get_classifier Retrieves a classifier by name
get_classifiers Lists all classifier objects in the Data Catalog
get_column_statistics_for_partition Retrieves partition statistics of columns
get_column_statistics_for_table Retrieves table statistics of columns
get_connection Retrieves a connection definition from the Data Catalog
get_connections Retrieves a list of connection definitions from the Data Catalog
get_crawler Retrieves metadata for a specified crawler
get_crawler_metrics Retrieves metrics about specified crawlers
get_crawlers Retrieves metadata for all crawlers defined in the customer account
get_custom_entity_type Retrieves the details of a custom pattern by specifying its name
get_database Retrieves the definition of a specified database
get_databases Retrieves all databases defined in a given Data Catalog
get_data_catalog_encryption_settings Retrieves the security configuration for a specified catalog
get_dataflow_graph Transforms a Python script into a directed acyclic graph (DAG)
get_data_quality_result Retrieves the result of a data quality rule evaluation
get_data_quality_rule_recommendation_run Gets the specified recommendation run that was used to generate rules
get_data_quality_ruleset Returns an existing ruleset by identifier or name
get_data_quality_ruleset_evaluation_run Retrieves a specific run where a ruleset is evaluated against a data source
get_dev_endpoint Retrieves information about a specified development endpoint
get_dev_endpoints Retrieves all the development endpoints in this Amazon Web Services account
get_job Retrieves an existing job definition
get_job_bookmark Returns information on a job bookmark entry
get_job_run Retrieves the metadata for a given job run
get_job_runs Retrieves metadata for all runs of a given job definition
get_jobs Retrieves all current job definitions
get_mapping Creates mappings
get_ml_task_run Gets details for a specific task run on a machine learning transform
get_ml_task_runs Gets a list of runs for a machine learning transform
get_ml_transform Gets a Glue machine learning transform artifact and all its corresponding metadata
get_ml_transforms Gets a sortable, filterable list of existing Glue machine learning transforms
get_partition Retrieves information about a specified partition
get_partition_indexes Retrieves the partition indexes associated with a table
get_partitions Retrieves information about the partitions in a table
get_plan Gets code to perform a specified mapping
get_registry Describes the specified registry in detail
get_resource_policies Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants
get_resource_policy Retrieves a specified resource policy
get_schema Describes the specified schema in detail
get_schema_by_definition Retrieves a schema by the SchemaDefinition
get_schema_version Gets the specified schema by its unique ID assigned when a version of the schema is created or registered
get_schema_versions_diff Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry
get_security_configuration Retrieves a specified security configuration
get_security_configurations Retrieves a list of all security configurations
get_session Retrieves the session
get_statement Retrieves the statement
get_table Retrieves the Table definition in a Data Catalog for a specified table
get_tables Retrieves the definitions of some or all of the tables in a given Database
get_table_version Retrieves a specified version of a table
get_table_versions Retrieves a list of strings that identify available versions of a specified table
get_tags Retrieves a list of tags associated with a resource
get_trigger Retrieves the definition of a trigger
get_triggers Gets all the triggers associated with a job
get_unfiltered_partition_metadata Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
get_unfiltered_partitions_metadata Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
get_unfiltered_table_metadata Retrieves table metadata from the Data Catalog that contains unfiltered metadata
get_user_defined_function Retrieves a specified function definition from the Data Catalog
get_user_defined_functions Retrieves multiple function definitions from the Data Catalog
get_workflow Retrieves resource metadata for a workflow
get_workflow_run Retrieves the metadata for a given workflow run
get_workflow_run_properties Retrieves the workflow run properties which were set during the run
get_workflow_runs Retrieves metadata for all runs of a given workflow
import_catalog_to_glue Imports an existing Amazon Athena Data Catalog to Glue
list_blueprints Lists all the blueprint names in an account
list_crawlers Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag
list_crawls Returns all the crawls of a specified crawler
list_custom_entity_types Lists all the custom patterns that have been created
list_data_quality_results Returns all data quality execution results for your account
list_data_quality_rule_recommendation_runs Lists the recommendation runs meeting the filter criteria
list_data_quality_ruleset_evaluation_runs Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source
list_data_quality_rulesets Returns a paginated list of rulesets for the specified list of Glue tables
list_dev_endpoints Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag
list_jobs Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag
list_ml_transforms Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag
list_registries Returns a list of registries that you have created, with minimal registry information
list_schemas Returns a list of schemas with minimal details
list_schema_versions Returns a list of schema versions that you have created, with minimal information
list_sessions Retrieves a list of sessions
list_statements Lists statements for the session
list_triggers Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag
list_workflows Lists names of workflows created in the account
put_data_catalog_encryption_settings Sets the security configuration for a specified catalog
put_resource_policy Sets the Data Catalog resource policy for access control
put_schema_version_metadata Puts the metadata key value pair for a specified schema version ID
put_workflow_run_properties Puts the specified workflow run properties for the given workflow run
query_schema_version_metadata Queries for the schema version metadata information
register_schema_version Adds a new version to the existing schema
remove_schema_version_metadata Removes a key value pair from the schema version metadata for the specified schema version ID
reset_job_bookmark Resets a bookmark entry
resume_workflow_run Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run
run_statement Executes the statement
search_tables Searches a set of tables based on properties in the table metadata as well as on the parent database
start_blueprint_run Starts a new run of the specified blueprint
start_crawler Starts a crawl using the specified crawler, regardless of what is scheduled
start_crawler_schedule Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED
start_data_quality_rule_recommendation_run Starts a recommendation run that is used to generate rules when you don't know what rules to write
start_data_quality_ruleset_evaluation_run Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table)
start_export_labels_task_run Begins an asynchronous task to export all labeled data for a particular transform
start_import_labels_task_run Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality
start_job_run Starts a job run using a job definition
start_ml_evaluation_task_run Starts a task to estimate the quality of the transform
start_ml_labeling_set_generation_task_run Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels
start_trigger Starts an existing trigger
start_workflow_run Starts a new run of the specified workflow
stop_crawler If the specified crawler is running, stops the crawl
stop_crawler_schedule Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running
stop_session Stops the session
stop_trigger Stops a specified trigger
stop_workflow_run Stops the execution of the specified workflow run
tag_resource Adds tags to a resource
untag_resource Removes tags from a resource
update_blueprint Updates a registered blueprint
update_classifier Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present)
update_column_statistics_for_partition Creates or updates partition statistics of columns
update_column_statistics_for_table Creates or updates table statistics of columns
update_connection Updates a connection definition in the Data Catalog
update_crawler Updates a crawler
update_crawler_schedule Updates the schedule of a crawler using a cron expression
update_database Updates an existing database definition in a Data Catalog
update_data_quality_ruleset Updates the specified data quality ruleset
update_dev_endpoint Updates a specified development endpoint
update_job Updates an existing job definition
update_job_from_source_control Synchronizes a job from the source control repository
update_ml_transform Updates an existing machine learning transform
update_partition Updates a partition
update_registry Updates an existing registry which is used to hold a collection of schemas
update_schema Updates the description, compatibility setting, or version checkpoint for a schema set
update_source_control_from_job Synchronizes a job to the source control repository
update_table Updates a metadata table in the Data Catalog
update_trigger Updates a trigger definition
update_user_defined_function Updates an existing function definition in the Data Catalog
update_workflow Updates an existing workflow
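Several of the operations above are typically combined. As a sketch, start_job_run can be followed by repeated get_job_run calls until the run reaches a final state. The helper run_job_and_wait, the job name, and the stand-in client are hypothetical. The stand-in mimics the JobRunId and JobRun$JobRunState response fields so the snippet runs without AWS access.

```r
# Start a job run and poll until it reaches a terminal state
# or the poll limit is hit. Returns the last observed state.
run_job_and_wait <- function(svc, job_name, poll_seconds = 30, max_polls = 20) {
  run_id <- svc$start_job_run(JobName = job_name)$JobRunId
  terminal_states <- c("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR")
  state <- NA_character_
  for (i in seq_len(max_polls)) {
    run <- svc$get_job_run(JobName = job_name, RunId = run_id)
    state <- run$JobRun$JobRunState
    if (state %in% terminal_states) break
    Sys.sleep(poll_seconds)
  }
  state
}

# Hypothetical stand-in client: reports RUNNING twice, then SUCCEEDED.
make_fake_svc <- function() {
  polls <- 0
  list(
    start_job_run = function(JobName) list(JobRunId = "jr_example"),
    get_job_run = function(JobName, RunId) {
      polls <<- polls + 1
      state <- if (polls < 3) "RUNNING" else "SUCCEEDED"
      list(JobRun = list(Id = RunId, JobRunState = state))
    }
  )
}

final_state <- run_job_and_wait(make_fake_svc(), "my-job", poll_seconds = 0)
```

The max_polls limit keeps the loop bounded. If the limit is reached first, the returned state is the last non-terminal one observed, so callers can decide whether to keep waiting.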

Examples

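## Not run: 
svc <- glue()
# Placeholder database, table, and partition values; replace with your own.
svc$batch_create_partition(
  DatabaseName = "my_database",
  TableName = "my_table",
  PartitionInputList = list(
    list(Values = list("2023-09-15"))
  )
)

## End(Not run)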


paws documentation built on Sept. 15, 2023, 5:06 p.m.
