DataConfig: DataConfig Class

DataConfig R Documentation

DataConfig Class

Description

Configuration object for the input and output datasets.

Public fields

s3_data_input_path

Dataset S3 prefix/object URI.

s3_output_path

S3 prefix to store the output.

s3_analysis_config_output_path

S3 prefix to store the analysis_config output.

s3_data_distribution_type

Valid options are "FullyReplicated" or "ShardedByS3Key".

s3_compression_type

Valid options are "None" or "Gzip".

label

Target attribute of the model, required by bias metrics.

headers

A list of column names in the input dataset.

features

JSONPath for locating the feature columns.

analysis_config

Analysis config dictionary.

Methods

Public methods


Method new()

Initializes a configuration of both input and output datasets.

Usage
DataConfig$new(
  s3_data_input_path,
  s3_output_path,
  s3_analysis_config_output_path = NULL,
  label = NULL,
  headers = NULL,
  features = NULL,
  dataset_type = c("text/csv", "application/jsonlines", "application/x-parquet",
    "application/x-image"),
  s3_data_distribution_type = "FullyReplicated",
  s3_compression_type = c("None", "Gzip"),
  joinsource = NULL
)
Arguments
s3_data_input_path

(str): Dataset S3 prefix/object URI.

s3_output_path

(str): S3 prefix to store the output.

s3_analysis_config_output_path

(str): S3 prefix to store the analysis_config output. If this field is NULL, the s3_output_path is used to store the analysis_config output.

label

(str): Target attribute of the model, required by bias metrics (optional for SHAP). Specified as a column name or index for a CSV dataset, or as a JSONPath for JSONLines.

headers

(list[str]): A list of column names in the input dataset.

features

(str): JSONPath for locating the feature columns for bias metrics if the dataset format is JSONLines.

dataset_type

(str): Format of the dataset. Valid values are "text/csv" for CSV, "application/jsonlines" for JSONLines, "application/x-parquet" for Parquet, and "application/x-image" for image datasets.

s3_data_distribution_type

(str): Valid options are "FullyReplicated" or "ShardedByS3Key".

s3_compression_type

(str): Valid options are "None" or "Gzip".

joinsource

(str): The name or index of the column in the dataset that acts as an identifier column (for instance, while performing a join). This column is used only as an identifier and not for any other computation. The field is optional in all cases except when the dataset contains more than one file and 'save_local_shap_values' is set to TRUE in SHAPConfig.
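
As an illustration of the arguments above, the following is a minimal sketch of constructing a DataConfig for a CSV dataset and inspecting its analysis config. It assumes the package is installed under the name sagemaker.common and exports DataConfig; the bucket, paths, and column names are hypothetical placeholders. The call is guarded so the snippet still runs when the package is not available.

```r
# Hypothetical S3 locations and column names, for illustration only.
data_config_args <- list(
  s3_data_input_path = "s3://example-bucket/clarify/input/train.csv",
  s3_output_path = "s3://example-bucket/clarify/output",
  label = "target",
  headers = list("target", "age", "income"),
  dataset_type = "text/csv"
)

# Assumption: the package name is sagemaker.common. Skip quietly if absent.
if (requireNamespace("sagemaker.common", quietly = TRUE)) {
  data_config <- do.call(sagemaker.common::DataConfig$new, data_config_args)
  # get_config() returns the part of the analysis config built from these fields.
  str(data_config$get_config())
}
```

Arguments left out of the list (s3_analysis_config_output_path, features, joinsource, and so on) fall back to the defaults shown under Usage.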


Method get_config()

Returns part of an analysis config dictionary.

Usage
DataConfig$get_config()

Method format()

Returns a formatted representation of the class.

Usage
DataConfig$format()

Method clone()

The objects of this class are cloneable with this method.

Usage
DataConfig$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
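
The clone() method follows the usual R6 behavior: plain assignment creates a second reference to the same object, while clone() creates an independent copy. The sketch below shows this with a small stand-in class (ExampleConfig is hypothetical and is not part of this package); it assumes the R6 package is installed.

```r
library(R6)

# Hypothetical stand-in class with a single public field.
ExampleConfig <- R6Class(
  "ExampleConfig",
  public = list(
    label = NULL,
    initialize = function(label = NULL) {
      self$label <- label
    }
  )
)

original <- ExampleConfig$new(label = "target")
alias <- original          # second reference to the same object
copy <- original$clone()   # independent copy

original$label <- "outcome"

alias$label  # follows the change made through `original`
copy$label   # keeps the value it had when it was cloned
```

deep = TRUE matters only when a field itself holds another R6 object or environment; with deep = FALSE such a field would still be shared between the original and the copy.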


DyfanJones/sagemaker-r-common documentation built on June 14, 2022, 10:31 p.m.