GenerateDatetimePartition: Preview the full partitioning determined by a...

View source: R/Partitions.R


Preview the full partitioning determined by a DatetimePartitioningSpecification

Description

Based on the project dataset and the partitioning specification, inspect the full partitioning that would be used if the same specification were passed into SetTarget. The returned partitioning is for inspection only and is not intended to be passed to SetTarget.

Usage

GenerateDatetimePartition(project, spec)

Arguments

project

character. Either (1) a character string giving the unique alphanumeric identifier for the project, or (2) a list containing the element projectId with this identifier.

spec

list. Datetime partition specification returned by CreateDatetimePartitionSpecification.
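
The project argument accepts either the ID string itself or a list carrying a projectId element. The sketch below shows one way such an argument can be normalized to a plain ID. ProjectIdOf is a hypothetical helper written for illustration; it is not exported by the datarobot package, and the package's own internal handling may differ.

```r
# Hypothetical helper (not part of the datarobot package) illustrating the two
# accepted forms of the `project` argument.
ProjectIdOf <- function(project) {
  if (is.list(project)) {
    # Form (2): a list carrying a projectId element; [[ avoids partial matching
    id <- project[["projectId"]]
  } else {
    # Form (1): the alphanumeric identifier as a character string
    id <- project
  }
  if (!is.character(id) || length(id) != 1) {
    stop("project must be an ID string or a list with a projectId element")
  }
  id
}

ProjectIdOf("59a5af20c80891534e3c2bde")
ProjectIdOf(list(projectId = "59a5af20c80891534e3c2bde"))
```

Both calls return the same identifier string; a list without a projectId element raises an error.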

Value

A list describing the datetime partitioning, with the following components:

  • cvMethod. The type of validation scheme used for the project.

  • projectId character. The ID of the project this partitioning applies to.

  • datetimePartitionColumn character. The name of the column whose values as dates are used to assign a row to a particular partition.

  • dateFormat character. The format by which the partition column was interpreted (compatible with strftime [https://docs.python.org/2/library/time.html#time.strftime]).

  • autopilotDataSelectionMethod character. Whether models created by the autopilot use "rowCount" or "duration" as their dataSelectionMethod.

  • validationDuration character. The validation duration specified when initializing the partitioning. It is not directly significant if the backtests have been modified, but it is used as the default validationDuration for the backtests.

  • availableTrainingStartDate character. The start date of the available training data for scoring the holdout.

  • availableTrainingDuration character. The duration of the available training data for scoring the holdout.

  • availableTrainingRowCount integer. The number of rows in the available training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.

  • availableTrainingEndDate character. The end date of the available training data for scoring the holdout.

  • primaryTrainingStartDate character. The start date of the primary training data for scoring the holdout.

  • primaryTrainingDuration character. The duration of the primary training data for scoring the holdout.

  • primaryTrainingRowCount integer. The number of rows in the primary training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.

  • primaryTrainingEndDate character. The end date of the primary training data for scoring the holdout.

  • gapStartDate character. The start date of the gap between training and holdout scoring data.

  • gapDuration character. The duration of the gap between training and holdout scoring data.

  • gapRowCount integer. The number of rows in the gap between training and holdout scoring data. Only available when retrieving the partitioning after setting the target.

  • gapEndDate character. The end date of the gap between training and holdout scoring data.

  • holdoutStartDate character. The start date of holdout scoring data.

  • holdoutDuration character. The duration of the holdout scoring data.

  • holdoutRowCount integer. The number of rows in the holdout scoring data. Only available when retrieving the partitioning after setting the target.

  • holdoutEndDate character. The end date of the holdout scoring data.

  • numberOfBacktests integer. The number of backtests used.

  • backtests data.frame. A data frame of partition backtests. Each row represents one backtest and has the following columns: index, availableTrainingStartDate, availableTrainingDuration, availableTrainingRowCount, availableTrainingEndDate, primaryTrainingStartDate, primaryTrainingDuration, primaryTrainingRowCount, primaryTrainingEndDate, gapStartDate, gapDuration, gapRowCount, gapEndDate, validationStartDate, validationDuration, validationRowCount, validationEndDate, totalRowCount.

  • useTimeSeries logical. Whether the project is a time series project (if TRUE) or an OTV project which uses datetime partitioning (if FALSE).

  • defaultToKnownInAdvance logical. Whether the project defaults to treating features as known in advance. Known in advance features are time series features that are expected to be known for dates in the future when making predictions (e.g., "is this a holiday").

  • featureDerivationWindowStart integer. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Only used for time series projects. Expressed in terms of the timeUnit of the datetimePartitionColumn.

  • featureDerivationWindowEnd integer. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Only used for time series projects. Expressed in terms of the timeUnit of the datetimePartitionColumn.

  • forecastWindowStart integer. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Only used for time series projects. Expressed in terms of the timeUnit of the datetimePartitionColumn.

  • forecastWindowEnd integer. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Only used for time series projects. Expressed in terms of the timeUnit of the datetimePartitionColumn.

  • featureSettings list. A list of lists specifying settings for individual features. Each inner list contains the following elements:

    • featureName character. The name of the feature the settings apply to.

    • knownInAdvance logical. Optional. Whether or not the feature is known in advance. Used for time series only. Defaults to FALSE.

    • doNotDerive logical. Optional. If TRUE, no time series derived features (e.g., lags) will be automatically engineered from this feature. Used for time series only. Defaults to FALSE.

  • treatAsExponential character. Specifies whether to treat the data as having an exponential trend and apply transformations such as a log transform. Uses values from TreatAsExponential.

  • differencingMethod character. Specifies which differencing method to apply if the data is stationary. Uses values from DifferencingMethod.

  • windowsBasisUnit character. Indicates which unit is the basis for the feature derivation window and forecast window. Uses values from TimeUnit and the value "ROW".

  • periodicities list. A list of periodicities, specified as a list of lists in which each item gives the 'timeSteps' for a particular 'timeUnit'. The 'timeUnit' values will be "ROW" if windowsBasisUnit is "ROW".

  • totalRowCount integer. The number of rows in the project dataset. Only available when retrieving the partitioning after setting the target. Thus it will be NULL for GenerateDatetimePartition and populated for GetDatetimePartition.

  • validationRowCount integer. The number of rows in the validation set.

  • multiseriesIdColumns list. A list of the names of the multiseries ID columns that define the series.

  • numberOfKnownInAdvanceFeatures integer. The number of known in advance features.

  • useCrossSeriesFeatures logical. Whether or not cross series features are included.

  • aggregationType character. The aggregation type to apply when creating cross series features. See SeriesAggregationType.

  • calendarId character. The ID of the calendar used for this project, if any.
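
Several of the row-count components above are documented as available only after the target is set (totalRowCount, for example, is NULL for GenerateDatetimePartition). The sketch below shows one way to read such a result defensively. It runs on a small hand-built list standing in for a real result: only a few documented components are included, every value is invented for illustration, and RowCountOrNA is a hypothetical helper, not part of the datarobot package.

```r
# Hand-built stand-in for a GenerateDatetimePartition() result. Only a few
# documented components are included and every value is invented; a real
# result is returned by the DataRobot server.
partition <- list(
  projectId = "59a5af20c80891534e3c2bde",
  datetimePartitionColumn = "date_col",
  numberOfBacktests = 2L,
  holdoutRowCount = NULL,  # row counts are only filled in after the target is set
  totalRowCount = NULL,
  backtests = data.frame(
    index = c(0L, 1L),
    validationStartDate = c("2017-04-01", "2017-01-01"),
    validationEndDate = c("2017-07-01", "2017-04-01"),
    stringsAsFactors = FALSE
  ),
  featureSettings = list(
    list(featureName = "is_holiday", knownInAdvance = TRUE),
    list(featureName = "store_id", doNotDerive = TRUE)
  )
)

# Hypothetical helper: return a row count as an integer, or NA when the
# component is NULL (not yet available).
RowCountOrNA <- function(partition, name) {
  value <- partition[[name]]
  if (is.null(value)) NA_integer_ else as.integer(value)
}

RowCountOrNA(partition, "totalRowCount")    # NA
RowCountOrNA(partition, "holdoutRowCount")  # NA

# One summary line per backtest (each backtest is one row of the data frame).
with(partition$backtests,
     paste0("backtest ", index, ": validation ",
            validationStartDate, " to ", validationEndDate))

# Features flagged as known in advance; a missing knownInAdvance element is
# treated as FALSE, its documented default.
featureNames <- vapply(partition$featureSettings, `[[`, character(1), "featureName")
isKnown <- vapply(partition$featureSettings,
                  function(s) isTRUE(s[["knownInAdvance"]]), logical(1))
featureNames[isKnown]  # "is_holiday"
```

Returning NA rather than NULL keeps the counts usable in vectors and data frames, where a NULL element would silently disappear.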

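For time series projects, dateFormat and the window offsets can be combined to see which calendar dates a window covers. The sketch below assumes a daily series, a dateFormat of "%Y-%m-%d", feature derivation window offsets of -28 to 0 and forecast window offsets of 1 to 7. All of these values are hypothetical, and treating offsets into the past as negative integers is an assumption about the sign convention. WindowDates is a hypothetical helper. R's as.Date() accepts the same %-style conversion codes as strftime for common fields such as %Y, %m and %d.

```r
# Hypothetical values standing in for components of a partitioning result.
dateFormat <- "%Y-%m-%d"  # strftime-style format of the partition column
featureDerivationWindowStart <- -28L
featureDerivationWindowEnd <- 0L
forecastWindowStart <- 1L
forecastWindowEnd <- 7L

# Hypothetical helper: offsets are in days here (assumed daily time unit), so
# adding them to a Date moves that many days into the past (negative) or the
# future (positive). Returns c(windowStart, windowEnd).
WindowDates <- function(forecastPoint, startOffset, endOffset) {
  stopifnot(startOffset <= endOffset)
  forecastPoint + c(startOffset, endOffset)
}

# Parse a forecast point using the partition column's date format.
forecastPoint <- as.Date("2017-07-01", format = dateFormat)

fdw <- WindowDates(forecastPoint, featureDerivationWindowStart,
                   featureDerivationWindowEnd)
fw <- WindowDates(forecastPoint, forecastWindowStart, forecastWindowEnd)

format(fdw)  # "2017-06-03" "2017-07-01"
format(fw)   # "2017-07-02" "2017-07-08"
```

When windowsBasisUnit is "ROW", the offsets count rows rather than calendar time, so date arithmetic like this does not apply directly.
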
Examples

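## Not run: 
  projectId <- "59a5af20c80891534e3c2bde"
  # Specification that partitions the project on the "date_col" column
  partitionSpec <- CreateDatetimePartitionSpecification("date_col")
  # Preview the partitioning; pass partitionSpec, not this result, to SetTarget
  partition <- GenerateDatetimePartition(projectId, partitionSpec)
  partition$backtests

## End(Not run)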

datarobot documentation built on May 29, 2024, 4:36 a.m.