CreateDatetimePartitionSpecification: Create a list describing datetime partition parameters

View source: R/Partitions.R

CreateDatetimePartitionSpecification    R Documentation

Create a list describing datetime partition parameters

Description

Uniquely defines a DatetimePartitioning for some project

Usage

CreateDatetimePartitionSpecification(
  datetimePartitionColumn,
  autopilotDataSelectionMethod = NULL,
  validationDuration = NULL,
  holdoutStartDate = NULL,
  holdoutDuration = NULL,
  disableHoldout = NULL,
  gapDuration = NULL,
  numberOfBacktests = NULL,
  backtests = NULL,
  useTimeSeries = FALSE,
  defaultToKnownInAdvance = FALSE,
  featureDerivationWindowStart = NULL,
  featureDerivationWindowEnd = NULL,
  featureSettings = NULL,
  treatAsExponential = NULL,
  differencingMethod = NULL,
  windowsBasisUnit = NULL,
  periodicities = NULL,
  forecastWindowStart = NULL,
  forecastWindowEnd = NULL,
  multiseriesIdColumns = NULL,
  useCrossSeries = NULL,
  aggregationType = NULL,
  crossSeriesGroupByColumns = NULL,
  calendar = NULL
)

Arguments

datetimePartitionColumn

character. The name of the column whose values, interpreted as dates, are used to assign each row to a partition.

autopilotDataSelectionMethod

character. Optional. Whether models created by the autopilot should use "rowCount" or "duration" as their dataSelectionMethod

validationDuration

character. Optional. The default validationDuration for the backtests

holdoutStartDate

character. The start date of holdout scoring data (RFC 3339 format). If holdoutStartDate is specified, holdoutDuration must also be specified.

holdoutDuration

character. Optional. The duration of the holdout scoring data. If holdoutDuration is specified, holdoutStartDate must also be specified.

disableHoldout

logical. Optional. Whether to suppress allocating the holdout fold. If set to TRUE, holdoutStartDate and holdoutDuration must not be specified.

gapDuration

character. Optional. The duration of the gap between training and holdout scoring data.

numberOfBacktests

integer. The number of backtests to use.

backtests

list. A list of BacktestSpecification objects giving the exact specification of the backtests to use. The indexes of the specified backtests should range from 0 to numberOfBacktests - 1. If any backtest is left unspecified, a default configuration will be chosen.

useTimeSeries

logical. Whether to create a time series project (if TRUE) or an OTV project which uses datetime partitioning (if FALSE). The default behavior is to create an OTV project.

defaultToKnownInAdvance

logical. Whether to default to treating features as known in advance. Defaults to FALSE. Only used for time series projects. Known in advance features are expected to be known for dates in the future when making predictions (e.g., "is this a holiday").

featureDerivationWindowStart

integer. Optional. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Only used for time series projects. Expressed in terms of the timeUnit of the datetimePartitionColumn.

featureDerivationWindowEnd

integer. Optional. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Only used for time series projects. Expressed in terms of the timeUnit of the datetimePartitionColumn.

featureSettings

list. Optional. A list specifying settings for each feature. For each feature you would like to set feature settings for, pass the following in a list:

  • featureName character. The name of the feature to which the settings apply.

  • knownInAdvance logical. Optional. Whether or not the feature is known in advance. Used for time series only. Defaults to FALSE.

  • doNotDerive logical. Optional. If TRUE, no time series derived features (e.g., lags) will be automatically engineered from this feature. Used for time series only. Defaults to FALSE.

treatAsExponential

character. Optional. Defaults to "auto". Specifies whether to treat the data as having an exponential trend and apply transformations such as a log transform. Use values from the TreatAsExponential enum.

differencingMethod

character. Optional. Defaults to "auto". Specifies the differencing method to apply if the data is stationary. Use values from the DifferencingMethod enum.

windowsBasisUnit

character. Optional. Indicates which unit is the basis for the feature derivation window and forecast window. Valid options are a time unit (see TimeUnit) or "ROW".

periodicities

list. Optional. A list of periodicities for different time units. Must be specified as a list of lists, where each inner list gives the 'timeSteps' for a particular 'timeUnit'. The 'timeUnit' should be "ROW" if windowsBasisUnit is "ROW".

forecastWindowStart

integer. Optional. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Only used for time series projects. Expressed in terms of the timeUnit of the datetimePartitionColumn.

forecastWindowEnd

integer. Optional. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Only used for time series projects. Expressed in terms of the timeUnit of the datetimePartitionColumn.

multiseriesIdColumns

list. A list of the names of multiseries id columns to define series

useCrossSeries

logical. If TRUE, cross series features will be included. For details, see "Calculating features across series" in the time series section of the DataRobot user guide.

aggregationType

character. Optional. The aggregation type to apply when creating cross series features. Must be either "total" or "average". See SeriesAggregationType.

crossSeriesGroupByColumns

character. Optional. Column used to split a cross series into further groups. For example, if every series is the sales of an individual product, the cross series group could be the product category, with values like "men's clothing", "sports equipment", etc. Requires multiseries with useCrossSeries enabled.

calendar

character. Optional. Either the calendar object or calendar id to use for this project.
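
The holdout rules stated above (holdoutStartDate and holdoutDuration go together, and neither may be given when disableHoldout is TRUE) can be checked before the specification is built. The following is a minimal sketch in base R. checkHoldoutArgs is a hypothetical helper written for illustration, not a function of the datarobot package, and this page does not say whether the package performs the same check on the client side.

```r
# Hypothetical helper (not part of datarobot): checks the documented
# constraints between the three holdout-related arguments.
checkHoldoutArgs <- function(holdoutStartDate = NULL,
                             holdoutDuration = NULL,
                             disableHoldout = NULL) {
  hasStart <- !is.null(holdoutStartDate)
  hasDuration <- !is.null(holdoutDuration)

  # disableHoldout = TRUE excludes both holdout settings.
  if (isTRUE(disableHoldout) && (hasStart || hasDuration)) {
    stop("holdoutStartDate and holdoutDuration must not be set when disableHoldout is TRUE")
  }

  # The start date and the duration must be supplied together or not at all.
  if (hasStart != hasDuration) {
    stop("holdoutStartDate and holdoutDuration must be specified together")
  }

  invisible(TRUE)
}

# Passes: nothing specified, so defaults apply.
checkHoldoutArgs()

# Passes: holdout explicitly disabled with no holdout settings.
checkHoldoutArgs(disableHoldout = TRUE)
```

Calling the helper with only one of the two holdout arguments, or with disableHoldout = TRUE alongside either of them, raises an error.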

Details

Includes only the attributes of DatetimePartitioning that are directly controllable by users, not those determined by the DataRobot application based on the project dataset and the user-controlled settings. This is the specification that should be passed to SetTarget via the partition parameter. To see the full partitioning based on the project dataset, use GenerateDatetimePartition. All durations should be specified with a duration string such as those returned by the ConstructDurationString helper function.
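
As a rough sketch of the workflow described above, the snippet below builds an RFC 3339 start date in base R, creates the durations with ConstructDurationString, and indicates where the result would be handed to SetTarget. The column name "date", the target name "sales", the 30-day durations and the project object are illustrative assumptions. The datarobot calls are guarded so that the snippet also runs where the package is not installed.

```r
# Build an RFC 3339 timestamp in UTC with base R.
holdoutStart <- format(
  as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
  "%Y-%m-%dT%H:%M:%SZ",
  tz = "UTC"
)

# The datarobot calls only run when the package is installed.
if (requireNamespace("datarobot", quietly = TRUE)) {
  partition <- datarobot::CreateDatetimePartitionSpecification(
    datetimePartitionColumn = "date",  # assumed column name
    holdoutStartDate = holdoutStart,
    holdoutDuration = datarobot::ConstructDurationString(days = 30),
    validationDuration = datarobot::ConstructDurationString(days = 30)
  )

  # With a connected session and an existing project (both assumed here),
  # the specification would then be passed via the partition parameter:
  # datarobot::SetTarget(project, target = "sales", partition = partition)
}
```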

Value

An S3 object of class 'partition' including the parameters required by the SetTarget function to generate a datetime partitioning of the modeling dataset.

Examples

CreateDatetimePartitionSpecification("date_col")
# Mark a single feature as known in advance through featureSettings
CreateDatetimePartitionSpecification("date",
                                     featureSettings = list(
                                       list("featureName" = "Product_offers",
                                            "knownInAdvance" = TRUE)))
partition <- CreateDatetimePartitionSpecification("dateColumn",
                                                treatAsExponential = TreatAsExponential$Always,
                                                differencingMethod = DifferencingMethod$Seasonal,
                                                periodicities = list(list("timeSteps" = 10,
                                                                          "timeUnit" = "HOUR"),
                                                                     list("timeSteps" = 600,
                                                                          "timeUnit" = "MINUTE"),
                                                                     list("timeSteps" = 7,
                                                                          "timeUnit" = "DAY")))

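The multiseries and cross series arguments are not covered by the examples above. A possible sketch follows. The column names "date", "store_id" and "product_category" are made up for illustration, and the argument list is assembled as a plain R list first so that it can be inspected even when the datarobot package is not installed.

```r
# Arguments for a multiseries time series project with cross series features.
# All column names here are hypothetical.
specArgs <- list(
  datetimePartitionColumn = "date",
  useTimeSeries = TRUE,
  multiseriesIdColumns = list("store_id"),
  useCrossSeries = TRUE,
  aggregationType = "total",  # documented options: "total" or "average"
  crossSeriesGroupByColumns = "product_category"
)

# Basic sanity checks drawn from the argument descriptions.
stopifnot(specArgs$aggregationType %in% c("total", "average"))
stopifnot(isTRUE(specArgs$useTimeSeries), isTRUE(specArgs$useCrossSeries))

# Build the specification only when the package is available.
if (requireNamespace("datarobot", quietly = TRUE)) {
  partition <- do.call(datarobot::CreateDatetimePartitionSpecification, specArgs)
  print(class(partition))
}
```
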
datarobot documentation built on May 29, 2024, 4:36 a.m.