Nothing
# Copyright(c) Microsoft Corporation.
# Licensed under the MIT license.
#' Create a deployment config for deploying an AKS web service
#' @description
#' Deploy a web service to Azure Kubernetes Service for high-scale
#' prodution deployments. Provides fast response time and autoscaling
#' of the deployed service. Using GPU for inference when deployed as a
#' web service is only supported on AKS.
#'
#' Deploy to AKS if you need one or more of the following capabilities:
#' * Fast response time
#' * Autoscaling of the deployed service
#' * Hardware acceleration options
#' @param autoscale_enabled If `TRUE` enable autoscaling for the web service.
#' Defaults to `TRUE` if `num_replicas = NULL`.
#' @param autoscale_min_replicas An int of the minimum number of containers
#' to use when autoscaling the web service. Defaults to `1`.
#' @param autoscale_max_replicas An int of the maximum number of containers
#' to use when autoscaling the web service. Defaults to `10`.
#' @param autoscale_refresh_seconds An int of how often in seconds the
#' autoscaler should attempt to scale the web service. Defaults to `1`.
#' @param autoscale_target_utilization An int of the target utilization
#' (in percent out of 100) the autoscaler should attempt to maintain for
#' the web service. Defaults to `70`.
#' @param auth_enabled If `TRUE` enable key-based authentication for the
#' web service. Defaults to `TRUE`.
#' @param cpu_cores The number of cpu cores to allocate for
#' the web service. Can be a decimal. Defaults to `0.1`.
#' @param memory_gb The amount of memory (in GB) to allocate for
#' the web service. Can be a decimal. Defaults to `0.5`.
#' @param enable_app_insights If `TRUE` enable AppInsights for the web service.
#' Defaults to `FALSE`.
#' @param scoring_timeout_ms An int of the timeout (in milliseconds) to
#' enforce for scoring calls to the web service. Defaults to `60000`.
#' @param replica_max_concurrent_requests An int of the number of maximum
#' concurrent requests per node to allow for the web service. Defaults to `1`.
#' @param max_request_wait_time An int of the maximum amount of time a request
#' will stay in the queue (in milliseconds) before returning a 503 error.
#' Defaults to `500`.
#' @param num_replicas An int of the number of containers to allocate for the
#' web service. If this parameter is not set then the autoscaler is enabled by
#' default.
#' @param primary_key A string of the primary auth key to use for the web service.
#' @param secondary_key A string of the secondary auth key to use for the web
#' service.
#' @param tags A named list of key-value tags for the web service,
#' e.g. `list("key" = "value")`.
#' @param properties A named list of key-value properties for the web
#' service, e.g. `list("key" = "value")`. These properties cannot
#' be changed after deployment, but new key-value pairs can be added.
#' @param description A string of the description to give the web service
#' @param gpu_cores An int of the number of gpu cores to allocate for the
#' web service. Defaults to `1`.
#' @param period_seconds An int of how often in seconds to perform the
#' liveness probe. Default to `10`. Minimum value is `1`.
#' @param initial_delay_seconds An int of the number of seconds after
#' the container has started before liveness probes are initiated.
#' Defaults to `310`.
#' @param timeout_seconds An int of the number of seconds after which the
#' liveness probe times out. Defaults to `2`. Minimum value is `1`.
#' @param success_threshold An int of the minimum consecutive successes
#' for the liveness probe to be considered successful after having failed.
#' Defaults to `1`. Minimum value is `1`.
#' @param failure_threshold An int of the number of times Kubernetes will try
#' the liveness probe when a Pod starts and the probe fails, before giving up.
#' Defaults to `3`. Minimum value is `1`.
#' @param namespace A string of the Kubernetes namespace in which to deploy the web service:
#' up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first
#' last characters cannot be hyphens.
#' @param token_auth_enabled If `TRUE`, enable token-based authentication for the web service.
#' If enabled, users can access the web service by fetching an access token using their Azure
#' Active Directory credentials. Defaults to `FALSE`. Both `token_auth_enabled` and
#' `auth_enabled` cannot be set to `TRUE`.
#' @return The `AksServiceDeploymentConfiguration` object.
#' @export
#' @details
#' \subsection{AKS compute target}{
#' When deploying to AKS, you deploy to an AKS cluster that is connected to your workspace.
#' There are two ways to connect an AKS cluster to your workspace:
#' * Create the AKS cluster using Azure ML (see `create_aks_compute()`).
#' * Attach an existing AKS cluster to your workspace (see `attach_aks_compute()`).
#'
#' Pass the `AksCompute` object to the `deployment_target` parameter of `deploy_model()`.
#' }
#' \subsection{Token-based authentication}{
#' We strongly recommend that you create your Azure ML workspace in the same region as your
#' AKS cluster. To authenticate with a token, the web service will make a call to the region
#' in which your workspace is created. If your workspace's region is unavailable, then you will
#' not be able to fetch a token for your web service, even if your cluster is in a different region
#' than your workspace. This effectively results in token-based auth being unavailable until your
#' workspace's region is available again. In addition, the greater the distance between your
#' cluster's region and your workspace's region, the longer it will take to fetch a token.
#' }
#' @examples
#' \dontrun{
#' deployment_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)
#' }
#' @seealso
#' `deploy_model()`
#' @md
aks_webservice_deployment_config <- function(
autoscale_enabled = NULL,
autoscale_min_replicas = NULL,
autoscale_max_replicas = NULL,
autoscale_refresh_seconds = NULL,
autoscale_target_utilization = NULL,
auth_enabled = NULL,
cpu_cores = NULL,
memory_gb = NULL,
enable_app_insights = NULL,
scoring_timeout_ms = NULL,
replica_max_concurrent_requests = NULL,
max_request_wait_time = NULL,
num_replicas = NULL,
primary_key = NULL,
secondary_key = NULL,
tags = NULL,
properties = NULL,
description = NULL,
gpu_cores = NULL,
period_seconds = NULL,
initial_delay_seconds = NULL,
timeout_seconds = NULL,
success_threshold = NULL,
failure_threshold = NULL,
namespace = NULL,
token_auth_enabled = NULL) {
config <- azureml$core$webservice$AksWebservice$deploy_configuration(
autoscale_enabled = autoscale_enabled,
autoscale_min_replicas = autoscale_min_replicas,
autoscale_max_replicas = autoscale_max_replicas,
autoscale_refresh_seconds = autoscale_refresh_seconds,
autoscale_target_utilization = autoscale_target_utilization,
auth_enabled = auth_enabled,
cpu_cores = cpu_cores,
memory_gb = memory_gb,
enable_app_insights = enable_app_insights,
scoring_timeout_ms = scoring_timeout_ms,
replica_max_concurrent_requests = replica_max_concurrent_requests,
max_request_wait_time = max_request_wait_time,
num_replicas = num_replicas,
primary_key = primary_key,
secondary_key = secondary_key,
tags = tags,
properties = properties,
description = description,
gpu_cores = gpu_cores,
period_seconds = period_seconds,
initial_delay_seconds = initial_delay_seconds,
timeout_seconds = timeout_seconds,
success_threshold = success_threshold,
failure_threshold = failure_threshold,
namespace = namespace,
token_auth_enabled = token_auth_enabled)
invisible(config)
}
#' Update a deployed AKS web service
#' @description
#' Update an AKS web service with the provided properties. You can update the
#' web service to use a new model, a new entry script, or new dependencies
#' that can be specified in an inference configuration.
#'
#' Values left as `NULL` will remain unchanged in the web service.
#' @param webservice The `AksWebservice` object.
#' @param autoscale_enabled If `TRUE` enable autoscaling for the web service.
#' @param autoscale_min_replicas An int of the minimum number of containers
#' to use when autoscaling the web service.
#' @param autoscale_max_replicas An int of the maximum number of containers
#' to use when autoscaling the web service.
#' @param autoscale_refresh_seconds An int of how often in seconds the autoscaler
#' should attempt to scale the web service.
#' @param autoscale_target_utilization An int of the target utilization
#' (in percent out of 100) the autoscaler should attempt to maintain for the
#' web service.
#' @param auth_enabled If `TRUE` enable key-based authentication for the
#' web service. Defaults to `TRUE`.
#' @param cpu_cores The number of cpu cores to allocate for
#' the web service. Can be a decimal. Defaults to `0.1`.
#' @param memory_gb The amount of memory (in GB) to allocate for
#' the web service. Can be a decimal. Defaults to `0.5`.
#' @param enable_app_insights If `TRUE` enable AppInsights for the web service.
#' Defaults to `FALSE`.
#' @param scoring_timeout_ms An int of the timeout (in milliseconds) to enforce for
#' scoring calls to the web service.
#' @param replica_max_concurrent_requests An int of the number of maximum concurrent
#' requests per node to allow for the web service.
#' @param max_request_wait_time An int of the maximum amount of time a request
#' will stay in the queue (in milliseconds) before returning a 503 error.
#' @param num_replicas An int of the number of containers to allocate for the
#' web service. If this parameter is not set then the autoscaler is enabled by
#' default.
#' @param tags A named list of key-value tags for the web service,
#' e.g. `list("key" = "value")`. Will replace existing tags.
#' @param properties A named list of key-value properties to add for the web
#' service, e.g. `list("key" = "value")`.
#' @param description A string of the description to give the web service.
#' @param models A list of `Model` objects to package into the updated service.
#' @param inference_config An `InferenceConfig` object.
#' @param gpu_cores An int of the number of gpu cores to allocate for the
#' web service.
#' @param period_seconds An int of how often in seconds to perform the
#' liveness probe. Minimum value is `1`.
#' @param initial_delay_seconds An int of the number of seconds after
#' the container has started before liveness probes are initiated.
#' @param timeout_seconds An int of the number of seconds after which the
#' liveness probe times out. Minimum value is `1`.
#' @param success_threshold An int of the minimum consecutive successes
#' for the liveness probe to be considered successful after having failed.
#' Minimum value is `1`.
#' @param failure_threshold An int of the number of times Kubernetes will try
#' the liveness probe when a Pod starts and the probe fails, before giving up.
#' Minimum value is `1`.
#' @param namespace A string of the Kubernetes namespace in which to deploy the
#' web service: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and
#' hyphen ('-') characters. The first last characters cannot be hyphens.
#' @param token_auth_enabled If `TRUE`, enable token-based authentication for
#' the web service. If enabled, users can access the web service by fetching
#' an access token using their Azure Active Directory credentials.
#' Both `token_auth_enabled` and `auth_enabled` cannot be set to `TRUE`.
#' @return None
#' @export
#' @md
update_aks_webservice <- function(webservice, autoscale_enabled = NULL,
autoscale_min_replicas = NULL,
autoscale_max_replicas = NULL,
autoscale_refresh_seconds = NULL,
autoscale_target_utilization = NULL,
auth_enabled = NULL,
cpu_cores = NULL,
memory_gb = NULL,
enable_app_insights = NULL,
scoring_timeout_ms = NULL,
replica_max_concurrent_requests = NULL,
max_request_wait_time = NULL,
num_replicas = NULL,
tags = NULL,
properties = NULL,
description = NULL,
models = NULL,
inference_config = NULL,
gpu_cores = NULL,
period_seconds = NULL,
initial_delay_seconds = NULL,
timeout_seconds = NULL,
success_threshold = NULL,
failure_threshold = NULL,
namespace = NULL,
token_auth_enabled = NULL) {
webservice$update(autoscale_enabled = autoscale_enabled,
autoscale_min_replicas = autoscale_min_replicas,
autoscale_max_replicas = autoscale_max_replicas,
autoscale_refresh_seconds = autoscale_refresh_seconds,
autoscale_target_utilization = autoscale_target_utilization,
auth_enabled = auth_enabled,
cpu_cores = cpu_cores,
memory_gb = memory_gb,
enable_app_insights = enable_app_insights,
scoring_timeout_ms = scoring_timeout_ms,
replica_max_concurrent_requests = replica_max_concurrent_requests,
max_request_wait_time = max_request_wait_time,
num_replicas = num_replicas,
tags = tags,
properties = properties,
description = description,
models = models,
inference_config = inference_config,
gpu_cores = gpu_cores,
period_seconds = period_seconds,
initial_delay_seconds = initial_delay_seconds,
timeout_seconds = timeout_seconds,
success_threshold = success_threshold,
failure_threshold = failure_threshold,
namespace = namespace,
token_auth_enabled = token_auth_enabled)
invisible(NULL)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.