Use Case

In this tutorial an Ubuntu Data Science Virtual Machine (DSVM) is deployed, and sample code to deploy a Windows DSVM is also provided. The virtual machine is created within its own resource group so that all created resources (the VM, networking, disk, etc.) can be deleted easily. Code is also included, but not run, to delete the resource group if it was created within this vignette. Once the resource group is deleted, consumption (cost) will cease.

This script is best run interactively to review its operation and to ensure that the interaction with Azure completes.

An R script can be generated from this vignette and run as a standalone script to set up a new resource group and a single Ubuntu DSVM.

Preparation

We assume the user already has an Azure subscription and has obtained their credentials as explained in the Introduction vignette. We ensure a resource group exists and, within that resource group, deploy the Linux DSVM. A secure shell (ssh) public key matching the current user's private key is used to access the server in this script, although a username and password can be used instead.

Setup

To get started we need to load our Azure credentials as well as the user's ssh public key. Public keys on Linux are typically created on the user's desktop or laptop machine and are found in ~/.ssh/id_rsa.pub. It is convenient to create a credentials file to contain this information. The contents of the credentials file will be something like the following. We assume the user creates such a file in the current working directory, naming it <USER>_credentials.R, where <USER> is replaced with the user's username.

# Credentials come from app creation in Active Directory within Azure.
#
# See the following for details of app creation.
#
# https://github.com/Azure/AzureDSVM/blob/master/vignettes/00Introduction.Rmd

TID <- "72f9....db47"          # Tenant ID
CID <- "9c52....074a"          # Client ID
KEY <- "9Efb....4nwV....ASa8=" # User key

PUBKEY   <- readLines("~/.ssh/id_rsa.pub") # For Linux DSVM
PASSWORD <- "Public%4aR3@kn"               # For Windows DSVM

Notice we include a password (a fake password in this case) for account creation on a Windows DSVM.
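
Hard-coding secrets in a file carries some risk if the file is ever shared. As a minimal sketch of one alternative, the credentials file could read its values from environment variables instead. The helper read_secret() and the environment variable names below are hypothetical and not part of AzureDSVM.

```r
# Hypothetical helper: fetch a secret from an environment variable and
# fail loudly if it is missing or empty.
read_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set.", call. = FALSE)
  }
  value
}

# Possible usage inside the credentials file. The variable names are
# assumptions; the lines are commented out because they only work on a
# machine where the variables have been set.
#
# TID <- read_secret("AZURE_TENANT_ID")
# CID <- read_secret("AZURE_CLIENT_ID")
# KEY <- read_secret("AZURE_AUTH_KEY")
```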

We can simply source the credentials file in R.

# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.

USER <- Sys.info()[['user']]

source(paste0(USER, "_credentials.R"))
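
After sourcing the file it can be useful to confirm that every expected object was actually defined before contacting Azure. The following is a small sketch of such a check; missing_objects() is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: return the names in `required` that are not
# defined in the environment `env`.
missing_objects <- function(required, env = globalenv()) {
  present <- vapply(required, exists, logical(1),
                    envir = env, inherits = FALSE)
  required[!present]
}

# Possible usage after source():
#
# absent <- missing_objects(c("TID", "CID", "KEY", "PUBKEY"))
# if (length(absent) > 0)
#   stop("Missing credentials: ", paste(absent, collapse = ", "))
```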

If the required packages are not yet installed the following will install them. You may need to install them into your own local library rather than the system library if you do not have system-level permissions.

# Install the packages if required.

devtools::install_github("Microsoft/AzureSMR")
devtools::install_github("Azure/AzureDSVM")
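
To avoid reinstalling on every run, the installation can be made conditional. This sketch uses base R's requireNamespace() to find which packages are absent; not_installed() is a hypothetical helper name.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be
# loaded from the current library paths.
not_installed <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Possible usage (commented out because it needs network access):
#
# todo <- not_installed(c("AzureSMR", "AzureDSVM"))
# if ("AzureSMR"  %in% todo) devtools::install_github("Microsoft/AzureSMR")
# if ("AzureDSVM" %in% todo) devtools::install_github("Azure/AzureDSVM")
```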

We can then load the required packages from the libraries.

# Load the required packages.

library(AzureSMR)    # Support for managing Azure resources.
library(AzureDSVM)   # Further support for the Data Scientist.
library(magrittr)    # Pipe operators: %>% and %T>%.
library(dplyr)

# Parameters for this script: the name for the new resource group and
# its location across the Azure cloud. The resource name is used to
# name the resource group that we will create transiently for the
# purposes of this script.

# Create a random name which will be used for the hostname and
# resource group to reduce likelihood of conflict with other users.

runif(4, 1, 26) %>%
  round() %>%
  letters[.] %>%
  paste(collapse="") %T>%
  {sprintf("Base name:\t\t%s", .) %>% cat("\n")} ->
BASE

# Choose a data centre location. The abbreviation is used for the
# resource group name.

"southeastasia"  %T>%
  {sprintf("Data centre location:\t%s", .) %>% cat("\n")} ->
LOC

ABR <- "sea"

# Create a random resource group to reduce likelihood of conflict with
# other users.

BASE %>%
  paste0("my_dsvm_", .,"_rg_", ABR) %T>%
  {sprintf("Resource group:\t\t%s", .) %>% cat("\n")} ->
RG

# Include the random BASE in the hostname to reduce the likelihood of
# conflict.

BASE %>%
  paste0("my", .) %T>%
  {sprintf("Hostname:\t\t%s", .) %>% cat("\n")} ->
HOST

cat("\n")
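
One detail of the name generation above: rounding runif(4, 1, 26) selects "a" and "z" only half as often as the other letters, because each end of the range covers half an interval. That is harmless here, but a sketch of an equivalent that draws letters uniformly with sample() is shown below. The function names random_base() and make_names() are hypothetical.

```r
# Draw n lowercase letters uniformly at random.
random_base <- function(n = 4) {
  paste(sample(letters, n, replace = TRUE), collapse = "")
}

# Build the resource group name and hostname using the same patterns
# as the script above.
make_names <- function(base, abr) {
  list(rg   = paste0("my_dsvm_", base, "_rg_", abr),
       host = paste0("my", base))
}

nm <- make_names(random_base(), "sea")
cat(sprintf("Resource group:\t\t%s\nHostname:\t\t%s\n", nm$rg, nm$host))
```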
# Connect to the Azure subscription and use this as the context for
# our activities.

context <- createAzureContext(tenantID=TID, clientID=CID, authKey=KEY)

# Check if the resource group already exists. Take note this script
# will not remove the resource group if it pre-existed.

rg_pre_exists <- existsRG(context, RG, LOC)

# Report whether it currently exists.

cat("Resource group", RG, "at", LOC,
    ifelse(!existsRG(context, RG, LOC), "does not exist.\n", "exists.\n"), "\n")

Create a Resource Group

Create the resource group within which all resources we create will be grouped.

# Create a new resource group into which we create the VMs and related
# resources. The resource group name is RG. Note that creating a new
# resource group requires the Active Directory application to be
# granted access control at the subscription level.

if (! rg_pre_exists)
{
  azureCreateResourceGroup(context, RG, LOC) %>% cat("\n\n")
}

# Check that it now exists.

cat("Resource group", RG, "at", LOC,
    ifelse(!existsRG(context, RG, LOC), "does not exist.\n", "exists.\n"), "\n")

Deploy a Linux Data Science Virtual Machine

DSVM deployment

Create the actual Linux DSVM using public-key authentication. The name, username, and size can also be configured.

We can check the available VM sizes within the region by using getVMSizes(). Cost varies with size, and pricing details are available on the Azure website. The default VM size for deployment is chosen for enhanced computation performance. See the documentation for deployDSVM() for the actual default.

# List the available VM sizes. May differ with location of the data centre.

getVMSizes(context, LOC) %>%
  set_names(c("Size", "Cores", "DiskGB", "RAM GB", "Disks"))

# The default size.

formals(deployDSVM)$size

# Choose a size to suit. The last assignment below takes effect.

SIZE <- "Standard_D1_v2" # 1 Core, 3.5 GB RAM,  50 GB SSD,  $80
SIZE <- "Standard_D3_v2" # 4 Cores, 14 GB RAM, 200 GB SSD, $318

# The default operating system.

formals(deployDSVM)$os
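
Rather than picking a size by eye, the choice can be derived from minimum requirements. The sketch below works on a small hand-made data frame holding the two sizes mentioned above. Its column names (Size, Cores, RAM_GB) are assumptions for illustration and may not match what getVMSizes() returns, and smallest_fit() is a hypothetical helper.

```r
# Illustrative table built from the two sizes quoted in the comments
# above. Column names are assumptions, not the getVMSizes() output.
sizes <- data.frame(
  Size   = c("Standard_D1_v2", "Standard_D3_v2"),
  Cores  = c(1, 4),
  RAM_GB = c(3.5, 14),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the smallest size meeting both minimums.
smallest_fit <- function(sizes, min_cores, min_ram_gb) {
  fits <- sizes[sizes$Cores >= min_cores & sizes$RAM_GB >= min_ram_gb, ]
  if (nrow(fits) == 0)
    stop("No VM size meets the requirements.", call. = FALSE)
  fits <- fits[order(fits$Cores, fits$RAM_GB), ]
  fits$Size[1]
}

chosen <- smallest_fit(sizes, min_cores = 2, min_ram_gb = 8)
chosen
```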

The following code deploys a Linux DSVM which will take a few minutes.

# Create the required Linux DSVM - generally 4 minutes.

ldsvm <- deployDSVM(context, 
                    resource.group = RG,
                    location       = LOC,
                    hostname       = HOST,
                    username       = USER,
                    size           = SIZE,
                    pubkey         = PUBKEY)
ldsvm

operateDSVM(context, RG, HOST, operation="Check")

azureListVM(context, RG)

Prove that the deployed DSVM exists.

# Send a simple system() command across to the new server to test its
# existence. Expect a single line with an indication of how long the
# server has been up and running.

# NOTE: wait a short while before connecting. Even after deployment is
# reported as complete there is a small delay before the server is
# actually available.

Sys.sleep(20)

ssh <- paste("ssh -q",
             "-o StrictHostKeyChecking=no",
             "-o UserKnownHostsFile=/dev/null",
             ldsvm)

cmd <- paste(ssh, "uptime")
cmd

system(cmd, intern=TRUE)
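
A fixed pause of 20 seconds may be too short or needlessly long. As a sketch of an alternative, a small polling loop can retry until the server responds. wait_until() is a hypothetical helper, and the usage assumes the ssh string built above.

```r
# Hypothetical helper: call attempt() up to `tries` times, pausing
# `pause` seconds between attempts, until it returns TRUE.
wait_until <- function(attempt, tries = 10, pause = 5) {
  for (i in seq_len(tries)) {
    if (isTRUE(attempt())) return(TRUE)
    if (i < tries) Sys.sleep(pause)
  }
  FALSE
}

# Possible usage in place of Sys.sleep(20). system() returns the exit
# status of the command, which is 0 when ssh succeeds.
#
# ready <- wait_until(function() system(paste(ssh, "true")) == 0)
# if (!ready) stop("Server did not become reachable.")
```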

Some Standard Setup --- Optional

We can install some useful tools on a fresh server. Note that the Ubuntu server will still be running some background scripts as part of its own setup, so if the following commands report lock errors (could not get lock) simply try again in a short while. We also update the operating system here. However, the msodbcsql package asks about licensing in a way that interacts badly with a non-interactive console, so the distupgrade step may need to be run manually from a terminal after logging on to the server through the secure shell. We then reboot the server so that changes such as kernel updates take effect.

system(paste(ssh, "sudo locale-gen 'en_AU.UTF-8'"))
system(paste(ssh, "sudo apt-get -y install wajig"))
system(paste(ssh, "wajig install -y lsb htop"))
system(paste(ssh, "lsb_release -idrc"))
system(paste(ssh, "wajig update"))
system(paste(ssh, "wajig distupgrade -y"))
system(paste(ssh, "sudo reboot"))
Sys.sleep(20)
system(paste(ssh, "uptime"))
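
Because any of these commands can fail (for example on a lock error), it can help to run them in sequence and stop at the first failure. The following is a sketch only. run_remote() is a hypothetical helper, and its runner argument exists so that the logic can be tried without a live server.

```r
# Hypothetical helper: run each command over ssh in order and stop at
# the first non-zero exit status. Returns the statuses by command.
run_remote <- function(ssh, commands, runner = system) {
  statuses <- integer(0)
  for (cmd in commands) {
    status <- runner(paste(ssh, cmd))
    statuses[cmd] <- status
    if (status != 0) {
      warning("Remote command failed: ", cmd, call. = FALSE)
      break
    }
  }
  statuses
}

# Possible usage with the ssh string built earlier:
#
# run_remote(ssh, c("sudo apt-get -y install wajig",
#                   "wajig install -y lsb htop",
#                   "lsb_release -idrc"))
```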

An alternative for this post-deployment system configuration is the addExtensionDSVM function, which is detailed in the vignette 11Exend.md.

Configuration for Microsoft R Server

Since version 9, Microsoft R Server (MRS) has provided the mrsdeploy package for convenient interaction with an R session on a remote instance where MRS is installed and properly configured.

To enable such interaction, a one-box configuration is needed. One-box configuration on a Linux DSVM with key-based authentication can be achieved via the mrsOneBoxConfiguration function.

mrsOneBoxConfiguration(context,
                       resource.group=RG,
                       location=LOC,
                       hostname=HOST, 
                       username=USER, 
                       password=PASSWORD)

NOTE the password here is the one used for creating a remote session with mrsdeploy. The default user name for mrsdeploy is "admin". More details about how to use mrsdeploy for remote interaction can be found in the mrsdeploy documentation.

Deploy a Windows Data Science Virtual Machine - Optional

deployDSVM() also supports deployment of a Windows DSVM, which is achieved by setting the os argument to "Windows". The deployment will take approximately 10 minutes. Remote Desktop can be used to verify that the deployment succeeded and to work on the virtual machine in a remote desktop environment.

wdsvm <- deployDSVM(context,
                    resource.group=RG,
                    location=LOC,
                    hostname="xxxx",
                    username=USER,
                    os="Windows",
                    authen="Password",
                    password=PASSWORD)

wdsvm
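
A Windows deployment can fail late if the hostname or password is not acceptable to Azure, so a quick local check before calling deployDSVM() may save time. The sketch below is illustrative only. The limits it uses (hostname of at most 15 characters, password of 12 to 123 characters with at least three of four character classes) are assumptions based on commonly documented Azure rules and should be checked against the current Azure documentation. Both function names are hypothetical.

```r
# Assumed rule: length within bounds and at least three of lower case,
# upper case, digit, and special character.
valid_windows_password <- function(password, min_len = 12, max_len = 123) {
  classes <- c(grepl("[a-z]", password),
               grepl("[A-Z]", password),
               grepl("[0-9]", password),
               grepl("[^A-Za-z0-9]", password))
  n <- nchar(password)
  n >= min_len && n <= max_len && sum(classes) >= 3
}

# Assumed rule: 1 to 15 letters, digits, or hyphens, not all digits.
valid_windows_hostname <- function(hostname) {
  grepl("^[A-Za-z0-9-]{1,15}$", hostname) && !grepl("^[0-9]+$", hostname)
}

valid_windows_password("Public%4aR3@kn")  # the fake password used above
valid_windows_hostname("myabcd")
```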

Optional Stop

It is good practice to stop DSVMs after using them to avoid unnecessary cost.

operateDSVM(context, RG, HOST, operation="Stop")

Optional Cleanup

Once we have finished with the server we can delete it and all of its related resources.

# Delete the resource group now that we have proved existence. There
# is probably no need to wait. Only delete if it did not pre-exist
# this script. Deletion takes 10 minutes or more.

if (! rg_pre_exists)
  azureDeleteResourceGroup(context, RG)

Once the resource group is deleted, no further consumption (cost) is incurred.



Azure/AzureDSVM documentation built on May 20, 2019, 2:03 p.m.