Use this package to manage Azure resources from within an R session. The package does not expose the complete Azure API; rather, it provides a collection of functions that a typical data scientist might use to access and manage Azure resources.

Installation instructions

Install the development version of the package directly from GitHub with:

# Install devtools
if(!require("devtools")) install.packages("devtools")
devtools::install_github("Microsoft/AzureSMR")
library(AzureSMR)

Overview

AzureSMR provides an interface to manage resources on Microsoft Azure. The main functions address the following Azure services: resource groups and resources, storage accounts and blobs, virtual machines, HDInsight clusters (including Hive and Spark jobs), Azure Data Lake Store, and Azure Resource Manager (ARM) templates.

For a detailed list of AzureSMR functions and their syntax please refer to the Help pages.
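
You can also browse these from an R session:

# Open the package help index
help(package = "AzureSMR")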

Configuring authorisation in Azure Active Directory

To get started, please refer to the authorisation tutorial.

Load the package

library(AzureSMR)

Authenticating against the Azure service

The Azure APIs require many parameters to be managed. Rather than supplying all the arguments to every function call, AzureSMR uses an azureActiveContext object that caches arguments, so you don't have to supply them with every call.

To create an azureActiveContext object and attempt to authenticate against the Azure service, use:

sc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authKey = "{KEY}")
sc

Alternatively, use the "DeviceCode" flow, if the resource supports it:

sc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authType = "DeviceCode")
# Manually authenticate using DeviceCode flow
rgs <- azureListRG(sc)
rgs

If you provide authentication parameters to createAzureContext(), the function authenticates automatically. To get an authorisation token manually, use azureAuthenticate(). Note that the token times out after a period, so you may need to run it again occasionally. TIP: Run azureAuthenticate() before a long-running task.
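
For example, to refresh the token before kicking off a long-running task:

# Re-authenticate using the details cached in the azureActiveContext
azureAuthenticate(sc)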

Subscriptions

The azureListSubscriptions() function lists all your available subscriptions. If you have only one, it sets the default subscription ID in the azureActiveContext to that subscription.

azureListSubscriptions(sc)
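
If you have access to several subscriptions, you can point the context at a specific one. A sketch, assuming your version of the package exposes setAzureContext() with a subscriptionID argument ("{SUBID}" is a placeholder):

# Select a specific subscription for subsequent calls
setAzureContext(sc, subscriptionID = "{SUBID}")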

Manage resource groups

# list resource groups
azureListRG(sc)

# list all resources
azureListAllResources(sc)

azureListAllResources(sc, location = "northeurope")

azureListAllResources(sc, type = "Microsoft.Sql/servers", location = "northeurope")

azureCreateResourceGroup(sc, resourceGroup = "testme", location = "northeurope")

azureCreateStorageAccount(sc, storageAccount = "testmystorage1", resourceGroup = "testme")

azureListAllResources(sc, resourceGroup = "testme")

# When finished, to delete a Resource Group use azureDeleteResourceGroup()
azureDeleteResourceGroup(sc, resourceGroup = "testme")

Manage Virtual Machines

Use these functions to list, start and stop existing virtual machines. To create VMs, refer to Resource templates below.

## List VMs in a ResourceGroup
azureListVM(sc, resourceGroup = "testme")

##            Name    Location                             Type    OS     State  Admin
## 1         DSVM1 northeurope Microsoft.Compute/virtualMachines Linux Succeeded

azureStartVM(sc, vmName = "DSVM1")
azureStopVM(sc, vmName = "DSVM1")
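
To check the result, you can query the machine's state. A sketch, assuming azureVMStatus() is available in your version of the package:

# Report the current provisioning/power state of the VM
azureVMStatus(sc, resourceGroup = "testme", vmName = "DSVM1")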

Accessing storage blobs using the azureActiveContext

To access storage blobs you need to have a key. You can use azureSAGetKey() to automatically retrieve your key.

azureSAGetKey(sc, resourceGroup = "testme", storageAccount = "testmystorage1")

To create containers in a storage account use azureCreateStorageContainer()

azureCreateStorageContainer(sc, "opendata", storageAccount = "testmystorage1", resourceGroup = "testme")

To list containers in a storage account use azureListStorageContainers()

azureListStorageContainers(sc, storageAccount = "testmystorage1", resourceGroup = "testme")

To write a blob use azurePutBlob()

azurePutBlob(sc, storageAccount = "testmystorage1", container = "opendata", 
             contents = "Hello World",
             blob = "HELLO") 

To list blobs in a container use azureListStorageBlobs()

azureListStorageBlobs(sc, storageAccount = "testmystorage1", container = "opendata")

To read a blob in a container use azureGetBlob()

azureGetBlob(sc, storageAccount = "testmystorage1", container = "opendata",
             blob = "HELLO",
             type = "text")

Accessing storage blobs without an azureActiveContext

It is also possible to access the blob functions without an Azure Active Directory application.

In this case, pass azureActiveContext = NULL to the storage functions.

For example:

azureListStorageBlobs(NULL, storageAccount = "testmystorage1", container = "opendata")
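
The same pattern applies to the other storage functions, for example reading a blob (this sketch assumes the container is publicly readable, or that you also supply the account key via the storageKey argument):

azureGetBlob(NULL, storageAccount = "testmystorage1", container = "opendata",
             blob = "HELLO", type = "text")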

Manage HDInsight clusters

You can use AzureSMR to manage HDInsight clusters. To create a cluster use azureCreateHDI().

For advanced configurations use Resource Templates (See below).

azureCreateHDI(sc,
                 resourceGroup = "testme",
                 clustername = "smrhdi", # only low case letters, digit, and dash.
                 storageAccount = "testmystorage1",
                 adminUser = "hdiadmin",
                 adminPassword = "AzureSMR_password123",
                 sshUser = "hdisshuser",
                 sshPassword = "AzureSMR_password123", 
                 kind = "rserver")

Use azureListHDI() to list available clusters.

azureListHDI(sc, resourceGroup = "testme")

Use azureResizeHDI() to resize a cluster.

azureResizeHDI(sc, resourceGroup = "testme", clustername = "smrhdi", role = "workernode", size = 3)

## azureResizeHDI: Request Submitted:  2016-06-23 18:50:57
## Resizing(R), Succeeded(S)
## RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
## RRRRRRRRRRRRRRRRRRS
## Finished Resizing Sucessfully:  2016-06-23 19:04:43
## Finished:  2016-06-23 19:04:43
## Information
## " headnode ( 2 * Standard_D3_v2 ) workernode ( 5 * Standard_D3_v2 ) zookeepernode ( 3 * Medium ) edgenode0 ( 1 * Standard_D4_v2 )"

Resource templates - create Azure resources

The easiest way to create resources on Azure is with Azure Resource Manager (ARM) templates. Creating Azure resources such as HDInsight clusters can involve a large number of parameters. You can build a resource template by creating a resource in the Azure Portal and then going to Settings > Automation scripts. You can find many example templates at https://github.com/Azure/AzureStack-QuickStart-Templates.

To create a resource from a template in AzureSMR, use azureDeployTemplate(). The template and parameters must be available at a public URL (for example, in Azure blob storage), or you can supply them as JSON strings.

azureDeployTemplate(sc, resourceGroup = "Analytics", deplName = "Deploy1", 
                    templateURL = "{TEMPLATEURL}", paramURL = "{PARAMURL}")

## azureDeployTemplate: Request Submitted:  2016-06-23 18:50:57
## Resizing(R), Succeeded(S)
## RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
## RRRRRRRRRRRRRRRRRRS
## Finished Deployed Sucessfully:  2016-06-23 19:04:43
## Finished:  2016-06-23 19:04:43

ADMIN TIP: If a deployment fails, open the Azure Portal and check the Activity log for the failed deployment; the error details there usually explain why it failed.

Hive Functions

You can use these functions to run and manage hive jobs on an HDInsight Cluster.

azureHiveStatus(sc, clusterName = "smrhdi", 
                hdiAdmin = "hdiadmin", 
                hdiPassword = "AzureSMR_password123")

azureHiveSQL(sc, 
             CMD = "select * from hivesampletable", 
             path = "wasb://opendata@testmystorage1.blob.core.windows.net/")

Spark functions (experimental)

AzureSMR provides some functions that allow HDInsight Spark sessions and jobs to be managed from within an R session.

To create a new Spark session (via Livy) use azureSparkNewSession()

azureSparkNewSession(sc, clustername = "smrhdi", 
                     hdiAdmin = "hdiadmin", 
                     hdiPassword = "AzureSMR_password123",
                     kind = "pyspark")

To view the status of sessions use azureSparkListSessions(). Wait for the session status to be idle before submitting commands.

azureSparkListSessions(sc, clustername = "smrhdi")

To send a command to the Spark session use azureSparkCMD(). In this example it submits a Python routine; make sure to preserve the indentation in the Python code. Note that sc inside the script refers to the remote SparkContext, not the R azureActiveContext.

# SAMPLE PYSPARK SCRIPT TO CALCULATE PI
pythonCmd <- '
from pyspark import SparkContext
from operator import add
import sys
from random import random
partitions = 1
n = 20000000 * partitions
def f(_):
  x = random() * 2 - 1
  y = random() * 2 - 1
  return 1 if x ** 2 + y ** 2 < 1 else 0

count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
Pi = (4.0 * count / n)
print("Pi is roughly %f" % Pi)'                   

azureSparkCMD(sc, CMD = pythonCmd, sessionID = "0")

## [1] "Pi is roughly 3.140285"

Check that session variables are retained:

azureSparkCMD(sc, clustername = "smrhdi", CMD = "print(Pi)", sessionID = "0")

## [1] "3.1422"

You can also run SparkR sessions:

azureSparkNewSession(sc, clustername = "smrhdi", 
                     hdiAdmin = "hdiadmin", 
                     hdiPassword = "AzureSMR_password123",
                     kind = "sparkr")
azureSparkCMD(sc, clustername = "smrhdi", CMD = "HW<-'hello R'", sessionID = "2")
azureSparkCMD(sc, clustername = "smrhdi", CMD = "cat(HW)", sessionID = "2")
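
Livy sessions hold cluster resources, so close them when you are finished. A sketch, assuming azureSparkStopSession() is available in your version of the package:

# Stop the pyspark and sparkr sessions created above
azureSparkStopSession(sc, clustername = "smrhdi", sessionID = "0")
azureSparkStopSession(sc, clustername = "smrhdi", sessionID = "2")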

Accessing Azure Data Lake Store using the azureActiveContext

To access Azure Data Lake Store, you need an access token generated by createAzureContext(), using either the "ClientCredential" (default) or "DeviceCode" authType.

asc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authKey = "{KEY}")

To create directories in Azure Data Lake Store use azureDataLakeMkdirs()

azureDataLakeMkdirs(asc, azureDataLakeAccount, "tempfolder")

To list items in Azure Data Lake Store use azureDataLakeListStatus()

azureDataLakeListStatus(asc, azureDataLakeAccount, "")
azureDataLakeListStatus(asc, azureDataLakeAccount, "tempfolder")

To create a file and optionally write data to the new file in Azure Data Lake Store use azureDataLakeCreate()

azureDataLakeCreate(asc, azureDataLakeAccount, "tempfolder/tempfile00.txt",
                    "755", FALSE,             # permission, overwrite
                    4194304L, 3L, 268435456L, # buffer size, replication, block size
                    charToRaw("abcd"))        # contents

To append to a file in Azure Data Lake Store use azureDataLakeAppend()

azureDataLakeAppend(asc, azureDataLakeAccount, "tempfolder/tempfile00.txt",
                    4194304L,          # buffer size
                    charToRaw("stuv")) # contents

To read a file in Azure Data Lake Store use azureDataLakeRead()

azureDataLakeRead(asc, azureDataLakeAccount, "tempfolder/tempfile00.txt", 
                  length = 2L, bufferSize = 4194304L)
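
Omitting length should return the file from the beginning; after the append above, the contents correspond to "abcdstuv". A sketch, assuming the default arguments read the whole file:

azureDataLakeRead(asc, azureDataLakeAccount, "tempfolder/tempfile00.txt")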

To delete item(s) in Azure Data Lake Store use azureDataLakeDelete()

azureDataLakeDelete(asc, azureDataLakeAccount, "tempfolder", TRUE)

