Use this package to manage Azure resources from within an R session. It is not a full SDK, just a collection of functions that should prove useful for a data scientist who needs to access and manage Azure resources.

Installation instructions

Install the development version of the package directly from GitHub with:

# Install devtools
if(!require("devtools")) install.packages("devtools")
devtools::install_github("Microsoft/AzureSMR")
library(AzureSMR)

Overview

AzureSMR provides an interface to manage resources on Microsoft Azure. The main functions address the following Azure services: resource groups, virtual machines, storage blobs, HDInsight clusters, and Hive and Spark jobs.

For a detailed list of AzureSMR functions and their syntax please refer to the Help pages.
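
The help index can be browsed from the console with standard base R commands, for example:

# Open the index of AzureSMR help topics
help(package = "AzureSMR")

# Help page for an individual function
?azureListAllResources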

Getting Authorisation configured

To get started, please refer to the Authorisation tutorial: https://github.com/Microsoft/AzureSMR/blob/master/vignettes/Authentication.Rmd

Authenticating against the service

The Azure APIs require many parameters to be managed. Rather than supplying all the parameters for each function call, AzureSMR implements an AzureContext variable that caches each parameter the last time it is used, so it does not need to be supplied repeatedly.

To create an AzureContext object and attempt to authenticate against the Azure service, use:

sc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authKey = "{KEY}")
sc

To get an authorisation token use azureAuthenticate(). Note that this token will time out after a period, so you need to run it again occasionally. TIP: Use azureAuthenticate() before a long-running task.
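
For example, a minimal call (the tenant, client and key supplied to createAzureContext() are cached in the context, so no further arguments should be needed):

# Refresh the authorisation token before a long-running task
azureAuthenticate(sc)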

The azureListSubscriptions() function lists all the available subscriptions. If you have only one subscription, it sets the default subscription in the azureActiveContext to that subscription ID.

azureListSubscriptions(sc)
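
If you have more than one subscription, you can point the context at a specific one. A minimal sketch, assuming setAzureContext() accepts a subscriptionID argument (check its help page); {SUBID} is a placeholder:

# Select a specific subscription for subsequent calls
setAzureContext(sc, subscriptionID = "{SUBID}")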

Manage Resource Groups

# list resource groups
azureListRG(sc)

# list all resources
azureListAllResources(sc)

azureListAllResources(sc, location = "northeurope")

azureListAllResources(sc, type = "Microsoft.Sql/servers", location = "northeurope")

azureListAllResources(sc, resourceGroup = "Analytics")

azureCreateResourceGroup(sc, resourceGroup = "testme", location = "northeurope")

azureDeleteResourceGroup(sc, resourceGroup = "testme")

azureListRG(sc)$name

Manage Virtual Machines

Use these functions to list, start, and stop virtual machines.

To create VMs, please refer to Resource Templates below.

azureListVM(sc, resourceGroup = "AWHDIRG")

##            Name    Location                             Type    OS     State  Admin
## 1         DSVM1 northeurope Microsoft.Compute/virtualMachines Linux Succeeded alanwe

azureStartVM(sc, vmName = "DSVM1")
azureStopVM(sc, vmName = "DSVM1")
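
As an illustration (not from the package documentation), the two functions can be combined to stop every VM in a resource group; this assumes the data frame returned by azureListVM() has the Name column shown above:

# Stop all VMs in the resource group
vms <- azureListVM(sc, resourceGroup = "AWHDIRG")
for (vm in vms$Name) {
  azureStopVM(sc, resourceGroup = "AWHDIRG", vmName = vm)
}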

Access Storage Blobs

In order to access storage blobs you need a key. Use azureSAGetKey() to get a key, or alternatively supply your own. When you provide your own key you no longer need to use azureAuthenticate(), since the API uses a different authentication approach.

sKey <- azureSAGetKey(sc, resourceGroup = "Analytics", storageAccount = "analyticsfiles")

To list containers in a storage account use azureListContainers()

azureListContainers(sc, storageAccount = "analyticsfiles", containers = "Test")

To list blobs in a container use azureListStorageBlobs()

azureListStorageBlobs(sc, storageAccount = "analyticsfiles", container = "test")

To write a blob use azurePutBlob()

azurePutBlob(sc, storageAccount = "analyticsfiles", container = "test", 
             contents = "Hello World",
             blob = "HELLO") 

To read a blob in a container use azureGetBlob()

azureGetBlob(sc, storageAccount = "analyticsfiles", container = "test",
             blob="HELLO",
             type="text") 

Manage HDInsight Clusters

You can use AzureSMR to manage Azure HDInsight clusters. To create clusters, use Resource Templates (see below).

Also see functions for submitting Hive and Spark jobs.

Use azureListHDI() to list available clusters.

azureListHDI(sc)
azureListHDI(sc, resourceGroup = "Analytics")

Use azureResizeHDI() to resize a cluster

azureResizeHDI(sc, resourceGroup = "Analytics", clusterName = "{HDIClusterName}", 
               Role = "workernode", Size = 2)

## AzureResizeHDI: Request Submitted:  2016-06-23 18:50:57
## Resizing(R), Succeeded(S)
## RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
## RRRRRRRRRRRRRRRRRRS
## Finished Resizing Sucessfully:  2016-06-23 19:04:43
## Finished:  2016-06-23 19:04:43
## Information 
## " headnode ( 2 * Standard_D3_v2 ) workernode ( 5 * Standard_D3_v2 ) zookeepernode ( 3 * Medium ) edgenode0 ( 1 * Standard_D4_v2 )" 

Resource Templates - Create Azure Resources

The easiest way to create resources on Azure is to use Azure templates. To create Azure resources such as HDInsight clusters there can be a large number of parameters. Resource templates can be built by creating a resource in the Azure Portal and then going into Settings > Automation scripts. Example templates can be found at this URL: https://github.com/Azure/AzureStack-QuickStart-Templates.

To create a resource using a template in AzureSMR, use azureDeployTemplate(). The template and parameters must be available at a public URL (e.g. an Azure blob). It may be worth getting the Azure administrator to build a working template.

azureDeployTemplate(sc, resourceGroup = "Analytics", deplName = "Deploy1", 
                    templateURL = "{TEMPLATEURL}", paramURL = "{PARAMURL}")

## AzureDeployTemplate: Request Submitted:  2016-06-23 18:50:57
## Resizing(R), Succeeded(S)
## RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
## RRRRRRRRRRRRRRRRRRS
## Finished Deployed Sucessfully:  2016-06-23 19:04:43
## Finished:  2016-06-23 19:04:43

ADMIN TIP: If a deployment fails, go to the Azure Portal and look at the Activity logs for failed deployments, which should explain why the deployment failed.

Hive Functions

These functions facilitate the use of Hive jobs on an HDInsight cluster.

azureHiveStatus(sc, clusterName = "{hdicluster}", 
                hdiAdmin = "admin", 
                hdiPassword = "********")
azureHiveSQL(sc, 
             CMD = "select * from airports", 
             Path = "wasb://{container}@{hdicluster}.blob.core.windows.net/")

stdout <- azureGetBlob(sc, container = "test", blob = "stdout")

read.delim(text = stdout, header = TRUE, fill = TRUE)

Spark functions (experimental)

AzureSMR provides some functions that allow HDInsight Spark sessions and jobs to be managed within an R session.

To create a new Spark session (via Livy) use azureSparkNewSession()

azureSparkNewSession(sc, clusterName = "{hdicluster}", 
                     hdiAdmin = "admin", 
                     hdiPassword = "********",
                     kind = "pyspark")

To view the status of sessions use azureSparkListSessions()

azureSparkListSessions(sc, clusterName = "{hdicluster}")

To send a command to the Spark session use azureSparkCMD(). In this case it submits a Python routine.

# SAMPLE PYSPARK SCRIPT TO CALCULATE PI
pythonCmd <- '
from pyspark import SparkContext
from operator import add
import sys
from random import random
partitions = 1
n = 20000000 * partitions
def f(_):
  x = random() * 2 - 1
  y = random() * 2 - 1
  return 1 if x ** 2 + y ** 2 < 1 else 0

count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
Pi = (4.0 * count / n)
print("Pi is roughly %f" % Pi)'                   

azureSparkCMD(sc, cmd = pythonCmd, sessionID = "5")

## [1] "Pi is roughly 3.140285"

Check that session variables are retained:

azureSparkCMD(sc, clusterName = "{hdicluster}", cmd = "print Pi", sessionID="5")

## [1] "3.1422"

