Use this package to manage Azure resources from within an R session. This is not a full SDK, just a collection of functions that should prove useful for a data scientist who needs to access and manage Azure resources.
Install the development version of the package directly from GitHub with:
```r
# Install devtools
if (!require("devtools")) install.packages("devtools")
devtools::install_github("Microsoft/AzureSMR")
library(AzureSMR)
```
AzureSMR provides an interface to manage resources on Microsoft Azure. The main functions address the following Azure Services:
For a detailed list of AzureSMR functions and their syntax please refer to the Help pages.
To get started, please refer to the Authorisation tutorial. https://github.com/Microsoft/AzureSMR/blob/master/vignettes/Authentication.Rmd
The Azure APIs require many parameters to be managed. Rather than supplying all the parameters for each function call, AzureSMR implements an AzureContext variable which caches each parameter the last time it is used, so that it doesn't need to be repeatedly supplied.
To create an AzureContext object and attempt to authenticate against the Azure service, use:
```r
sc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authKey = "{KEY}")
sc
```
To get an authorisation token use azureAuthenticate(). Note that this token will time out after a period, so you need to run it again occasionally. TIP: use azureAuthenticate() before a long-running task.
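A minimal sketch of refreshing the token, assuming `sc` is the AzureContext created with createAzureContext() above:

```r
# Request (or refresh) the authorisation token using the credentials
# already cached in the AzureContext
azureAuthenticate(sc)
```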
The azureListSubscriptions() function lists all the available subscriptions. If you only have one, it sets the default subscription in the azureActiveContext to that subscription ID.
```r
azureListSubscriptions(sc)
```
```r
# list resource groups
azureListRG(sc)

# list all resources
azureListAllResources(sc)
azureListAllResources(sc, location = "northeurope")
azureListAllResources(sc, type = "Microsoft.Sql/servers", location = "northeurope")
azureListAllResources(sc, resourceGroup = "Analytics")

azureCreateResourceGroup(sc, resourceGroup = "testme", location = "northeurope")
azureDeleteResourceGroup(sc, resourceGroup = "testme")
azureListRG(sc)$name
```
Use these functions to list, start and stop virtual machines. To create VMs, please refer to Resource Templates below.
```r
azureListVM(sc, resourceGroup = "AWHDIRG")

##    Name    Location                              Type    OS     State  Admin
## 1 DSVM1 northeurope Microsoft.Compute/virtualMachines Linux Succeeded alanwe

azureStartVM(sc, vmName = "DSVM1")
azureStopVM(sc, vmName = "DSVM1")
```
In order to access storage blobs you need to have a key. Use azureSAGetKey() to get a key, or alternatively supply your own. When you provide your own key you no longer need to use azureAuthenticate(), since the storage API uses a different authentication approach.
```r
sKey <- azureSAGetKey(sc, resourceGroup = "Analytics", storageAccount = "analyticsfiles")
```
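If you already have a storage key, it can be supplied directly instead. A hedged sketch, assuming the blob functions accept a `storageKey` argument (check the Help pages for the exact signature in your package version):

```r
# Supply a known storage key directly, so no azureAuthenticate() is needed
azureListStorageBlobs(sc, storageAccount = "analyticsfiles",
                      container = "test", storageKey = "{KEY}")
```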
To list containers in a storage account use azureListContainers()
```r
azureListContainers(sc, storageAccount = "analyticsfiles", containers = "Test")
```
To list blobs in a container use azureListStorageBlobs()
```r
azureListStorageBlobs(sc, storageAccount = "analyticsfiles", container = "test")
```
To write a blob use azurePutBlob()
```r
azurePutBlob(sc, StorageAccount = "analyticsfiles", container = "test",
             contents = "Hello World", blob = "HELLO")
```
To read a blob in a container use azureGetBlob()
```r
azureGetBlob(sc, storageAccount = "analyticsfiles", container = "test",
             blob = "HELLO", type = "text")
```
You can use AzureSMR to manage Azure HDInsight clusters. To create clusters, use Resource Templates (see below). Also see the functions for submitting Hive and Spark jobs.
Use azureListHDI() to list available clusters.
```r
azureListHDI(sc)
azureListHDI(sc, resourceGroup = "Analytics")
```
Use azureResizeHDI() to resize a cluster.
```r
azureResizeHDI(sc, resourceGroup = "Analytics", clusterName = "{HDIClusterName}",
               Role = "workernode", Size = 2)

## AzureResizeHDI: Request Submitted: 2016-06-23 18:50:57
## Resizing(R), Succeeded(S)
## RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
## RRRRRRRRRRRRRRRRRRS
## Finished Resizing Sucessfully: 2016-06-23 19:04:43
## Finished: 2016-06-23 19:04:43
##
## Information
## " headnode ( 2 * Standard_D3_v2 ) workernode ( 5 * Standard_D3_v2 ) zookeepernode ( 3 * Medium ) edgenode0 ( 1 * Standard_D4_v2 )"
```
The easiest way to create resources on Azure is to use Azure Templates. Creating Azure resources such as HDInsight clusters can involve a large number of parameters. Resource templates can be built by creating a resource in the Azure Portal and then going into Settings > Automation scripts. Example templates can be found at this URL: https://github.com/Azure/AzureStack-QuickStart-Templates.
To create a resource using a template in AzureSMR use azureDeployTemplate(). The template and parameters must be available at a public URL (Azure Blob). It may be worth getting the Azure administrator to build a working template.
```r
azureDeployTemplate(sc, resourceGroup = "Analytics", deplName = "Deploy1",
                    templateURL = "{TEMPLATEURL}", paramURL = "{PARAMURL}")

## AzureDeployTemplate: Request Submitted: 2016-06-23 18:50:57
## Resizing(R), Succeeded(S)
## RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
## RRRRRRRRRRRRRRRRRRS
## Finished Deployed Sucessfully: 2016-06-23 19:04:43
## Finished: 2016-06-23 19:04:43
```
ADMIN TIP: If a deployment fails, go to the Azure Portal, look at the Activity logs, and find the failed deployments, which should explain why the deployment failed.
These functions facilitate the use of Hive jobs on an HDInsight cluster.
```r
azureHiveStatus(sc, clusterName = "{hdicluster}", hdiAdmin = "admin",
                hdiPassword = "********")
azureHiveSQL(sc, CMD = "select * from airports",
             Path = "wasb://{container}@{hdicluster}.blob.core.windows.net/")
stdout <- azureGetBlob(sc, Container = "test", Blob = "stdout")
read.delim(text = stdout, header = TRUE, fill = TRUE)
```
AzureSMR provides some functions that allow HDInsight Spark sessions and jobs to be managed within an R session.
To create a new Spark session (via Livy) use azureSparkNewSession()
```r
azureSparkNewSession(sc, clusterName = "{hdicluster}", hdiAdmin = "admin",
                     hdiPassword = "********", kind = "pyspark")
```
To view the status of sessions use azureSparkListSessions()
```r
azureSparkListSessions(sc, clusterName = "{hdicluster}")
```
To send a command to the Spark session use azureSparkCMD(). In this case it submits a Python routine.
```r
# SAMPLE PYSPARK SCRIPT TO CALCULATE PI
pythonCmd <- '
from pyspark import SparkContext
from operator import add
import sys
from random import random

partitions = 1
n = 20000000 * partitions

def f(_):
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0

count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
Pi = (4.0 * count / n)
print("Pi is roughly %f" % Pi)'

azureSparkCMD(sc, cmd = pythonCmd, sessionID = "5")

## [1] "Pi is roughly 3.140285"
```
Check that session variables are retained:
```r
azureSparkCMD(sc, clusterName = "{hdicluster}", cmd = "print Pi", sessionID = "5")

## [1] "3.1422"
```