Use this package to manage Azure Resources from within an R Session. This package does not expose the complete Azure API, but is meant as a collection of functions that a typical data scientist may use to access and manage Azure Resources.

Installation instructions

Install the development version of the package directly from GitHub with:

# Install devtools
if(!require("devtools")) install.packages("devtools")
devtools::install_github("Microsoft/AzureSMR")


AzureSMR provides an interface to manage resources on Microsoft Azure. The main functions address the Azure services covered in the sections below, including resource groups, storage accounts and blobs, virtual machines, HDInsight (with Hive and Spark), Resource Manager templates and Azure Data Lake Store.

For a detailed list of AzureSMR functions and their syntax please refer to the Help pages.
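Once the package is installed, the full function reference can be browsed from within R itself:

```r
# Open the package help index
help(package = "AzureSMR")

# Or jump straight to the help page for a specific function
?createAzureContext
```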

Configuring authorisation in Azure Active Directory

To get started, please refer to the authorisation tutorial

Load the package

library(AzureSMR)

Authenticating against the Azure service

The Azure APIs require many parameters to be managed. Rather than requiring all the arguments in every function call, AzureSMR uses an azureActiveContext object that caches these arguments so you don't have to supply them repeatedly.

To create an azureActiveContext object and attempt to authenticate against the Azure service, use:

sc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authKey= "{KEY}")

or, if the resource supports it, use the "DeviceCode" flow:

sc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authType= "DeviceCode")
# Manually authenticate using DeviceCode flow
rgs <- azureListRG(sc)

If you provide authentication parameters to createAzureContext(), the function authenticates automatically. To manually obtain an authorisation token use azureAuthenticate(). Note that this token times out after a period, so you need to refresh it occasionally. TIP: Call azureAuthenticate() before a long-running task.
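For example, refreshing the token on an existing context looks like this (using the `sc` context created above):

```r
# Refresh the authorisation token cached in the azureActiveContext,
# e.g. just before kicking off a long-running deployment or resize
azureAuthenticate(sc)
```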


The azureListSubscriptions() function lists all the available subscriptions. If you have only one subscription, it sets the default subscription ID in the azureActiveContext to that subscription.
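A minimal call, again using the `sc` context from above:

```r
# List subscriptions visible to the authenticated context;
# with a single subscription this also sets it as the default
azureListSubscriptions(sc)
```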


Manage resource groups

# list resource groups
azureListRG(sc)

# list all resources

azureListAllResources(sc, location = "northeurope")

azureListAllResources(sc, type = "Microsoft.Sql/servers", location = "northeurope")

azureCreateResourceGroup(sc, resourceGroup = "testme", location = "northeurope")

azureCreateStorageAccount(sc, storageAccount = "testmystorage1", resourceGroup = "testme")

azureListAllResources(sc, resourceGroup = "testme")

# When finished, to delete a Resource Group use azureDeleteResourceGroup()
azureDeleteResourceGroup(sc, resourceGroup = "testme")

Manage Virtual Machines

Use these functions to list, start and stop existing Virtual Machines.

To create VMs, please refer to Resource templates below.

## List VMs in a ResourceGroup
azureListVM(sc, resourceGroup = "testme")

##            Name    Location                             Type    OS     State  Admin
## 1         DSVM1 northeurope Microsoft.Compute/virtualMachines Linux Succeeded

azureStartVM(sc, vmName = "DSVM1")
azureStopVM(sc, vmName = "DSVM1")

Accessing storage blobs using the azureActiveContext

To access storage blobs you need to have a key. You can use azureSAGetKey() to automatically retrieve your key.

azureSAGetKey(sc, resourceGroup = "testme", storageAccount = "testmystorage1")

To create containers in a storage account use azureCreateStorageContainer()

azureCreateStorageContainer(sc, "opendata", storageAccount = "testmystorage1", resourceGroup = "testme")

To list containers in a storage account use azureListStorageContainers()

azureListStorageContainers(sc, storageAccount = "testmystorage1", resourceGroup = "testme")

To write a blob use azurePutBlob()

azurePutBlob(sc, storageAccount = "testmystorage1", container = "opendata", 
             contents = "Hello World",
             blob = "HELLO") 

To list blobs in a container use azureListStorageBlobs()

azureListStorageBlobs(sc, storageAccount = "testmystorage1", container = "opendata")

To read a blob in a container use azureGetBlob()

azureGetBlob(sc, storageAccount = "testmystorage1", container = "opendata",
             blob = "HELLO")

Accessing storage blobs without an azureActiveContext

It is also possible to access the blob functions without having an Azure Active Directory application.

In this case, pass azureActiveContext = NULL to the storage functions.

For example:

azureListStorageBlobs(NULL, storageAccount = "testmystorage1", container = "opendata")

Manage HDInsight clusters

You can use AzureSMR to manage HDInsight clusters. To create a cluster use azureCreateHDI().

For advanced configurations use Resource Templates (See below).

azureCreateHDI(sc,
                 resourceGroup = "testme",
                 clustername = "smrhdi", # lowercase letters, digits, and dashes only
                 storageAccount = "testmystorage1",
                 adminUser = "hdiadmin",
                 adminPassword = "AzureSMR_password123",
                 sshUser = "hdisshuser",
                 sshPassword = "AzureSMR_password123", 
                 kind = "rserver")

Use azureListHDI() to list available clusters.

azureListHDI(sc, resourceGroup ="testme")

Use azureResizeHDI() to resize a cluster.

azureResizeHDI(sc, resourceGroup = "testme", clustername = "smrhdi", role = "workernode", size = 3)

## azureResizeHDI: Request Submitted:  2016-06-23 18:50:57
## Resizing(R), Succeeded(S)
## Finished Resizing Sucessfully:  2016-06-23 19:04:43
## Finished:  2016-06-23 19:04:43
## Information 
## " headnode ( 2 * Standard_D3_v2 ) workernode ( 5 * Standard_D3_v2 ) zookeepernode ( 3 * Medium ) edgenode0 ( 1 * Standard_D4_v2 )" 

Resource templates - create Azure resources

The easiest way to create resources on Azure is to use Azure Resource Manager (ARM) templates. Creating Azure resources such as HDInsight clusters can require a large number of parameters. Resource templates can be built by creating a resource in the Azure Portal and then going to Settings > Automation script. You can find many example templates in Microsoft's azure-quickstart-templates repository on GitHub.

To create a resource using a template in AzureSMR use azureDeployTemplate(). The template and parameters must be available at a public URL (for example, in Azure blob storage), or you can supply them as JSON strings.

azureDeployTemplate(sc, resourceGroup = "Analytics", deplName = "Deploy1", 
                    templateURL = "{TEMPLATEURL}", paramURL = "{PARAMURL}")

## azureDeployTemplate: Request Submitted:  2016-06-23 18:50:57
## Resizing(R), Succeeded(S)
## Finished Deployed Sucessfully:  2016-06-23 19:04:43
## Finished:  2016-06-23 19:04:43

ADMIN TIP: If a deployment fails, go to the Azure Portal and check the Activity log for failed deployments; the error details should explain why the deployment failed.

Hive Functions

You can use these functions to run and manage hive jobs on an HDInsight Cluster.

azureHiveStatus(sc, clusterName = "smrhdi", 
                hdiAdmin = "hdiadmin", 
                hdiPassword = "AzureSMR_password123")

azureHiveSQL(sc, clusterName = "smrhdi",
             CMD = "select * from hivesampletable", 
             path = "wasb://")

Spark functions (experimental)

AzureSMR provides some functions that allow HDInsight Spark sessions and jobs to be managed within an R session.

To create a new Spark session (via Livy) use azureSparkNewSession()

azureSparkNewSession(sc, clustername = "smrhdi", 
                     hdiAdmin = "hdiadmin", 
                     hdiPassword = "AzureSMR_password123",
                     kind = "pyspark")

To view the status of sessions use azureSparkListSessions(). Wait for status to be idle.

azureSparkListSessions(sc, clustername = "smrhdi")
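The "wait for idle" step above can be automated with a small polling loop. This is a sketch: azureSparkListSessions() returns session details, but the exact column name holding the state (`state` below) is an assumption; inspect the returned object on your cluster before relying on it.

```r
# Poll until every listed Spark session reports "idle",
# then it is safe to submit commands with azureSparkCMD()
repeat {
  sessions <- azureSparkListSessions(sc, clustername = "smrhdi")
  if (all(sessions$state == "idle")) break  # column name is an assumption
  Sys.sleep(5)  # wait a few seconds between polls
}
```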

To send a command to the Spark Session use azureSparkCMD(). In this case it submits a Python routine. Ensure you preserve indents for Python.

pythonCmd <- '
from pyspark import SparkContext
from operator import add
import sys
from random import random
partitions = 1
n = 20000000 * partitions
def f(_):
  x = random() * 2 - 1
  y = random() * 2 - 1
  return 1 if x ** 2 + y ** 2 < 1 else 0

count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
Pi = (4.0 * count / n)
print("Pi is roughly %f" % Pi)'                   

azureSparkCMD(sc, CMD = pythonCmd, sessionID = "0")

## [1] "Pi is roughly 3.140285"

Check that session variables are retained:

azureSparkCMD(sc, clustername = "smrhdi", CMD = "print Pi", sessionID = "0")

#[1] "3.1422"

You can also run SparkR sessions

azureSparkNewSession(sc, clustername = "smrhdi", 
                     hdiAdmin = "hdiadmin", 
                     hdiPassword = "AzureSMR_password123",
                     kind = "sparkr")
azureSparkCMD(sc, clustername = "smrhdi", CMD = "HW<-'hello R'", sessionID = "2")
azureSparkCMD(sc, clustername = "smrhdi", CMD = "cat(HW)", sessionID = "2")

Accessing Azure Data Lake Store using the azureActiveContext

To access Azure Data Lake Store you need to generate an access token with createAzureContext(), using either the "ClientCredential" (default) or "DeviceCode" authType.

asc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authKey= "{KEY}")

To create directories in Azure Data Lake Store use azureDataLakeMkdirs()

azureDataLakeMkdirs(asc, azureDataLakeAccount, "tempfolder")

To list items in Azure Data Lake Store use azureDataLakeListStatus()

azureDataLakeListStatus(asc, azureDataLakeAccount, "")
azureDataLakeListStatus(asc, azureDataLakeAccount, "tempfolder")

To create a file and optionally write data to the new file in Azure Data Lake Store use azureDataLakeCreate()

azureDataLakeCreate(asc, azureDataLakeAccount, "tempfolder/tempfile00.txt", 
                    "755", FALSE, 
                    4194304L, 3L, 268435456L, 
                    contents = charToRaw("abcd")) # optional data to write to the new file

To append to a file in Azure Data Lake Store use azureDataLakeAppend()

azureDataLakeAppend(asc, azureDataLakeAccount, "tempfolder/tempfile00.txt", 4194304L, charToRaw("stuv"))

To read a file in Azure Data Lake Store use azureDataLakeRead()

azureDataLakeRead(asc, azureDataLakeAccount, "tempfolder/tempfile00.txt", 
                  length = 2L, bufferSize = 4194304L)

To delete item(s) in Azure Data Lake Store use azureDataLakeDelete()

azureDataLakeDelete(asc, azureDataLakeAccount, "tempfolder", TRUE)

Microsoft/AzureSMR documentation built on July 7, 2019, 11:25 p.m.