# Build with devtools::install(build_vignettes = TRUE)
library("knitr")
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  purl = NOT_CRAN,
  eval = NOT_CRAN
)
Microsoft Cognitive Services -- formerly known as Project Oxford -- are a set of APIs, SDKs and services that developers can use to add AI features to their apps. Those features include emotion and video detection; facial, speech and vision recognition; and speech and language understanding.
The Web Language Model REST API provides tools for natural language processing (NLP).
Per Microsoft's website, this API uses smoothed Backoff N-gram language models (supporting Markov order up to 5) that were trained on four web-scale American English corpora collected by Bing (web page body, title, anchor and query).
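The MSCS Web LM REST API supports the following lookup operations:
- Retrieve the list of supported web language models.
- Break a string of concatenated words into individual words.
- Get the words most likely to follow a sequence of words.
- Calculate the joint probability that a particular sequence of words will appear together.
- Calculate the conditional probability that a particular word will follow a given sequence of words.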
You can either install the latest stable version from CRAN:
if ("mscsweblm4r" %in% installed.packages()[,"Package"] == FALSE) {
  install.packages("mscsweblm4r")
}

Or, you can install the development version from GitHub:

if ("mscsweblm4r" %in% installed.packages()[,"Package"] == FALSE) {
  if ("devtools" %in% installed.packages()[,"Package"] == FALSE) {
    install.packages("devtools")
  }
  devtools::install_github("philferriere/mscsweblm4r")
}
After loading {mscsweblm4r} with library(), you must call weblmInit() before you can call any of the core {mscsweblm4r} functions.
The weblmInit() configuration function will first check to see if the variable MSCS_WEBLANGUAGEMODEL_CONFIG_FILE exists in the system environment. If it does, the package will use that as the path to the configuration file.
If MSCS_WEBLANGUAGEMODEL_CONFIG_FILE doesn't exist, it will look for the file .mscskeys.json in the current user's home directory (that's ~/.mscskeys.json on Linux, and something like C:\Users\Phil\Documents\.mscskeys.json on Windows). If the file is found, the package will load the API key and URL from it.
If using a file, please make sure it has the following structure:

{
  "weblanguagemodelurl": "https://api.projectoxford.ai/text/weblm/v1.0/",
  "weblanguagemodelkey": "...MSCS Web Language Model API key goes here..."
}
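As a minimal sketch, a file with this structure can be written from R itself. The example below writes it to a temporary directory (an assumption made so the example leaves the home directory alone) and then points MSCS_WEBLANGUAGEMODEL_CONFIG_FILE at it. The key is the same placeholder as above, not a working key.

```r
# Write the config file to a temporary location. A real setup would more
# likely use ~/.mscskeys.json; tempdir() is used here only for illustration.
configPath <- file.path(tempdir(), ".mscskeys.json")

# The key below is a placeholder and must be replaced with a real API key.
configLines <- c(
  "{",
  '  "weblanguagemodelurl": "https://api.projectoxford.ai/text/weblm/v1.0/",',
  '  "weblanguagemodelkey": "...MSCS Web Language Model API key goes here..."',
  "}"
)
writeLines(configLines, configPath)

# Point the package at the file through the environment variable it checks first.
Sys.setenv(MSCS_WEBLANGUAGEMODEL_CONFIG_FILE = configPath)
```

Sys.setenv() only affects the current R session, so the variable would need to be set again (or placed in a startup file) for later sessions.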
If no configuration file is found, weblmInit() will attempt to pick up its configuration from two system environment variables instead:
MSCS_WEBLANGUAGEMODEL_URL - the URL for the Web LM REST API.
MSCS_WEBLANGUAGEMODEL_KEY - your personal Web LM REST API key.
weblmInit() needs to be called only once, after package load.
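The lookup order described above can be sketched in a few lines of base R. The helper findWeblmConfigSource() is hypothetical and is not part of {mscsweblm4r}. It only reports which source would be selected, and it assumes that a set MSCS_WEBLANGUAGEMODEL_CONFIG_FILE wins even if the file it names is missing, which the package may handle differently.

```r
# Hypothetical helper (not part of the package): reports which configuration
# source would be chosen. It does not read or validate the file itself.
findWeblmConfigSource <- function(homeDir = path.expand("~")) {
  # 1. An explicit config file path in the system environment
  configFile <- Sys.getenv("MSCS_WEBLANGUAGEMODEL_CONFIG_FILE", unset = "")
  if (nzchar(configFile)) {
    return(list(source = "env-file", path = configFile))
  }

  # 2. The default file in the user's home directory
  defaultFile <- file.path(homeDir, ".mscskeys.json")
  if (file.exists(defaultFile)) {
    return(list(source = "home-file", path = defaultFile))
  }

  # 3. The URL and key supplied directly as environment variables
  url <- Sys.getenv("MSCS_WEBLANGUAGEMODEL_URL", unset = "")
  key <- Sys.getenv("MSCS_WEBLANGUAGEMODEL_KEY", unset = "")
  if (nzchar(url) && nzchar(key)) {
    return(list(source = "env-vars", path = NA_character_))
  }

  stop("no configuration found", call. = FALSE)
}
```

Running a check like this before weblmInit() can help explain why a key that appears to be configured is not the one being picked up.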
The MSCS Web LM API is a RESTful API. HTTP requests over a network and the Internet can fail for many reasons: congestion, a web site that is down for maintenance, firewall configuration issues, and so on. There are many possible points of failure.
The API can also fail if you've exhausted your call volume quota or are exceeding the API call rate limit. Unfortunately, MSCS does not expose an API you can query to check whether you're about to exceed your quota, for instance. The only way you'll know for sure is by looking at the error code returned after an API call has failed.
Therefore, you must write your R code with failure in mind. Our preferred way is to use tryCatch(). Its mechanism may appear a bit daunting at first, but it is well documented. We've also included many examples, as you'll see below.
Here's some sample code that illustrates how to use tryCatch():

library('mscsweblm4r')
tryCatch({
  weblmInit()
}, error = function(err) {
  geterrmessage()
})
If {mscsweblm4r} can locate neither .mscskeys.json nor any of the configuration environment variables, the code above will generate the following output:

[1] "mscsweblm4r: could not load config info from Sys env nor from file"
Similarly, weblmInit() will fail if {mscsweblm4r} cannot find the weblanguagemodelkey key in .mscskeys.json, or fails to parse it correctly, etc. This is why it is so important to use tryCatch() with all {mscsweblm4r} functions.
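Because some of the failures described above are transient (congestion, rate limiting), one option is to wrap calls in a small retry helper built on tryCatch(). The sketch below is not part of {mscsweblm4r}: callWithRetry(), its parameters, and the doubling delay are all assumptions chosen for illustration. It reads the error text with conditionMessage(err), a base R accessor for the caught condition.

```r
# Hypothetical helper (not part of the package): calls fn() up to maxAttempts
# times, waiting longer after each failure. Returns the result of fn(), or
# NULL if every attempt failed.
callWithRetry <- function(fn, maxAttempts = 3L, baseDelaySecs = 1) {
  for (attempt in seq_len(maxAttempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(err) list(ok = FALSE, message = conditionMessage(err))
    )
    if (result$ok) {
      return(result$value)
    }
    message(sprintf("Attempt %d of %d failed: %s",
                    attempt, maxAttempts, result$message))
    if (attempt < maxAttempts) {
      # Wait 1x, 2x, 4x, ... the base delay before the next attempt
      Sys.sleep(baseDelaySecs * 2^(attempt - 1))
    }
  }
  NULL
}
```

A call would then look like callWithRetry(function() weblmListAvailableModels()). Retrying does not help with an exhausted quota or an invalid key, so the number of attempts is best kept small.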
The five API calls exposed by {mscsweblm4r} are the following:
# Retrieve a list of supported web language models
weblmListAvailableModels()

# Break a string of concatenated words into individual words
weblmBreakIntoWords(
  textToBreak,                      # ASCII only
  modelToUse = "body",              # "title"|"anchor"|"query"(default)|"body"
  orderOfNgram = 5L,                # 1L|2L|3L|4L|5L(default)
  maxNumOfCandidatesReturned = 5L   # Default: 5L
)

# Get the words most likely to follow a sequence of words
weblmGenerateNextWords(
  precedingWords,                   # ASCII only
  modelToUse = "title",             # "title"|"anchor"|"query"(default)|"body"
  orderOfNgram = 4L,                # 1L|2L|3L|4L|5L(default)
  maxNumOfCandidatesReturned = 5L   # Default: 5L
)

# Calculate joint probability a particular sequence of words will appear together
weblmCalculateJointProbability(
  inputWords,                       # ASCII only
  modelToUse = "query",             # "title"|"anchor"|"query"(default)|"body"
  orderOfNgram = 4L                 # 1L|2L|3L|4L|5L(default)
)

# Calculate conditional probability a particular word will follow a given sequence of words
weblmCalculateConditionalProbability(
  precedingWords,                   # ASCII only
  continuations,                    # ASCII only
  modelToUse = "title",             # "title"|"anchor"|"query"(default)|"body"
  orderOfNgram = 4L                 # 1L|2L|3L|4L|5L(default)
)
These functions return S3 objects of class weblm. The weblm object exposes formatted results (in data.frame format), the REST API JSON response (should you care), and the HTTP request (mostly for debugging purposes).
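To see how such an object is laid out without relying on its internals, base R introspection is usually enough. The sketch below uses a stand-in object rather than a real API response: the field names results, json and request, the column name, and the placeholder contents are all assumptions made for illustration. The describeObject() helper is hypothetical and works on any list-like object.

```r
# Stand-in object for illustration only. The field names and contents are
# assumptions, not actual package output.
standIn <- structure(
  list(
    results = data.frame(words = c("where", "is"), stringsAsFactors = FALSE),
    json    = "{}",
    request = NULL
  ),
  class = "weblm"
)

# Hypothetical helper: summarises an object without assuming its layout.
describeObject <- function(obj) {
  list(
    classes = class(obj),   # S3 class vector
    fields  = names(obj),   # names of the top-level components
    # names of the components that are data.frames
    tabular = names(obj)[vapply(obj, is.data.frame, logical(1))]
  )
}

info <- describeObject(standIn)
```

Applied to a real return value, the same helper (or simply str()) shows which component holds the data.frame to work with.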
The following code snippets illustrate how to use {mscsweblm4r} functions and show what results they return with toy examples. If after reviewing this code there is still confusion regarding how and when to use each function, please refer to the original documentation.
tryCatch({

  # Retrieve a list of supported web language models
  weblmListAvailableModels()

}, error = function(err) {

  # Print error
  geterrmessage()

})

tryCatch({

  # Break a sentence into words
  weblmBreakIntoWords(
    textToBreak = "testforwordbreak", # ASCII only
    modelToUse = "body",              # "title"|"anchor"|"query"(default)|"body"
    orderOfNgram = 5L,                # 1L|2L|3L|4L|5L(default)
    maxNumOfCandidatesReturned = 5L   # Default: 5L
  )

}, error = function(err) {

  # Print error
  geterrmessage()

})

tryCatch({

  # Generate next words
  weblmGenerateNextWords(
    precedingWords = "how are you",   # ASCII only
    modelToUse = "title",             # "title"|"anchor"|"query"(default)|"body"
    orderOfNgram = 4L,                # 1L|2L|3L|4L|5L(default)
    maxNumOfCandidatesReturned = 5L   # Default: 5L
  )

}, error = function(err) {

  # Print error
  geterrmessage()

})

tryCatch({

  # Calculate joint probability a particular sequence of words will appear together
  weblmCalculateJointProbability(
    inputWords = c("where", "is", "San", "Francisco", "where is",
                   "San Francisco", "where is San Francisco"),  # ASCII only
    modelToUse = "query",             # "title"|"anchor"|"query"(default)|"body"
    orderOfNgram = 4L                 # 1L|2L|3L|4L|5L(default)
  )

}, error = function(err) {

  # Print error
  geterrmessage()

})

tryCatch({

  # Calculate conditional probability a particular word will follow a given sequence of words
  weblmCalculateConditionalProbability(
    precedingWords = "hello world wide",       # ASCII only
    continuations = c("web", "range", "open"), # ASCII only
    modelToUse = "title",             # "title"|"anchor"|"query"(default)|"body"
    orderOfNgram = 4L                 # 1L|2L|3L|4L|5L(default)
  )

}, error = function(err) {

  # Print error
  geterrmessage()

})
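The comments in the calls above note three input constraints: text must be ASCII only, the model must be one of four names, and the n-gram order must be between 1 and 5. Since each failed request may count against the call quota, it can be worth checking these locally first. The following is a minimal sketch; isAscii() and checkWeblmArgs() are hypothetical helpers, not package functions, and the package may well perform its own validation.

```r
# Hypothetical helper: TRUE for each string made only of bytes below 128.
isAscii <- function(x) {
  vapply(x, function(s) all(as.integer(charToRaw(s)) < 128L),
         logical(1), USE.NAMES = FALSE)
}

# Hypothetical helper: returns a character vector describing every problem
# found, or an empty vector if the arguments look acceptable.
checkWeblmArgs <- function(text, modelToUse, orderOfNgram) {
  problems <- character(0)

  if (!is.character(text) || length(text) == 0L || anyNA(text)) {
    problems <- c(problems, "text must be a non-empty character vector without NA")
  } else if (!all(isAscii(text))) {
    problems <- c(problems, "text must contain ASCII characters only")
  }

  validModels <- c("title", "anchor", "query", "body")
  if (!(length(modelToUse) == 1L && modelToUse %in% validModels)) {
    problems <- c(problems, "modelToUse must be one of title, anchor, query, body")
  }

  if (!(length(orderOfNgram) == 1L && orderOfNgram %in% 1:5)) {
    problems <- c(problems, "orderOfNgram must be an integer from 1 to 5")
  }

  problems
}
```

For example, checkWeblmArgs("how are you", "title", 4L) returns an empty vector, so the request can go ahead, while a non-empty result can be reported to the user without spending an API call.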
A test/demo Shiny web application is available here.
{mscstexta4r}, an R client for the Microsoft Cognitive Services Text Analytics REST API, is also available on CRAN.
All Microsoft Cognitive Services components are Copyright © Microsoft.