The googlenlp package provides an R interface to Google's Cloud Natural Language API.
"Google Cloud Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app." [source]
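There are four main features of the API, all of which are available through this R package [source]: syntax analysis (sentences and tokens), entity analysis, sentiment analysis, and language detection.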
The current googlenlp release can be installed from CRAN:
install.packages("googlenlp")
The newest development release can be installed from GitHub:
# install.packages('devtools')
devtools::install_github("BrianWeinstein/googlenlp")
To use the API, you'll first need to create a Google Cloud project, enable billing, and get an API key.
Load the package and set your API key. There are two ways to do this.
Method A (preferred method) adds your API key as a variable to your .Renviron file. Under this method, you only need to do this setup process one time.
library(googlenlp)
configure_googlenlp() # follow the instructions printed to the console
googlenlp setup instructions:
1. Your ~/.Renviron file will now open in a new window/tab.
*** If it doesn't open, run: file.edit("~/.Renviron") ***
2. To use the API, you'll first need to create a Google Cloud project and enable billing (https://cloud.google.com/natural-language/docs/getting-started).
3. Next you'll need to get an API key (https://cloud.google.com/natural-language/docs/common/auth).
4. In your ~/.Renviron file, replace the ENTER_YOUR_API_KEY_HERE with your Google Cloud API key.
5. Save your ~/.Renviron file.
6. *** Restart your R session for changes to take effect. ***
Method B defines your API key as a session-level variable. Under this method, you'll need to set your API key at the beginning of each R session.
library(googlenlp)
set_api_key("MY_API_KEY") # replace this with your API key
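Because Method A stores the key as an environment variable, a script can check that a key is present before making any API calls. The sketch below is a minimal illustration: `has_api_key()` is a hypothetical helper, not part of googlenlp, and the variable name `GOOGLE_NLP_API_KEY` is an assumption, so check your ~/.Renviron file for the name that is actually used.

```r
# Hypothetical helper (not part of googlenlp): TRUE if the named
# environment variable holds a non-empty value.
# The default variable name is an assumption; adjust it to match your .Renviron.
has_api_key <- function(var = "GOOGLE_NLP_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_api_key()) {
  message("No API key found; run configure_googlenlp() or set_api_key() first.")
}
```

Running a check like this at the top of a script gives a clearer message than a failed API request would.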
Define the text you'd like to analyze.
text <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.
Sundar Pichai said in his keynote that users love their new Android phones."
The annotate_text function analyzes the text's syntax (sentences and tokens), entities, sentiment, and language, and returns the result as a five-element list.
analyzed <- annotate_text(text_body = text)
#> Warning: package 'bindrcpp' was built under R version 3.4.4
str(analyzed, max.level = 1)
#> List of 5
#> $ sentences :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 4 variables:
#> $ tokens :Classes 'tbl_df', 'tbl' and 'data.frame': 32 obs. of 17 variables:
#> $ entities :Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 8 variables:
#> $ documentSentiment:'data.frame': 1 obs. of 2 variables:
#> $ language : chr "en"
"Sentence extraction breaks up the stream of text into a series of sentences." [API Documentation]
beginOffset indicates the (zero-based) character index of where the sentence begins (with UTF-8 encoding).
The magnitude and score fields quantify each sentence's sentiment; see the Document Sentiment section for more details.
analyzed$sentences
| content | beginOffset | magnitude | score |
|:--------|------------:|----------:|------:|
| Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. | 0 | 0.0 | 0.0 |
| Sundar Pichai said in his keynote that users love their new Android phones. | 113 | 0.6 | 0.6 |
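Since beginOffset is zero-based and R strings are indexed from one, recovering a sentence from the original text needs a shift of one. The sketch below uses a hand-built stand-in for analyzed$sentences (the example text and offsets are made up, so no API call is needed) and assumes ASCII-only text, where byte offsets and character positions coincide.

```r
# Stand-in for analyzed$sentences, built by hand (hypothetical values).
text <- "I like tea. You like coffee."
sentences <- data.frame(
  content     = c("I like tea.", "You like coffee."),
  beginOffset = c(0, 12),
  stringsAsFactors = FALSE
)

# beginOffset is zero-based; substring() is one-based, so shift the start by one.
start <- sentences$beginOffset + 1
end   <- sentences$beginOffset + nchar(sentences$content)
recovered <- substring(text, start, end)

# Each recovered slice should match the sentence content.
identical(recovered, sentences$content)
```

For text containing multi-byte characters, the offsets may not line up with character positions, so this shortcut should be checked before relying on it.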
"Tokenization breaks the stream of text up into a series of tokens, with each token usually corresponding to a single word. The Natural Language API then processes the tokens and, using their locations within sentences, adds syntactic information to the tokens." [API Documentation]
lemma indicates the token's "root" word, and can be useful in standardizing the word within the text.
tag indicates the token's part of speech.
analyzed$tokens
| content | beginOffset | lemma | tag | aspect | case | form | gender | mood | number | person | proper | reciprocity | tense | voice | dependencyEdge_headTokenIndex | dependencyEdge_label |
|:--------|------------:|:------|:----|:-------|:-----|:-----|:-------|:-----|:-------|:-------|:-------|:------------|:------|:------|------------------------------:|:---------------------|
| Google | 0 | Google | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | NSUBJ |
| , | 6 | , | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 0 | P |
| headquartered | 8 | headquarter | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 0 | VMOD |
| in | 22 | in | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 2 | PREP |
| Mountain | 25 | Mountain | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 5 | NN |
| View | 34 | View | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 3 | POBJ |
| , | 38 | , | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 0 | P |
| unveiled | 40 | unveil | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 7 | ROOT |
| the | 49 | the | DET | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | DET |
| new | 53 | new | ADJ | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | AMOD |
| Android | 57 | Android | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | NN |
| phone | 65 | phone | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | DOBJ |
| at | 71 | at | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | PREP |
| the | 74 | the | DET | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | DET |
| Consumer | 78 | Consumer | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | NN |
| Electronic | 87 | Electronic | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | NN |
| Show | 98 | Show | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 12 | POBJ |
| . | 102 | . | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | P |
| Sundar | 113 | Sundar | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 19 | NN |
| Pichai | 120 | Pichai | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | NSUBJ |
| said | 127 | say | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 20 | ROOT |
| in | 132 | in | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | PREP |
| his | 135 | his | PRON | ASPECT_UNKNOWN | GENITIVE | FORM_UNKNOWN | MASCULINE | MOOD_UNKNOWN | SINGULAR | THIRD | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 23 | POSS |
| keynote | 139 | keynote | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 21 | POBJ |
| that | 147 | that | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | MARK |
| users | 152 | user | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | NSUBJ |
| love | 158 | love | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PRESENT | VOICE_UNKNOWN | 20 | CCOMP |
| their | 163 | their | PRON | ASPECT_UNKNOWN | GENITIVE | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | THIRD | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | POSS |
| new | 169 | new | ADJ | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | AMOD |
| Android | 173 | Android | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | NN |
| phones | 181 | phone | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | DOBJ |
| . | 187 | . | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | P |
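Two common uses of the tokens table are collecting lemmas for a given part of speech and looking up each token's syntactic head. The sketch below works on a hand-built stand-in for a few columns of analyzed$tokens (the sentence and its values are hypothetical, so no API call is needed). It relies on dependencyEdge_headTokenIndex being a zero-based row index, which matches the output above, where "Google" points to index 7, the token "unveiled".

```r
# Hand-built stand-in for a few columns of analyzed$tokens (hypothetical values).
tokens <- data.frame(
  content = c("Users", "love", "their", "phones", "."),
  lemma   = c("user", "love", "their", "phone", "."),
  tag     = c("NOUN", "VERB", "PRON", "NOUN", "PUNCT"),
  dependencyEdge_headTokenIndex = c(1, 1, 3, 1, 1),
  dependencyEdge_label = c("NSUBJ", "ROOT", "POSS", "DOBJ", "P"),
  stringsAsFactors = FALSE
)

# Lemmas of all nouns, useful for counting words regardless of inflection.
noun_lemmas <- tokens$lemma[tokens$tag == "NOUN"]

# headTokenIndex is zero-based, so add one to index into the R data frame.
tokens$head <- tokens$content[tokens$dependencyEdge_headTokenIndex + 1]

print(tokens[, c("content", "dependencyEdge_label", "head")])
```

The ROOT token points to itself, so its head is its own content.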
"Entity Analysis provides information about entities in the text, which generally refer to named 'things' such as famous individuals, landmarks, common objects, etc... A good general practice to follow is that if something is a noun, it qualifies as an 'entity.'" [API Documentation]
entity_type indicates the type of entity (i.e., it classifies the entity as a person, location, consumer good, etc.).
mid provides a "machine-generated identifier" corresponding to the entity's Google Knowledge Graph entry.
wikipedia_url provides the entity's Wikipedia URL.
salience indicates the entity's importance to the entire text. Scores range from 0.0 (less important) to 1.0 (highly important).
analyzed$entities
| name | entity_type | mid | wikipedia_url | salience | content | beginOffset | mentions_type |
|:-----|:------------|:----|:--------------|---------:|:--------|------------:|:--------------|
| Google | ORGANIZATION | /m/045c7b | https://en.wikipedia.org/wiki/Google | 0.2557206 | Google | 0 | PROPER |
| users | PERSON | NA | NA | 0.1527633 | users | 152 | COMMON |
| phone | CONSUMER_GOOD | NA | NA | 0.1311989 | phone | 65 | COMMON |
| Android | CONSUMER_GOOD | /m/02wxtgw | https://en.wikipedia.org/wiki/Android_(operating_system) | 0.1224526 | Android | 57 | PROPER |
| Android | CONSUMER_GOOD | /m/02wxtgw | https://en.wikipedia.org/wiki/Android_(operating_system) | 0.1224526 | Android | 173 | PROPER |
| Sundar Pichai | PERSON | /m/09gds74 | https://en.wikipedia.org/wiki/Sundar_Pichai | 0.1141411 | Sundar Pichai | 113 | PROPER |
| Mountain View | LOCATION | /m/0r6c4 | https://en.wikipedia.org/wiki/Mountain_View,_California | 0.1019596 | Mountain View | 25 | PROPER |
| Consumer Electronic Show | EVENT | /m/01p15w | https://en.wikipedia.org/wiki/Consumer_Electronics_Show | 0.0703438 | Consumer Electronic Show | 78 | PROPER |
| phones | CONSUMER_GOOD | NA | NA | 0.0338317 | phones | 181 | COMMON |
| keynote | OTHER | NA | NA | 0.0175884 | keynote | 139 | COMMON |
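The entities table holds one row per mention, so an entity that appears more than once (such as "Android") is repeated with the same salience. A minimal sketch of collapsing it to one row per entity and ranking by salience follows; the data frame is a hand-copied subset of the output above, so no API call is needed.

```r
# Hand-copied subset of the entities output (one row per mention).
entities <- data.frame(
  name        = c("Google", "Android", "Android", "Sundar Pichai"),
  entity_type = c("ORGANIZATION", "CONSUMER_GOOD", "CONSUMER_GOOD", "PERSON"),
  salience    = c(0.2557206, 0.1224526, 0.1224526, 0.1141411),
  beginOffset = c(0, 57, 173, 113),
  stringsAsFactors = FALSE
)

# Drop the mention-level column, then keep one row per distinct entity.
per_entity <- unique(entities[, c("name", "entity_type", "salience")])

# Most salient entities first.
per_entity <- per_entity[order(per_entity$salience, decreasing = TRUE), ]
print(per_entity)
```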
"Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical score
and magnitude values." [API Documentation]
score ranges from -1.0 (negative) to 1.0 (positive), and indicates the "overall emotional leaning of the text".
magnitude "indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text's magnitude (so longer text blocks may have greater magnitudes)."
A note on how to interpret these sentiment values is posted here.
analyzed$documentSentiment
| magnitude| score|
|----------:|------:|
| 0.6| 0.3|
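Because score gives the direction and magnitude gives the total amount of emotion, the two are usually read together: a score near zero with a high magnitude suggests mixed sentiment, while a score near zero with a low magnitude suggests neutral text. The sketch below turns that reading into a label. `label_sentiment()` is a hypothetical helper, and the cut-offs (0.25 and 0.5) are illustrative assumptions rather than values defined by the API, so tune them for your own data.

```r
# Hypothetical helper: label a document from its score and magnitude.
# The cut-offs (0.25 and 0.5) are illustrative assumptions, not API values.
label_sentiment <- function(score, magnitude, score_cut = 0.25, mag_cut = 0.5) {
  if (score >= score_cut) {
    "positive"
  } else if (score <= -score_cut) {
    "negative"
  } else if (magnitude >= mag_cut) {
    "mixed"    # little net leaning, but emotional content is present
  } else {
    "neutral"  # little net leaning and little emotional content
  }
}

# Values from the document-level result shown above.
label_sentiment(score = 0.3, magnitude = 0.6)
```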
language indicates the detected language of the document. Only English ("en"), Spanish ("es") and Japanese ("ja") are currently supported by the API.
analyzed$language
#> [1] "en"