This notebook is adapted from John Snow Labs Jupyter/Python getting started notebook. See https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start.ipynb for that version.

Spark NLP Quick Start

How to use Spark NLP pretrained pipelines

Make sure you have already installed sparklyr and sparklnlp

library(dplyr)
library(sparklyr)
library(sparklyr.nested)
library(sparknlp)

Let's create a Spark Session for our app

version <- Sys.getenv("SPARK_VERSION", unset = "2.4.3")

config <- sparklyr::spark_config()

options(sparklyr.sanitize.column.names.verbose = TRUE)
options(sparklyr.verbose = TRUE)
options(sparklyr.na.omit.verbose = TRUE)
options(sparklyr.na.action.verbose = TRUE)
sc <- sparklyr::spark_connect(master = "local", version = version, config = config)

cat("Apache Spark version: ", sc$home_version)

Let's use Spark NLP pre-trained pipeline for named entity recognition

pipeline <- nlp_pretrained_pipeline(sc, "recognize_entities_dl", lang = "en")
result <- nlp_annotate(pipeline, "Google has announced the release of a beta version of the popular TensorFlow machine learning library.")
result)

result %>%
  mutate(entity_type = ner.result) %>%
  pull(entity_type) %>%
  unlist()
result %>%
  mutate(named_entities = entities.result) %>%
  pull(named_entities) %>% 
  unlist()

Let's use Spark NLP pre-trained pipeline for sentiment analysis

pipeline <- nlp_pretrained_pipeline(sc, "analyze_sentiment", "en")
result <- nlp_annotate(pipeline, "This is a very boring movie. I recommend others to awoid this movie is not good..")
result %>% 
  mutate(sentiments = sentiment.result) %>%
  pull(sentiments) %>% 
  unlist()
result %>% 
  mutate(checked_words = checked.result) %>%
  pull(checked_words) %>% 
  unlist()

The word awoid has been corrected to avoid by spell checker inside this pipeline



r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.