This notebook is adapted from the John Snow Labs Jupyter/Python getting-started notebook. See https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/annotation/english/match-pattern-pipeline/Pretrained-MatchPattern-Pipeline.ipynb for the original version.

library(sparklyr)
library(sparknlp)
library(dplyr)

Let's create a Spark connection for our app:

# Use the Spark version given in SPARK_VERSION, falling back to 2.4.0
version <- Sys.getenv("SPARK_VERSION", unset = "2.4.0")

config <- sparklyr::spark_config()

# Turn on sparklyr's verbose messages
options(sparklyr.sanitize.column.names.verbose = TRUE)
options(sparklyr.verbose = TRUE)
options(sparklyr.na.omit.verbose = TRUE)
options(sparklyr.na.action.verbose = TRUE)

# Connect to a local Spark instance
sc <- sparklyr::spark_connect(master = "local", version = version, config = config)

The pretrained match_pattern pipeline can extract phone numbers in these formats:

0689912549
+33698912549
+33 6 79 91 25 49
+33-6-79-91-25-49
(555)-555-5555
555-555-5555
+1-238 6 79 91 25 49
+1-555-532-3455
+15555323455
+7 06 79 91 25 49
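
To give a feel for the kind of matching involved, here is a minimal base R sketch that needs no Spark connection. The regular expression phone_pattern and the third example sentence are hypothetical, written only for illustration. The pattern is a simplification, not the rule set used by the pretrained pipeline, and it will also accept some strings that are not phone numbers.

```r
# Hypothetical, simplified pattern (an assumption, not the pipeline's own rules):
# optional "+", optional "(", one digit, then at least 8 characters drawn from
# digits, spaces, parentheses, and hyphens, and finally a closing digit.
phone_pattern <- "\\+?\\(?[0-9][0-9 ()-]{8,}[0-9]"

texts <- c(
  "You should call Mr. Jon Doe at +33 1 79 01 22 89",
  "Ring me up dude! +1-334-179-1466",
  "My office line is (555)-555-5555"
)

# regexpr() locates the first match in each string; regmatches() extracts it
matches <- regmatches(texts, regexpr(phone_pattern, texts))
print(matches)
```

Because regmatches() silently drops strings with no match, this sketch only lines up one result per input when every sentence contains a number.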

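# Download and load the pretrained "match_pattern" pipeline for English
pipeline <- nlp_pretrained_pipeline(sc, "match_pattern", lang = "en")

# Annotate a sentence, then pull the first annotation from the regex column.
# Assuming the standard Spark NLP annotation layout, field 4 is the matched text.
result <- nlp_annotate(pipeline, "You should call Mr. Jon Doe at +33 1 79 01 22 89")
pull(result, regex)[[1]][[1]][[4]]
result <- nlp_annotate(pipeline, "Ring me up dude! +1-334-179-1466")
pull(result, regex)[[1]][[1]][[4]]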


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.