syntaxNetR: 'syntaxNetR' a basic R binding for Google SyntaxNet Parsey...


Description

Basic R binding that uses an adapted version of the example Python code provided by Google to tag R character vectors. You will need a Mac or Linux computer with a working syntaxNet installation to use this function. To install syntaxNet, follow the steps described here: https://github.com/tensorflow/models/tree/master/syntaxnet.

Usage

syntaxNetR(tokenVector, syntaxNetPath = NULL,
  language = c("Ancient_Greek-PROIEL", "Ancient_Greek", "Arabic", "Basque",
  "Bulgarian", "Catalan", "Chinese", "Croatian", "Czech-CAC", "Czech-CLTT",
  "Czech", "Danish", "Dutch-LassySmall", "Dutch", "English-LinES", "English",
  "Estonian", "Finnish-FTB", "Finnish", "French", "Galician", "German",
  "Gothic", "Greek", "Hebrew", "Hindi", "Hungarian", "Indonesian", "Irish",
  "Italian", "Kazakh", "Latin-ITTB", "Latin-PROIEL", "Latin", "Latvian",
  "Norwegian", "Old_Church_Slavonic", "Persian", "Polish", "Portuguese-BR",     
  "Portuguese", "Romanian", "Russian-SynTagRus", "Russian", "Slovenian-SST",
  "Slovenian", "Spanish-AnCora", "Spanish", "Swedish-LinES", "Swedish", "Tamil",
  "Turkish"), SIMPLIFY = TRUE, verbose = TRUE)

Arguments

tokenVector

a character vector encoded in UTF-8 or ASCII. It will almost certainly not work with Windows, ISO 8859-1, or latin1 encodings.
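
If the input comes from a latin1 source, it can be re-encoded with base R before it is passed on. The snippet below is a minimal sketch using only base R functions (`Encoding`, `enc2utf8`, `validUTF8`). The sample string is made up for illustration, and the package itself may or may not perform any conversion on its own.

```r
# A string as it might arrive from a latin1 (ISO 8859-1) source.
x <- "caf\xe9"
Encoding(x) <- "latin1"   # declare how the bytes should be interpreted

# Convert to UTF-8 before handing the vector to the parser.
x_utf8 <- enc2utf8(x)
# Equivalent alternative: iconv(x, from = "latin1", to = "UTF-8")

# Sanity checks: the declared encoding and byte-level validity.
Encoding(x_utf8)
validUTF8(x_utf8)
```

Running `validUTF8()` over the whole vector before parsing is a cheap way to catch stray non-UTF-8 elements early.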

syntaxNetPath

path to your local syntaxNet installation, with a trailing slash (i.e., the path has to end with a forward slash)
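
Because the path has to end with a forward slash, it can help to normalize it first. The helper below is a hypothetical convenience function, not part of the package, and the example paths are placeholders.

```r
# Hypothetical helper (not part of the package): make sure a path ends
# with exactly one forward slash.
ensure_trailing_slash <- function(path) {
  # Replace any run of trailing slashes (including none) with a single "/".
  sub("/*$", "/", path)
}

ensure_trailing_slash("/path/to/models/syntaxnet")
ensure_trailing_slash("/path/to/models/syntaxnet/")
```

Both calls yield the same string, so the helper is safe to apply whether or not the slash is already present.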

language

name of the language model provided by Google; must be one of the preset options listed under Usage

SIMPLIFY

if set to TRUE (default), the function will simplify the list output to text with slashtags (e.g., _noun_sg_nsubj)

verbose

if set to TRUE (default), the function will output log messages allowing you to track its progress

Value

a parsed character vector (if SIMPLIFY is set to TRUE) or a list representation of the parsed character vector
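
A call might look like the sketch below. It rests on several assumptions: that the function is exported from the tweetCorp package, that syntaxNet is installed at the placeholder path shown (replace it with your own), and that the sample sentence is only an illustration. The call is guarded so that nothing is attempted when the package or the installation directory is missing.

```r
# Placeholder install location (an assumption); must end with "/".
syntaxNetPath <- "/path/to/models/syntaxnet/"

# Input text, converted to UTF-8 as the tokenVector argument requires.
tokens <- enc2utf8("The quick brown fox jumps over the lazy dog.")

# Only attempt the call when both the package and the directory exist.
if (requireNamespace("tweetCorp", quietly = TRUE) && dir.exists(syntaxNetPath)) {
  parsed <- tweetCorp::syntaxNetR(
    tokens,
    syntaxNetPath = syntaxNetPath,
    language = "English",
    SIMPLIFY = TRUE,    # return tagged text rather than a list
    verbose = FALSE     # suppress progress messages
  )
  print(parsed)
}
```

Setting `SIMPLIFY = FALSE` instead would return the list representation described above.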


jeroenclaes/tweetCorp documentation built on May 27, 2019, 4:50 a.m.