- curl::curl_download() replaces download.file() to get CoreNLP.
- The corenlp_install() function has a new argument verbose.
- The check of the StanfordCoreNLP class whether sufficient heap space has been allocated would fail when a numeric value was returned (such as "455.5 Gb"). Fixed.
- […] segment() function are padded with leading zeros (#18).
- The purge() function to preprocess an input string was not exposed by the corenlp_annotate() method for data.table. Yet the default approach of the StanfordCoreNLP$annotate() worker is to call purge(), leading to results that may differ from those of StanfordCoreNLP$process_files() (#20). To harmonize the two approaches, corenlp_annotate() for data.table objects now has the argument purge, which will be passed on.
- The option bignlp.properties_file is no longer set upon loading the package, and it is no longer used as the default value of the argument corenlp_dir of corenlp_annotate(). An implicit setting of properties contradicts the logic of the (new) bignlp package, which requires an explicit and conscious handling of properties.
- The $initialize() method of the StanfordCoreNLP class will assign the value of the argument output_format to the properties object. It is not necessary to set the output format separately for the properties (#22).
- The corenlp_parse_conll() function now also accepts a list of character vectors as input. If x is a list, it will be unlisted.
- The $annotate() method of the StanfordCoreNLP class has been renamed to $process() to reflect that the Java method called is process. This avoids confusion with the (Java) method annotate that also exists, and is a basis for turning StanfordCoreNLP into a subclass of the AnnotationPipeline class at a later stage.
- The corenlp_parse_json() function will now add column names in line with the documentation of the CoNLLOutputter class (#23).
- The corenlp_parse_json() function will now assign a column with the document id as column 'doc' to the data.table that is prepared.
- corenlp_install() set the option "bignlp.corenlp_dir" to a directory within the package, not to the location designated by argument loc. Fixed.
- The StanfordCoreNLP class now inherits from the AnnotationPipeline class, exposing the $annotate() method for parallel processing.
- The AnnotationList class is introduced to manage annotation objects.
- As the $annotate() method of the AnnotationPipeline class will return an AnnotationList, the $as.matrix() method of this class has been removed; its functionality is assumed by the $as.data.table() method of the AnnotationList class.
- as.Annotation() turns tabular data for tokenized text into an Annotation object (#17).
- First release with Java parallelization.
- Properties object, including properties() and parse_properties_file().
- mince() for new workflow for parallel processing.
- $process_files() of AnnotatorCoreNLP class.
- $verbose() of AnnotatorCoreNLP class.
- rJava::.jpackage().
- corenlp_annotate() recognizes whether chunk data is wrapped in quotes, and removes quotes if necessary.
- […] corenlp_annotate has been removed (by wrapping checks whether files already exist into any).
- chunk_table_get_nrow-function
- chunk_table_split-function
- corenlp_get_jar_dir() and corenlp_get_properties_file() added
- corenlp_annotate and corenlp_parse_ndjson will now return the target files, which may be helpful when using the functions in a pipe.
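The entry on corenlp_annotate and corenlp_parse_ndjson returning their target files describes a pattern that can be sketched in base R. In the sketch below, write_chunks() and read_chunks() are hypothetical stand-ins, not bignlp functions, and the native pipe |> assumes R 4.1 or later. It only illustrates why returning the target files makes chaining convenient; it does not reproduce the package's implementation.

```r
# Hypothetical stand-ins (not part of bignlp) that illustrate the pattern of
# returning target files so that calls can be chained in a pipe.

# Write each element of `x` to its own file in `dir` and return the file paths.
write_chunks <- function(x, dir) {
  # File names are zero-padded so that they sort in processing order.
  files <- file.path(dir, sprintf("chunk_%03d.txt", seq_along(x)))
  for (i in seq_along(x)) writeLines(x[[i]], con = files[[i]])
  files
}

# Read the files back and return a named list of their content.
read_chunks <- function(files) {
  # any() collapses the vector of checks into a single TRUE/FALSE.
  if (any(!file.exists(files))) stop("some target files are missing")
  lapply(setNames(files, basename(files)), readLines)
}

tmp <- tempfile()
dir.create(tmp)

# Because write_chunks() returns its target files, the next step can consume
# them directly (native pipe, assumes R >= 4.1).
result <- c("First text.", "Second text.") |>
  write_chunks(dir = tmp) |>
  read_chunks()
```

If a function returned NULL instead of its target files, the second step would need the file paths to be reconstructed by hand, which is the inconvenience the entry appears to address.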