Nothing
cwb_huffcode() and cwb_compress_rdx() did not delete redundant files on Windows. Fixed by temporarily unloading the corpus (#89).

cwb_encode()
failed if argument s_attributes was an empty list. Fixed: the default value of s_attributes is now list() (#90).

cwb_makeall()
will not reset the CORPUS_REGISTRY environment variable implicitly if the corpus to process has already been loaded (#92).

cwb_makeall(), cwb_huffcode() and cwb_compress_rdx() have a new argument logfile to redirect output to this file. Requires argument quietly to be TRUE (#65).

cl_struc_values()
does not duplicate registry directories any more (#77).

get_region_matrix() reports NA values for negative strucs (#87).

region_matrix_to_struc_matrix() returns NA values for regions without a nested region, as declared in the documentation (#88).

check_strucs() issues a warning if negative values are passed and if the length of the input vector is 0.

ranges_to_cpos()
drops rows with NA values from the input matrix and issues a respective warning.

cwb_encode(), cwb_makeall(), cwb_huffcode() and cwb_compress_rdx() perform tilde expansion on the filename provided by argument registry, avoiding a crash (#84).

region_to_strucs()
to get the minimum and maximum struc of an s-attribute within the region provided. Works also for nested s-attributes.

region_matrix_to_struc_matrix().

cl_cpos2lbound() and cl_cpos2rbound() return NA if the corpus position is outside a struc for the given s-attribute (#78).

cl_cpos2lbound() and cl_cpos2rbound() are exposed directly from C++ without R wrappers, improving performance. Falling back to the environment variable 'CORPUS_REGISTRY' for argument registry is now handled implicitly.

Rcpp::sourceCpp() or Rcpp::cppFunction().

devtools::install_github("PolMine/RcppCWB"). The missing ref = "dev" has been inserted.

cwb_encode()
crashed if arguments data_dir and vrt_dir include a tilde. Tilde expansion is now applied to these arguments to avoid this (#73).

Replaced sprintf() with snprintf() to address a security issue.

sprintf()
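
The tilde expansion applied to data_dir and vrt_dir can be illustrated with base R. This is only a minimal sketch: the paths are hypothetical, and whether RcppCWB calls path.expand() internally is an assumption, not something this changelog states.

```r
# Hypothetical directories, used only for illustration
data_dir <- "~/cwb/indexed_corpora/mycorpus"
vrt_dir <- "~/cwb/vrt"

# path.expand() replaces a leading tilde with the user's home directory
data_dir_expanded <- path.expand(data_dir)
vrt_dir_expanded <- path.expand(vrt_dir)

# C code that opens files generally does not interpret "~" itself,
# so the expanded paths are the ones that are safe to pass on
print(data_dir_expanded)
print(vrt_dir_expanded)
```

Expanding paths before they reach compiled code keeps the C level free of shell-like conventions such as the tilde.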
corpus_properties() and corpus_property() do not crash any more if the corpus is not loaded or not present (#69).

p_attr_default() to programmatically extract the default p-attribute (#63).

region_matrix_corpus()
C++ code that would not show any context at all if s_attribute expansion transgressed the start or end of the corpus.

region_matrix_corpus() C++ code that would result from not considering that query matches may cover more than one struc of a structural attribute.

corpus_info_file() does not crash if INFO is not defined in the registry file (#62).

Using sAttribute and pAttribute instead of s_attribute or p_attribute, respectively, is now accompanied by a warning that the arguments are deprecated.

check_corpus()
function distinguishes between whether a corpus is loaded in the CL and/or CQP context.

cwb_huffcode() and cwb_compress_rdx() have argument delete to trigger deleting redundant files after compression (#60).

cqp_load_corpus will internally uppercase the corpus ID, as required in the CQP context (#64).

corpus_data_dir()
did not work as intended without explicitly setting the registry argument. Fixed.

corpus_info_file(), corpus_full_name(), corpus_p_attributes(), corpus_s_attributes(), corpus_properties() and corpus_property() to retrieve registry file data.

corpus_registry_dir().

cwb_charsets() reports the charsets supported by CWB.

cl_load_corpus() and cqp_load_corpus() do what the function names suggest.

cl_list_corpora()
complements the existing function cqp_list_corpora() for the CL context.

Arguments skip_blank_lines, strip_whitespace and xml of cwb_encode() open up configuration options of cwb_encode(), overcoming the previously hard-coded equivalent of the command-line option "-xsB" (#38).

cpos_to_id(), .cl_find_corpus() and .cl_new_attribute() are an entry to passing around pointers, rather than re-creating objects whenever switching from R to C.

.s_attr() and .p_attr() return pointers for an s- or p-attribute.

cl_* are now available with a pointer as input (e.g. cpos_to_id()).

cqp_drop_subcorpus()
function that has been disabled temporarily is usable again (#34).

cqp_query() is now able to process subcorpora.

RcppCWB:::.cqp_subcropus() will construct a subcorpus from a region matrix.

check_corpus() does not reset the registry directory any more, but tries to load the checked corpus if it has not yet been loaded.

s_attr_relationship()
will detect whether two s-attributes are siblings, or in a descendent or ancestor relationship.

cwb_encode(), cwb_huffcode(), cwb_makeall() and cwb_compress_rdx() now have an argument quietly to control the display of output messages. cwb_encode() has an argument verbose to control whether a counter of the number of tokens processed is displayed.

cwb_encode()
to digest variations of path statements between macOS and Windows are addressed using a reliable normalization of paths with fs::path() (#48).

Argument encoding is checked for the validity of the encoding passed in (#34).

check_cpos() issues a warning if argument cpos is NULL (#21).

cl_cpos2id(), cl_cpos2lbound(), cl_cpos2rbound(), cl_cpos2str() and cl_cpos2struc() will return an empty, zero-length integer vector if argument cpos is NULL (#21).

check_corpus()
(used internally by many functions) resulted from slightly differing representations of otherwise identical paths. Using fs::path() for path normalization internally will avoid misleading warning messages.

cqp_get_registry() will now return a fs::path object, as a safeguard for a consistent normalization of paths.

cl_delete_corpus() will now (visibly) return a logical value.

cqp_load_corpus()
will return FALSE if the corpus has not been loaded successfully.

wrappers.cpp into cl.cpp, cqp.cpp and utils.cpp, so that the code is organized more coherently, corresponding to the different logics.

check_cqp_query() renamed to check_query() to avoid a conflict with a function defined in the polmineR package.

cqp_list_subcorpora() returns a character vector. Previously, we just had obscure printed messages.

s_attribute_decode()
will not break if the s-attribute has no values (#54).

cl_struc2str() and cl_struc2cpos() may now include negative values; the vectors returned will have NA values at the respective positions. The check against negative values in check_strucs is dropped accordingly.

cwb_encode() function did not declare structural attributes in the registry and mistakenly channeled output for the file to the terminal (#49). Fixed.

cwb_encode()
did not reset global variables, which resulted in a set of errors. Solved (#51).

cwb-huffcode.c, cwb-compress-rdx.c and cwb-makeall.c was not in line with the CWB version of the rest of the code (v3.4.14 / SVN revision 1069) but rather v2.2.b99 or v3.0.0. All code changes up to v3.4.14 were reconstructed and implemented (#35). Note that cwb-encode.c was at CWB v3.4.14, as the encoding functionality was exposed at a later stage.

cwb_version() will report the version of the CWB source code.

cwb_encode()
function now has a previously missing argument encoding to state the encoding of the corpus to be indexed.

cwb_encode() now assumes implicitly that input files are XML files and removes blank lines and leading and trailing whitespace. This is equivalent to the option "-xsB" of the command line utility cwb-encode.

cwb_encode()
is now a patch of the main() function of cwb-encode.c, so that code in the *.cpp file can be limited to a slim wrapper, limiting the risk that the code in RcppCWB loses touch with CWB upstream development.

_eval.h, _globalvars.h and _cl.h in the ./src directory are autogenerated files now, not to be edited by hand.

cqp_drop_subcorpus() function is temporarily disabled to ensure that the package can be built (#34).

check_corpus()
that would trigger resetting the registry unintentionally and potentially falsely.

In use_tmp_dir(), normalizePath() is applied to the tempdir() result to avoid confusion with symbolic links on macOS.

cwb_encode() (not yet run on Windows).

cqp_get_registry()
that would sometimes result in a wrong return value (i.e. registry path) has been fixed (#14).

In cwb_makeall(), an internal check is performed whether the corpus has been loaded already and whether the home directory of the loaded corpus and the one defined in the registry file are identical (#31).

cl_delete_corpus() function crashed when trying to delete a corpus that has not been loaded (#33). The function now aborts gracefully, returning 0 in that case.

corpus_is_loaded() can be used to check whether a corpus is loaded.

cwb_encode() that exposes functionality of the cwb-encode CWB utility.

cl_cpos2lbound() and cl_cpos2rbound() will now accept an integer vector with length > 1 as argument cpos and return a vector with the same length. Useful to speed up iterated queries for left and right boundaries of regions (#19).

cl_struc_values()
exposes the corresponding C function of the Corpus Library (CL). The previous implicit assumption that all structural attributes have values can thus be tested. Intended to work with annotations of sentences and paragraphs, i.e. common structural attributes that usually do not have values.

corpus_data_dir() will derive the data directory from the internal C representation of a corpus.

s_attr_regions() will derive regions defined by a structural attribute from the *.rng file. Fastest option for large corpora.

s_attr_is_sibling() and s_attr_is_descendent() test the sibling/descendent relationship of structural attributes.

check_corpus() now includes checks whether the registry provided (argument registry) is identical with the registry defined internally by CQP. The registry is reset if directories are not identical.

s_attribute_decode() method was incomplete for method "Rcpp". This alternative to the "pure R" approach is now implemented (#2).

method
previously setting "wininet" in ./tools/winlibs.R is omitted to avoid the warning "the 'wininet' method is deprecated for http:// and https:// URLs" on Windows.

pcre-config to locate header files of PCRE.

cqp_initialize())

get_tmp_registry() will return the whereabouts of this directory.

check_corpus()-function. Problems with the previous implementation that relied on files in the registry directory to ensure the presence of a corpus hopefully do not occur.

cl_charset_name() is exposed; it will return the charset of a corpus. Faster than parsing the registry file again and again.

cl_delete_corpus()-function can remove loaded corpora from memory.