The string searching facilities described in this man page provide a way to detect and extract a specific piece of text. Note that locale-sensitive searching, especially on non-English text, is a much more complex process than it may seem at first glance.
By default, all stri_*_fixed functions in stringi utilize ICU's StringSearch engine, which implements a language-aware string search algorithm. Note that a bitwise match will not give correct results in cases of:
accented letters;
conjoined letters;
ignorable punctuation;
ignorable case.
The matches are defined using the notion of “canonical equivalence” between strings.
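To make the idea of canonical equivalence concrete, here is a minimal sketch in Python (chosen only for illustration; it does not call stringi or ICU). It assumes that normalizing both strings to NFC is an acceptable stand-in for a canonical-equivalence check, and shows why a bitwise comparison of an accented letter can fail.

```python
import unicodedata

# "é" as one precomposed code point vs. "e" followed by a combining acute accent.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# A bitwise comparison sees two different byte sequences.
bytes_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After normalizing both to NFC, the canonically equivalent strings compare equal.
canonically_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(bytes_equal, canonically_equal)
```

Both strings render identically, yet only the normalized comparison treats them as the same text.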
This string search engine uses a modified version of the Boyer-Moore algorithm (cf. Werner, 1999), with time complexity of O(n+p), where n == length(str) and p == length(pattern). According to the ICU User Guide, the Boyer-Moore searching algorithm is based on automata or combinatorial properties of strings; it pre-processes the pattern and is known to be much faster than a linear search when the pattern is longer than 3 or 4 characters.
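The pre-processing idea can be sketched with the Horspool simplification of Boyer-Moore. This is an illustrative Python sketch only, not ICU's modified algorithm: it works on plain code points, ignores collation entirely, and the function name horspool_find is hypothetical.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    n, p = len(text), len(pattern)
    if p == 0 or p > n:
        return -1  # assumption of this sketch: an empty pattern never matches

    # Pre-process the pattern: for each character except the last one,
    # record how far the window may shift when that character is aligned
    # with the last position of the window.
    shift = {ch: p - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - p:
        if text[pos:pos + p] == pattern:
            return pos
        # Skip ahead based on the text character under the window's end;
        # characters absent from the pattern allow a full-length jump.
        pos += shift.get(text[pos + p - 1], p)
    return -1


print(horspool_find("a needle in a haystack", "needle"))
```

The longer the pattern, the larger the jumps the shift table permits, which is why the benefit over a linear scan grows with pattern length.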
Tuning the Collator's parameters allows you to perform correct matching that properly takes into account accented letters, conjoined letters, and ignorable punctuation and letter case.
For more information on ICU's Collator and SearchEngine and how to tune them in stringi, refer to stri_opts_collator.
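As a rough illustration of what a relaxed comparison strength achieves, the following Python sketch folds away accents and letter case before searching. It is a crude approximation under stated assumptions: it is not how ICU's Collator works, it handles neither conjoined letters nor ignorable punctuation, and the helpers fold_key and loose_contains are hypothetical names.

```python
import unicodedata


def fold_key(s: str) -> str:
    """Hypothetical helper: drop combining marks and case distinctions."""
    decomposed = unicodedata.normalize("NFD", s)
    # Keep only characters whose canonical combining class is 0,
    # i.e. remove combining accents.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def loose_contains(text: str, pattern: str) -> bool:
    """Accent- and case-insensitive substring test on the folded keys."""
    return fold_key(pattern) in fold_key(text)


print(loose_contains("Crème Brûlée", "creme brulee"))
```

A real collator applies locale-specific rules rather than a single folding step, so results for a given language may differ from this sketch.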
If opts_collator is NA, then a very fast (for small p) bitwise (locale-independent) search is performed, with time complexity of O(n*p) (n == length(str), p == length(pattern)) [naive implementation, to be upgraded in some future version of stringi]. For natural-language, non-English text this is, however, probably not what you want.
Note, however, that the conversion of input data to Unicode is still done as usual.
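A naive bitwise search of the O(n*p) kind can be sketched as follows. This Python example is illustrative only and is not stringi's implementation: it assumes UTF-8 as the working encoding, returns byte offsets rather than code point positions, and naive_byte_find is a hypothetical name.

```python
def naive_byte_find(text: str, pattern: str) -> int:
    """Return the byte offset of the first match, or -1 (hypothetical helper)."""
    haystack = text.encode("utf-8")
    needle = pattern.encode("utf-8")
    n, p = len(haystack), len(needle)
    if p == 0:
        return -1  # assumption of this sketch: an empty pattern never matches

    # Try every start offset and compare byte by byte: O(n*p) in the worst case.
    for start in range(n - p + 1):
        matched = 0
        while matched < p and haystack[start + matched] == needle[matched]:
            matched += 1
        if matched == p:
            return start
    return -1


print(naive_byte_find("stringi", "ring"))
# Canonically equivalent, but different bytes, so no match is found:
print(naive_byte_find("cafe\u0301", "caf\u00e9"))
```

The second call shows the trade-off: the search is simple and locale independent, but it misses text that differs only in its Unicode representation.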
In all the functions, if a given fixed search pattern is empty, then the result is NA and a warning is generated.
ICU String Search Service – ICU User Guide, http://userguide.icu-project.org/collation/icu-string-search-service
L. Werner, Efficient Text Searching in Java, 1999, http://icu-project.org/docs/papers/efficient_text_searching_in_java.html
Other locale_sensitive: stri_cmp, stri_compare; stri_count_fixed; stri_detect_fixed; stri_enc_detect2; stri_locate_all_fixed, stri_locate_first_fixed, stri_locate_last_fixed; stri_opts_collator; stri_order, stri_sort; stri_replace_all_fixed, stri_replace_first_fixed, stri_replace_last_fixed; stri_split_fixed; stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stringi-locale
Other search_fixed: stri_count_fixed; stri_detect_fixed; stri_extract_all_fixed, stri_extract_first_fixed, stri_extract_last_fixed; stri_locate_all_fixed, stri_locate_first_fixed, stri_locate_last_fixed; stri_opts_collator; stri_replace_all_fixed, stri_replace_first_fixed, stri_replace_last_fixed; stri_split_fixed; stringi-search
Other stringi_general_topics: stringi-arguments; stringi-encoding; stringi-locale; stringi-package; stringi-search-charclass; stringi-search-regex; stringi-search