about_search_boundaries: Text Boundary Analysis in 'stringi'

Description Details Author(s) References See Also


Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text.


Examples of the boundary analysis process include:

Generally, text boundary analysis is a locale-dependent operation. For example, in Japanese and Chinese one does not separate words with spaces - a line break can occur even in the middle of a word. These languages have punctuation and diacritical marks that cannot start or end a line, so this must also be taken into account.

stringi uses ICU's BreakIterator to locate specific text boundaries. Note that the BreakIterator's behavior may be controlled in come cases, see stri_opts_brkiter.

For technical details on different classes of text boundaries refer to the ICU User Guide, see below.


Marek Gagolewski and other contributors


Boundary Analysis – ICU User Guide, https://unicode-org.github.io/icu/userguide/boundaryanalysis/

See Also

The official online manual of stringi at https://stringi.gagolewski.com/

Other locale_sensitive: %s<%(), about_locale, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()

Other text_boundaries: about_search, stri_count_boundaries(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_brkiter(), stri_split_boundaries(), stri_split_lines(), stri_trans_tolower(), stri_wrap()

Other stringi_general_topics: about_arguments, about_encoding, about_locale, about_search_charclass, about_search_coll, about_search_fixed, about_search_regex, about_search, about_stringi

stringi documentation built on May 17, 2021, 9:07 a.m.