stringi is THE R package for very fast, correct, consistent, and convenient string/text manipulation in each locale and any character encoding. We put great effort into creating software that works as you expect on any platform, in any locale, and with any “native” system encoding.
Keywords: internationalization, localization, ICU, ICU4C, i18n, l10n, Unicode
Homepage: http://stringi.rexamine.com
License: The MIT license for the package code, the ICU license for the accompanying ICU4C distribution, and the UCD license for the Unicode Character Database. See the COPYRIGHTS and LICENSE files for more details.
Manual pages on general topics (must-read):
stringi-encoding – character encoding issues, including information on encoding management in stringi, as well as on encoding detection, conversion, and Unicode normalization.
stringi-locale – locale issues, including locale management and specification in stringi, and the list of locale-sensitive operations. In particular, see stri_opts_collator for a description of the string collation algorithm, which is used for string comparing, ordering, sorting, casefolding, and searching.
stringi-arguments – how stringi deals with its functions' arguments.
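As a rough illustration of the collation-based operations mentioned above, the following sketch compares and sorts strings with an explicitly configured collator. The locale "en_US" and the strength value are assumptions chosen for the example, and the calls follow the argument names used in current CRAN releases of stringi, which may differ in older versions.

```r
library(stringi)

# Assumption for this example: an explicit en_US collator, so that the
# results do not depend on the locale of the machine running the code.
coll <- stri_opts_collator(locale = "en_US")

# Three-way comparison: negative, zero, or positive.
cmp_default <- stri_compare("apple", "banana", opts_collator = coll)

# With collation strength 2 (secondary), case differences are ignored.
coll_nocase <- stri_opts_collator(locale = "en_US", strength = 2)
cmp_nocase <- stri_compare("hello", "HELLO", opts_collator = coll_nocase)

# Locale-aware sorting with the same collator.
sorted <- stri_sort(c("banana", "cherry", "apple"), opts_collator = coll)

print(cmp_default)
print(cmp_nocase)
print(sorted)
```

Passing the collator options explicitly keeps the example reproducible; relying on the default locale is more convenient but may give different orderings on different systems.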
Refer to the following:
stringi-search for string searching facilities; these include pattern searching, matching, string splitting, and so on. The following independent search engines are provided:
stringi-search-regex – with ICU (Java-like) regular expressions;
stringi-search-fixed – with locale-aware or byte-exact fixed pattern searching;
stringi-search-charclass – for finding character classes, like “all whitespaces” or “all digits”.
stri_stats_general and stri_stats_latex for gathering some statistics on a character vector's contents.
stri_join, stri_dup, and stri_flatten for concatenation-based operations.
stri_sub for extracting and replacing substrings, and stri_reverse for reversing all characters in a string.
stri_trim (among others) for trimming characters from the beginning and/or end of a string; see also stringi-search-charclass.
stri_length (among others) for determining the number of code points in a string.
stri_trans_tolower (among others) for case mapping, i.e. conversion to lower, UPPER, or Title case.
stri_compare, stri_order, and stri_sort for comparison-based, locale-aware operations; see also stringi-locale.
stri_split_lines (among others) to split a string into text lines.
stri_escape_unicode (among others) for escaping certain code points.
DRAFT API: stri_read_raw, stri_read_lines, and stri_write_lines for reading and writing text files.
TO DO [these will appear in future versions of stringi]: pad, wrap, justify, HTML entities, character translation, MIME Base 64 encode/decode, random string generation, number and date/time formatting, and many more.
Note that each man page has many links to other interesting facilities.
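To give a feel for the facilities listed above, here is a minimal sketch that exercises several of the named functions on small inputs. The input strings are made up for illustration, and the calls assume the default arguments of current CRAN releases of stringi.

```r
library(stringi)

# Concatenation-based operations.
joined <- stri_join("abc", "def")        # paste two strings together
dupped <- stri_dup("ab", 3)              # repeat a string three times
flat   <- stri_flatten(c("a", "b", "c")) # collapse a vector into one string

# Substring extraction (positions 1 to 6) and reversal.
part     <- stri_sub("stringi", 1, 6)
reversed <- stri_reverse("abc")

# Trimming whitespace from both ends.
trimmed <- stri_trim("  text  ")

# Length in code points: U+0105 counts as one, whatever its byte length.
len <- stri_length("\u0105b")

# Case mapping.
lower <- stri_trans_tolower("Hello World")

# Splitting into text lines; LF and CRLF are both treated as line breaks.
lines <- stri_split_lines("a\nb\r\nc")[[1]]

# Escaping a non-ASCII code point.
escaped <- stri_escape_unicode("\u0105")

print(list(joined, dupped, flat, part, reversed, trimmed, len, lower,
           lines, escaped))
```

All of these functions are vectorized, so the same calls work on whole character vectors without explicit loops.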
Marek Gagolewski gagolews@rexamine.com, Bartek Tartanus bartektartanus@rexamine.com, with some contributions from Marcin Bujarski at the early stage of package development. ICU4C was developed by IBM and others. The Unicode Character Database is due to Unicode, Inc.
stringi Package homepage, http://stringi.rexamine.com
ICU – International Components for Unicode, http://www.icu-project.org/
ICU4C API Documentation, http://www.icu-project.org/apiref/icu4c/
The Unicode Consortium, http://www.unicode.org/
UTF-8, a transformation format of ISO 10646 – RFC 3629, http://tools.ietf.org/html/rfc3629
Other stringi_general_topics: stringi-arguments; stringi-encoding; stringi-locale; stringi-search-charclass; stringi-search-fixed; stringi-search-regex; stringi-search