Remove/Replace/Extract Strings Between 2 Markers

Description

Remove, replace, or extract strings bounded by a left and a right marker.
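The base-R sketch below illustrates the idea. It is an illustrative stand-in, not qdapRegex's own code. It matches a literal "(" followed by as few characters as possible and then ")". The lazy quantifier `.*?` stops at the first right marker, so each bounded span is matched separately. Removal replaces each match, and extraction collects the matches for each input string.

```r
# Sketch only: base R, not the qdapRegex implementation.
x <- "I like [bots] (not)."

# Literal "(" then a lazy (shortest) run of characters, then literal ")".
pattern <- "\\(.*?\\)"

# Removal: every bounded span is replaced with the empty string.
removed <- gsub(pattern, "", x, perl = TRUE)

# Extraction: a list with one character vector of matches per input string.
extracted <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

print(removed)
print(extracted)
```

Removal can leave stray white space behind, such as the space before the final period here.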

Usage

rm_between(text.var, left, right, fixed = TRUE, trim = TRUE, clean = TRUE,
  replacement = "", extract = FALSE, include.markers = ifelse(extract,
  FALSE, TRUE), dictionary = getOption("regex.library"), ...)

rm_between_multiple(text.var, left, right, fixed = TRUE, trim = TRUE,
  clean = TRUE, replacement = "", extract = FALSE,
  include.markers = FALSE, merge = TRUE)

ex_between(text.var, left, right, fixed = TRUE, trim = TRUE, clean = TRUE,
  replacement = "", extract = TRUE, include.markers = ifelse(extract,
  FALSE, TRUE), dictionary = getOption("regex.library"), ...)

ex_between_multiple(text.var, left, right, fixed = TRUE, trim = TRUE,
  clean = TRUE, replacement = "", extract = TRUE,
  include.markers = FALSE, merge = TRUE)

Arguments

text.var

The text variable.

left

A vector of character or numeric symbols marking the left boundary of the text to remove or extract.

right

A vector of character or numeric symbols marking the right boundary of the text to remove or extract.

fixed

logical. If TRUE, regular expression special characters (c(".", "|", "(", ")", "[", "]", "{", "}", "^", "$", "*", "+", "?")) are treated as literal characters. To pass a regular expression containing special characters as a marker, use fixed = FALSE.

trim

logical. If TRUE, leading and trailing white space is removed.

clean

logical. If TRUE, extra white space and escaped characters are removed.

replacement

Replacement for the matched pattern.

extract

logical. If TRUE, the strings are extracted into a list of vectors.

include.markers

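logical. If TRUE and extract = TRUE, the markers (left and right) are returned along with the text between them. The default is FALSE when extracting and TRUE when removing, so by default removal deletes the markers together with the enclosed text.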

dictionary

A dictionary of canned regular expressions to search within if pattern begins with "@rm_".

merge

logical. If TRUE, the results for each marker pair are merged by string. If FALSE, a named list (one element per marker pair) of lists of vectors of the marked text is returned.

...

Other arguments passed to gsub.
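The following base-R sketch shows what the fixed and include.markers arguments control. It is an illustration under assumptions, not the package's code. escape_regex() and between_pattern() are hypothetical helpers, and excluding the markers is done here with PCRE lookbehind and lookahead, which is one possible technique. PCRE lookbehind generally needs a fixed-length left marker, which is one more reason to treat this only as a sketch.

```r
# Sketch only: hypothetical base R helpers, not the qdapRegex implementation.

# Backslash-escape the special characters listed under `fixed`
# (plus the backslash itself) so they match literally.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?\\[\\]\\\\])", "\\\\\\1", x, perl = TRUE)
}

# Lazy pattern between a left and a right marker.
# include_markers = TRUE consumes the markers; FALSE uses a lookbehind and
# a lookahead so only the text between the markers is matched.
between_pattern <- function(left, right, fixed = TRUE, include_markers = TRUE) {
  if (fixed) {
    left <- escape_regex(left)
    right <- escape_regex(right)
  }
  if (include_markers) {
    paste0(left, ".*?", right)
  } else {
    paste0("(?<=", left, ").*?(?=", right, ")")
  }
}

x <- "I like [bots] (not)."

# Extraction with and without the markers.
with_markers <- regmatches(
  x, gregexpr(between_pattern("[", "]"), x, perl = TRUE))[[1]]
without_markers <- regmatches(
  x, gregexpr(between_pattern("[", "]", include_markers = FALSE), x,
              perl = TRUE))[[1]]

# Removal that keeps the markers, mirroring include.markers = FALSE.
kept_markers <- gsub(between_pattern("(", ")", include_markers = FALSE), "",
                     x, perl = TRUE)

print(escape_regex("(a.b)"))
print(with_markers)
print(without_markers)
print(kept_markers)
```

Without escaping, a marker such as "(" would be read as the start of a regex group, which is why fixed = TRUE is the default for literal markers.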

Value

Returns a character vector with the marked text removed or replaced. When extract = TRUE (the ex_ functions), returns a list of vectors of the extracted strings. rm_between returns merged strings and is significantly faster. rm_between_multiple optionally merges the strings by left/right symbols. This approach is more flexible and names the extracted strings by their boundary symbols, but it is slower than rm_between.
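The base-R sketch below illustrates the trade-off described above. It is not the package's implementation. It compares two strategies for several marker pairs. The first uses one combined alternation pattern in a single pass, and its output cannot tell the pairs apart. The second loops over the pairs and returns a list named by marker pair. That is more flexible, but it scans the text once per pair. The markers are assumed to be regex-safe literals, and the naming scheme ("/ #") is an illustrative choice.

```r
# Sketch only: base R, not the qdapRegex implementation. Markers are
# assumed to be regex-safe literals, so no escaping is done here.
x <- c("Where is the /big dog#?",
       "I think he's @arunning@b with /little cat#.")
left  <- c("/", "@a")
right <- c("#", "@b")

# Strategy 1: one alternation pattern, so the text is scanned once.
# Matches come back in text order, with no record of which pair matched.
combined <- paste0(left, ".*?", right, collapse = "|")
single_pass <- regmatches(x, gregexpr(combined, x, perl = TRUE))

# Strategy 2: one scan per marker pair; results stay separate and named.
per_pair <- Map(function(l, r) {
  regmatches(x, gregexpr(paste0(l, ".*?", r), x, perl = TRUE))
}, left, right)
names(per_pair) <- paste(left, right)

# Merge the per-pair results back by input string (compare merge = TRUE).
merged <- lapply(seq_along(x), function(i) {
  unlist(lapply(per_pair, `[[`, i), use.names = FALSE)
})

print(single_pass)
print(per_pair)
print(merged)
```

In this merged result the strings are grouped by marker pair rather than by their position in the text. The sketch does not assume the package orders them the same way.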

See Also

gsub, rm_bracket, stri_extract_all_regex

Other rm_.functions: as_numeric, as_numeric2, ex_number, rm_number; as_time, as_time2, ex_time, ex_transcript_time, rm_time, rm_transcript_time; ex_abbreviation, rm_abbreviation; ex_angle, ex_bracket, ex_bracket_multiple, ex_curly, ex_round, ex_square, rm_angle, rm_bracket, rm_bracket_multiple, rm_curly, rm_round, rm_square; ex_caps_phrase, rm_caps_phrase; ex_caps, rm_caps; ex_citation_tex, rm_citation_tex; ex_citation, rm_citation; ex_city_state_zip, rm_city_state_zip; ex_city_state, rm_city_state; ex_date, rm_date; ex_default, rm_default; ex_dollar, rm_dollar; ex_email, rm_email; ex_emoticon, rm_emoticon; ex_endmark, rm_endmark; ex_hash, rm_hash; ex_nchar_words, rm_nchar_words; ex_non_ascii, rm_non_ascii; ex_non_words, rm_non_words; ex_percent, rm_percent; ex_phone, rm_phone; ex_postal_code, rm_postal_code; ex_repeated_characters, rm_repeated_characters; ex_repeated_phrases, rm_repeated_phrases; ex_repeated_words, rm_repeated_words; ex_tag, rm_tag; ex_title_name, rm_title_name; ex_twitter_url, ex_url, rm_twitter_url, rm_url; ex_white, ex_white_bracket, ex_white_colon, ex_white_comma, ex_white_endmark, ex_white_lead, ex_white_lead_trail, ex_white_multiple, ex_white_punctuation, ex_white_trail, rm_white, rm_white_bracket, rm_white_colon, rm_white_comma, rm_white_endmark, rm_white_lead, rm_white_lead_trail, rm_white_multiple, rm_white_punctuation, rm_white_trail; ex_zip, rm_zip

Examples

x <- "I like [bots] (not)."

rm_between(x, "(", ")")
ex_between(x, "(", ")")
rm_between(x, c("(", "["), c(")", "]"))
ex_between(x, c("(", "["), c(")", "]"))

rm_between(x, c("(", "["), c(")", "]"), include.markers=FALSE)
ex_between(x, c("(", "["), c(")", "]"), include.markers=TRUE)

## multiple (names results and can keep marker types separate, but slower)
x <- c("Where is the /big dog#?",
    "I think he's @arunning@b with /little cat#.")

rm_between_multiple(x, "@a", "@b")
ex_between_multiple(x, "@a", "@b")
rm_between_multiple(x, c("/", "@a"), c("#", "@b"))
ex_between_multiple(x, c("/", "@a"), c("#", "@b"))

x2 <- c("Where is the L1big dogL2?",
    "I think he's 98running99 with L1little catL2.")
rm_between_multiple(x2, c("L1", 98), c("L2", 99))
ex_between_multiple(x2, c("L1", 98), c("L2", 99))

state <- c("Computer is fun. Not too fun.", "No it's not, it's dumb.",
    "What should we do?", "You liar, it stinks!", "I am telling the truth!",
    "How can we be certain?", "There is no way.", "I distrust you.",
    "What are you talking about?", "Shall we move on?  Good then.",
    "I'm hungry.  Let's eat.  You already?")

rm_between_multiple(state, c("is", "we"), c("too", "on"))

## Use Grouping
s <- "something before stuff $some text$ in between $1$ and after"
rm_between(s, "$", "$", replacement="<B>\\2<E>")

## Using regular expressions as boundaries (fixed = FALSE)
x <-  c(
    "There are 2.3 million species in the world",
    "There are 2.3 billion species in the world"
)

ex_between(x, left = 'There', right = '[mb]illion', fixed = FALSE, include.markers = TRUE)
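The "Use Grouping" example depends on capture groups in the pattern. The base-R sketch below assumes the pattern is built as three groups: the left marker, the lazy enclosed text, and the right marker. Under that assumption, \\2 refers to the text between the markers. This grouping is inferred from the example, not confirmed from the package source. The second part mimics the fixed = FALSE example, where the right boundary is itself a regular expression (a character class).

```r
# Sketch only: base R, assuming a (left)(text)(right) grouping.
s <- "something before stuff $some text$ in between $1$ and after"

# \\1 = left marker, \\2 = enclosed text, \\3 = right marker.
# "$" is escaped because it is a regex metacharacter.
pattern <- "(\\$)(.*?)(\\$)"
tagged <- gsub(pattern, "<B>\\2<E>", s, perl = TRUE)

# Regex boundaries (the fixed = FALSE case): the right marker is a class.
y <- c("There are 2.3 million species in the world",
       "There are 2.3 billion species in the world")
spans <- regmatches(y, gregexpr("There.*?[mb]illion", y, perl = TRUE))

print(tagged)
print(spans)
```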
