Capture information in substrings of text that match named and unnamed tokens of regular expressions and convert the result to a data frame.
text |
The text to be parsed: a character vector where matches are
sought, or an object which can be coerced by |
pattern |
Perl-like regular expression. |
ignore.case |
if |
perl |
logical. Should Perl-compatible regexps be used? |
stringsAsFactors |
logical, passed to |
... |
Other arguments to be passed to |
A data frame with parsed information.
This function uses ideas from an answer on github.com.
Internally, the function calls gregexpr with parameter perl = TRUE.
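As a rough illustration of the mechanism this note refers to, the sketch below uses only base R. When gregexpr is called with perl = TRUE and the pattern contains capture groups, each result carries the attributes "capture.start", "capture.length" and "capture.names", which are enough to cut the tokens out of the text. The helper name capture_first_match is hypothetical, and this is not the spMisc implementation: it keeps only the first match per element and assumes the pattern has at least one capture group.

```r
# Hypothetical helper (not part of spMisc): extract capture groups of the
# first match in each element of `text` and return them as a data frame.
capture_first_match <- function(text, pattern) {
  matches <- gregexpr(pattern, text, perl = TRUE)

  rows <- lapply(seq_along(text), function(i) {
    m      <- matches[[i]]
    starts <- attr(m, "capture.start")   # matrix: one row per match
    lens   <- attr(m, "capture.length")  # matrix: one row per match
    nms    <- attr(m, "capture.names")   # "" for unnamed groups

    if (m[1] == -1) {
      # No match in this element: fill every token with NA
      out <- rep(NA_character_, length(nms))
    } else {
      # Cut each captured token out of the original string
      out <- substring(text[i], starts[1, ], starts[1, ] + lens[1, ] - 1)
    }
    names(out) <- nms
    out
  })

  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

text_demo <- c("A_111 B_aaa", "A_222 B_bbb")
capture_first_match(text_demo, 'A_(?<Part_A>.*) B_(?<Part_B>.*)')
```

Named tokens become column names directly; for unnamed tokens "capture.names" holds empty strings, so a fuller version would have to generate column names itself.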
Author: Vilmantas Gegzna. Contributor: MrFlick, who provided ideas on github.com (see section Contribution).
More about regular expressions used in R: regex
Website handy for creating and testing Perl-like regular expressions
(library pcre, version 1, not 2)
https://regex101.com/r/dS3iP1/1#pcre
Functions gregexpr, regcapturedmatches, and the operator %>% from package magrittr.
Other spMisc utilities: bru(), clc(), clear(), fCap(), isFALSE(), list_AddRm(), make.filenames(), open_wd(), printDuration(), st01()
text1 <- c("A_111 B_aaa",
"A_222 B_bbb",
"A_333 B_ccc",
"A_444 B_ddd",
"A_555 B_eee")
# Named tokens
pattern1_named_tokens <- 'A_(?<Part_A>.*) B_(?<Part_B>.*)'
regexp2df(text1, pattern1_named_tokens)
## Part_A Part_B
## 1 111 aaa
## 2 222 bbb
## 3 333 ccc
## 4 444 ddd
## 5 555 eee
# Unnamed tokens - groups inside brackets:
pattern1_unnamed_tokens <- 'A_(.*) B_(.*)'
regexp2df(text1, pattern1_unnamed_tokens)
## X X.1
## 1 111 aaa
## 2 222 bbb
## 3 333 ccc
## 4 444 ddd
## 5 555 eee
#----------------------------------------------------------
# Wrong. There must be NO SPACES in token's name:
## Not run:
pattern2 <- 'A (?<Part A>.*) B (?<Part B>.*)'
regexp2df(text1, pattern2)
## Error ...
## End(Not run)
#----------------------------------------------------------
text3 <- c("sn555 ID_O20-5-684_N52_2_Subt2_01.",
"sn555 ID_O20-5-984_S52_8_Subt10_11.")
pattern3 <- paste0('sn(?<serial_number>.*) ID_(?<ID>.*)_(?<Class>[NS])',
'(?<Sector>.*)_(?<Point>.*)_[Ss]ubt.*\\.');
regexp2df(text3, pattern3)
## serial_number ID Class Sector Point
## 1 555 O20-5-684 N 52 2
## 2 555 O20-5-984 S 52 8
#----------------------------------------------------------
# List all .R files in your working directory:
regexp2df(dir(),'(?<R_file>.*\\.[rR]$)')
# Do the same by using chaining operator %>%:
library(dplyr)
dir() %>% regexp2df('(?<R_file>.*\\.[rR]$)')
#----------------------------------------------------------
# Capture several types of files:
expr <- paste0('(?<R_file>.*\\.[rR]$)|',
'(?<Rmd_file>.*\\.[rR]md$)|',
'(?<CSV_file>.*\\.[cC][sS][vV]$)')
dir() %>% regexp2df(expr)
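The restriction on token names shown in the examples (no spaces in a name) comes from the Perl-compatible regex engine itself, so a pattern can be checked before it is passed on. The following is a minimal base R sketch; the helper name compiles_as_pcre is hypothetical and not part of spMisc. It simply tries the pattern against an empty string with perl = TRUE and reports whether R raised an error.

```r
# Hypothetical helper (not part of spMisc): TRUE if R can compile `pattern`
# as a Perl-compatible regular expression, FALSE otherwise.
compiles_as_pcre <- function(pattern) {
  tryCatch({
    # An invalid pattern makes regexpr() signal a warning and then an error;
    # the warning is silenced, the error is turned into FALSE.
    suppressWarnings(regexpr(pattern, "", perl = TRUE))
    TRUE
  }, error = function(e) FALSE)
}

compiles_as_pcre('A_(?<Part_A>.*) B_(?<Part_B>.*)')  # valid token names
compiles_as_pcre('A (?<Part A>.*) B (?<Part B>.*)')  # spaces in token names
```

A check like this only confirms that the pattern compiles; it says nothing about whether the tokens capture what was intended.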