knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
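Strings can contain any characters, including punctuation and symbols (,;:@#$% etc.)
nchar returns the number of characters in a string
tolower and toupper convert a string to lower or upper case
replace substitutes the elements of a vector at the given indices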
v <- names(precip)[3:8]
v
replace(v, list = 1:3, c('replacement 1', 'replacement 2'))
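As a minimal sketch of nchar, tolower and toupper mentioned above, the snippet below applies them to a single string. The string s is a hypothetical example, not taken from the original text.

```r
# Hypothetical example string, not from the original text
s <- 'Some Text'

n_chars <- nchar(s)   # number of characters, spaces included
lower <- tolower(s)   # all letters converted to lower case
upper <- toupper(s)   # all letters converted to upper case

print(n_chars)
print(lower)
print(upper)
```

All three functions are vectorized, so they should work the same way on a character vector such as v.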
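The backslash \ is the escape character: it changes the meaning of the character that follows it
\" stands for a literal double quote inside a double-quoted string; a " within single quotes, ' ', is treated literally, and vice versa
a tab is represented by \t and a line break by \n
a literal backslash is written as \\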
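The escape sequences can be checked directly in the console. The following is only an illustrative sketch; all variable names and strings are hypothetical.

```r
# Hypothetical strings illustrating escape sequences
tabbed <- "a\tb"                  # a, a tab, b
two_lines <- "line 1\nline 2"     # contains one line break
backslash <- "C:\\temp"           # contains a single backslash
quoted <- 'She said "hi"'         # double quotes are literal inside single quotes
escaped_quote <- "She said \"hi\""  # the same string, using escapes

# An escape sequence counts as one character
print(nchar(tabbed))
print(nchar(backslash))

# writeLines shows the string as it is stored, without the escapes
writeLines(two_lines)
writeLines(backslash)

# Both ways of quoting produce the same string
print(identical(quoted, escaped_quote))
```

Note that print displays the escaped form, whereas writeLines and cat display the actual characters.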
paste concatenates strings, separating them with a space by default
the collapse argument allows paste to combine a character vector into one string
paste('some', 'text')
paste(c('some', 'more'), 'text')
paste(c('some', 'text'), collapse = '_')
paste0 is a fast wrapper around paste with no separator
the substr function can isolate a range of a string, beginning and ending at specified positions
substr('some text', start = 3, stop = 8)
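To make the difference between the separator, collapse and paste0 more concrete, here is a small sketch. The variable names are hypothetical and the sep argument is shown only for comparison.

```r
# sep controls what is put between the arguments
with_sep <- paste('some', 'text', sep = '-')

# paste0 behaves like paste with sep = ''
no_sep <- paste0('some', 'text')

# collapse joins the elements of the resulting vector into one string
collapsed <- paste0(c('a', 'b'), 1:2, collapse = '+')

# substr extracts the characters from position start to position stop
first_word <- substr('some text', start = 1, stop = 4)

print(with_sep)
print(no_sep)
print(collapsed)
print(first_word)
```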
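strsplit splits a string into substrings
the split argument specifies the character(s) that the separation will be done on
strsplit('even_more_text', split = '_')
x can be a character vector; the result is a list with one element per string
strsplit(v, split = ' ')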
text <- "In the Lenin Barracks in Barcelona, the day before I joined the militia, I saw an Italian militiaman standing in front of the officers' table. He was a tough-looking youth of twenty-five or six, with reddish-yellow hair and powerful shoulders. His peaked leather cap was pulled fiercely over one eye. He was standing in profile to me, his chin on his breast, gazing with a puzzled frown at a map which one of the officers had open on the table. Something in his face deeply moved me. It was the face of a man who would commit murder and throw away his life for a friend--the kind of face you would expect in an Anarchist, though as likely as not he was a Communist. There were both candour and ferocity in it; also the pathetic reverence that illiterate people have for their supposed superiors. Obviously he could not make head or tail of the map; obviously he regarded map-reading as a stupendous intellectual feat. I hardly know why, but I have seldom seen anyone--any man, I mean--to whom I have taken such an immediate liking. While they were talking round the table some remark brought it out that I was a foreigner."
# split text on period followed by any character
sentences <- strsplit(text, split = '\\..')[[1]]
sentences
sapply(strsplit(sentences, ' '), length)
unless fixed is TRUE, split is interpreted as a regular expression (see below)
startsWith and endsWith check for matches at the edges of strings
vv <- c('word 1', 'word 2', 'word3', 'Word 4', 'Words', 'word')
startsWith(vv, 'word')
endsWith(vv, 's')
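Returning to the fixed argument of strsplit: the sketch below shows why it matters when the separator happens to be a regex metacharacter such as the period. The string ip is a hypothetical example.

```r
# Hypothetical input: the separator is a period, which is a regex wildcard
ip <- '192.168.0.1'

# As a regex, '.' matches every character, so nothing but empty strings remain
as_regex <- strsplit(ip, split = '.')[[1]]

# With fixed = TRUE the period is taken literally
as_fixed <- strsplit(ip, split = '.', fixed = TRUE)[[1]]

# Escaping the period in the regex gives the same result
escaped <- strsplit(ip, split = '\\.')[[1]]

print(as_regex)
print(as_fixed)
print(escaped)
```

Using fixed = TRUE is usually the simpler choice when no pattern matching is needed.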
the grep family of functions provides very useful functionality
# grep returns indices of matching vector elements
grep('word', vv)
# it can also return the matching items
grep('word', vv, value = TRUE)
# or the non-matching ones
grep('word', vv, value = TRUE, invert = TRUE)
# character case can be ignored
grep('word', vv, value = TRUE, ignore.case = TRUE)
# grepl returns a logical vector based on the presence or absence of a match
grepl('word', vv)
# both functions can ignore case
grepl('word', vv, ignore.case = TRUE)
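The indices from grep and the logical vector from grepl can both be used for subsetting. The following sketch, which uses a hypothetical copy of the vector above under the name words, shows that the three approaches agree.

```r
# Hypothetical copy of the example vector, so the block is self-contained
words <- c('word 1', 'word 2', 'word3', 'Word 4', 'Words', 'word')

idx <- grep('word', words)     # integer indices of the matches
hits <- grepl('word', words)   # one TRUE/FALSE per element

by_index <- words[idx]
by_logical <- words[hits]
by_value <- grep('word', words, value = TRUE)

# A logical vector is convenient for counting matches
n_matches <- sum(hits)

print(by_index)
print(n_matches)
```

The logical form tends to be handier inside conditions and when combining several patterns with & or |.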
The first argument of grep and grepl specifies the pattern that is matched against the elements of a character vector. Normally every character is matched exactly but it is possible to loosen the matching process. Regular expressions (often abbreviated as regex) are a way of generalizing the search pattern so that non-exact matches are identified. Matches can occur on alternative characters, repeated characters and more.
This flexibility is achieved by using wildcards, which are put into the pattern as metacharacters. The most often used wildcards are:
. matches any character
[] matches any character within the brackets
[abcde] matches a, b, c, d or e
[a-e] also matches a, b, c, d or e
[a-z] matches any lower case letter
[4-9] matches any digit from 4 to 9
[49] matches either a 4 or a 9
[0-9A-Z] matches any digit or any upper case letter
[^] matches any character other than the ones within the brackets following a ^
^ stands for the beginning of the string
$ stands for the end of the string
* the previous character is repeated any number of times, including zero
+ the previous character is repeated at least 1 time
? the previous character is repeated 0 or 1 time
{2,4} the previous character is repeated at least 2 and at most 4 times
| separates alternative expressions and allows a match of any
() parentheses delimit subexpressions, which is used for backreference (see below)
There are other ways of specifying character classes, e.g.:
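\w any letter, digit, or underscore
\d any digit
\s any whitespace character
Whenever a wildcard character is to be taken literally, it must be escaped with a backslash. Importantly, in an R string the backslash is itself an escape character, so it must be doubled in order to pass a single backslash on to the regex engine. And so . means "any character" but \\. means "a literal period", while a lone \. is not a valid R string. For the same reason the classes above are written as \\w, \\d and \\s in R code.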
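The effect of escaping can be illustrated with a short sketch. The file names below are hypothetical and serve only to show the difference between a wildcard period and a literal one.

```r
# Hypothetical file names
files <- c('data.csv', 'data_csv', 'datacsv.txt')

# Unescaped: the period matches any character, so all three match
any_char <- grepl('.csv', files)

# Escaped with a doubled backslash: only a literal period matches
literal <- grepl('\\.csv', files)

# Inside brackets the period is literal as well
bracketed <- grepl('[.]csv', files)

# fixed = TRUE switches regex interpretation off altogether
fixed_match <- grepl('.csv', files, fixed = TRUE)

# Character classes need the doubled backslash too
has_digit <- grepl('\\d', c('a1', 'b'))

print(any_char)
print(literal)
print(has_digit)
```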
Some examples:
^A matches anything beginning with A, including a single A
^A. matches a string beginning with A followed by any character
^A.$ matches a string consisting of A followed by any one character and nothing else
A*$ matches a string ending with any number of As (including none)
' ' (a single space) matches any string containing a space
' u' matches a space followed by a u
' T?u{1,3}$' matches a space followed by an optional T, followed by one to three us but only at the end of a string
grey matches only grey
grey|gray matches both gray and grey (US vs UK spelling)
gr[ae]y is equivalent to the above - does not match graey!
[Ee]arth matches Earth and earth
[Ee]arths? also matches Earths and earths
^[A-Z].*\\.$ matches a string that begins with a capital letter, ends with a period and contains any number of any characters in between - a typical declarative sentence
^[A-Z].*, .*\\.$ matches a sentence that contains a comma followed by a space anywhere within, which indicates a compound sentence
^[A-Z].*, .*[.?]$ the sentence can now also be a question; note that characters within brackets are taken literally and need not be escaped
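The sentence patterns above can be tried on a few test strings. This is a sketch only: the vector s is made up for illustration, and the last pattern is anchored with $ on the assumption that the end of the sentence is meant.

```r
# Hypothetical test sentences
s <- c('It rained.',
       'It rained, so we stayed in.',
       'Did it rain, or was it snow?',
       'no capital letter.')

# Capital letter at the start, period at the end
declarative <- grepl('^[A-Z].*\\.$', s)

# Additionally requires a comma followed by a space
compound <- grepl('^[A-Z].*, .*\\.$', s)

# Allows either a period or a question mark at the end (assumed $ anchor)
compound_any <- grepl('^[A-Z].*, .*[.?]$', s)

print(declarative)
print(compound)
print(compound_any)
```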
More examples:
vv
grep('^w', vv, value = TRUE)
grep('^[wW]', vv, value = TRUE)
grep('^Ws$', vv, value = TRUE)
grep('^W.s$', vv, value = TRUE)
grep('^W.+s$', vv, value = TRUE)
grep('^W.+s?$', vv, value = TRUE)
grep(' \\d$', vv, value = TRUE)
grep(' *\\d$', vv, value = TRUE)
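The repetition operators listed earlier can be compared side by side. In the sketch below the vector x is a hypothetical set of strings chosen so that each operator gives a different result.

```r
# Hypothetical strings with a varying number of e's
x <- c('gray', 'grey', 'graey', 'gry', 'greeey', 'Gray 42')

star <- grepl('gre*y', x)         # zero or more e's
plus <- grepl('gre+y', x)         # one or more e's
optional <- grepl('gre?y', x)     # zero or one e
ranged <- grepl('gre{2,4}y', x)   # two to four e's
class <- grepl('gr[ae]y', x)      # exactly one a or e
either <- grepl('gray|grey', x)   # alternatives
upper <- grepl('^[A-Z]', x)       # starts with a capital letter
digit <- grepl('[0-9]$', x)       # ends with a digit
not_g <- grepl('^[^g]', x)        # starts with anything but a lower case g

# One column per pattern, one row per string
print(data.frame(x, star, plus, optional, ranged, class, either))
```

Note that matching is case sensitive by default, which is why 'Gray 42' matches none of the lower case patterns.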
The sub function searches a string for matches to a regular expression just like grep does. However, rather than reporting matches, it replaces them with a replacement string. The replacement is a literal string, not a regex (apart from backreferences, see below).
days <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')
grep('day', days, value = TRUE)
sub('day', 'DAY', days)
sub only acts on the first occurrence of the pattern. To replace all occurrences use gsub.
sub('e', '.E.', 'Wednesday')
gsub('e', '.E.', 'Wednesday')
Characters in a regular expression can be grouped using parentheses: (Wednes)(day). The ability to recall these groups individually is called backreferencing. In the replacement string, \\1 refers to the first group, \\2 to the second, and so on. Backreferences allow for preserving a partial match during substitution, whereas normally the entire match is replaced.
sub('day', 'DAY', 'Wednesday')
sub('.*day', 'DAY', 'Wednesday')
sub('(.*)(day)', 'DAY', 'Wednesday')
sub('(.*)(day)', '\\1\\2', 'Wednesday')
sub('(.*)(day)', '\\1', 'Wednesday')
sub('(.*)(day)', '\\1DAY', 'Wednesday')
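Groups can also be recalled in a different order than they were matched. The sketch below rearranges the parts of a string this way; the names and the date are hypothetical examples.

```r
# Hypothetical input in 'surname, first name' form
full_names <- c('Orwell, George', 'Huxley, Aldous')

# \\1 is the surname, \\2 the first name; the replacement swaps them
swapped <- sub('(\\w+), (\\w+)', '\\2 \\1', full_names)

# The same idea with three groups: year-month-day to day/month/year
iso_date <- '2024-01-31'
reordered <- sub('(\\d{4})-(\\d{2})-(\\d{2})', '\\3/\\2/\\1', iso_date)

print(swapped)
print(reordered)
```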