stringUtils: Utils to deal with strings and characters

Description

Utility functions for manipulating strings and characters in ComMA.

Usage

trimStartEndSpace(x, y = "")

trimSpace(x, y = "")

substrLast(x, n)

substrLastSplit(x, split = "/", ...)

substrBetween(x, l, r)

simpleCap(x)

getFirstNInString(vect, n, collapse = ", ", tail = "...")

countCharOccurrences(char, s)

freqUniqueValue(x, rm.zero = TRUE)

findDuplicates(x)

gsubDF(pattern, replacement, df, stringsAsFactors = FALSE,
  check.names = FALSE, ...)

pasteDF(df, ..., sep = " ", collapse = NULL, stringsAsFactors = FALSE,
  check.names = FALSE)

Arguments

x

The string.

y

The replacement string; defaults to the empty string.

n

For substrLast, the number of characters to extract from the end of x.

split

A character vector (or an object that can be coerced to one) containing the regular expression(s) to use for splitting, unless fixed = TRUE. See strsplit for details.

...

Further arguments passed from substrLastSplit to strsplit.

l

The regex on the left.

r

The regex on the right.

vect

A vector or list.

pattern, replacement, ...

Arguments for gsub.

df

Data frame as the input.

sep, collapse

Arguments for paste.

n

For getFirstNInString, the number of leading elements of vect to return.

Details

trimStartEndSpace trims the leading and trailing whitespace in a string. Refer to http://stackoverflow.com/questions/2261079/how-to-trim-leading-and-trailing-whitespace-in-r.
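
The trimming described above can be sketched in base R as follows. This is a minimal illustration and not the ComMA source; the name trim_ends_sketch is hypothetical, and the use of a single anchored regular expression is an assumption.

```r
# Hypothetical stand-in for trimStartEndSpace; not the ComMA implementation.
trim_ends_sketch <- function(x, y = "") {
  # Match runs of whitespace anchored at the start or the end of the string
  gsub("^\\s+|\\s+$", y, x)
}

trim_ends_sketch("   foo bar  baz 3 ")
# [1] "foo bar  baz 3"
```

Inner whitespace is left untouched because both alternatives in the pattern are anchored.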

trimSpace removes all whitespace from a string. Refer to http://stackoverflow.com/questions/5992082/how-to-remove-all-whitespace-from-a-string.
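
A possible base R equivalent is shown below, assuming the function simply substitutes y for every run of whitespace. The name trim_all_sketch is hypothetical.

```r
# Hypothetical stand-in for trimSpace; not the ComMA implementation.
trim_all_sketch <- function(x, y = "") {
  # Replace every run of whitespace, wherever it occurs, with y
  gsub("\\s+", y, x)
}

trim_all_sketch("   foo bar  baz 3 ")
# [1] "foobarbaz3"
```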

substrLast extracts the last n characters from a string x. Refer to http://stackoverflow.com/questions/7963898/extracting-the-last-n-characters-from-a-string-in-r.
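
One way to express this with substr and nchar is sketched here. The name substr_last_sketch is hypothetical, and the actual ComMA code may differ.

```r
# Hypothetical stand-in for substrLast; not the ComMA implementation.
substr_last_sketch <- function(x, n) {
  len <- nchar(x)
  # Start n characters before the end and read through the final character
  substr(x, len - n + 1, len)
}

substr_last_sketch("some text in a string", 6)
# [1] "string"
```

If n exceeds the number of characters in x, substr clamps the start position, so the whole string is returned.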

substrLastSplit splits a string x by split and extracts the last piece. With the default split='/', it can be used to get the file name from a file path.
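
The idea of splitting and keeping the last piece can be sketched with strsplit. The name substr_last_split_sketch and the example paths are hypothetical; this is an illustration under those assumptions, not the ComMA source.

```r
# Hypothetical stand-in for substrLastSplit; not the ComMA implementation.
substr_last_split_sketch <- function(x, split = "/", ...) {
  pieces <- strsplit(x, split, ...)
  # Keep the final piece of each split string.
  # Note: an empty input string yields no pieces and would raise an error here.
  vapply(pieces, function(p) p[length(p)], character(1), USE.NAMES = FALSE)
}

substr_last_split_sketch(c("src/test/Foo.java", "Bar.java"))
# [1] "Foo.java" "Bar.java"
```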

substrBetween extracts the substring between two given regular expressions. Refer to http://stackoverflow.com/questions/14146362/regex-extract-string-between.
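
A capture-group approach with sub is one way to realize this. The sketch below assumes that l and r are pasted directly into a larger pattern; the name substr_between_sketch is hypothetical.

```r
# Hypothetical stand-in for substrBetween; not the ComMA implementation.
substr_between_sketch <- function(x, l, r) {
  # Capture whatever sits between the left and right patterns
  pattern <- paste0(".*", l, "(.*)", r, ".*")
  # If the pattern does not match, sub returns x unchanged
  sub(pattern, "\\1", x)
}

substr_between_sketch("command took 0:1:34.67 (94.67s total)", "\\(", "s total\\)")
# [1] "94.67"
```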

simpleCap capitalizes the first letter of a word string. Refer to http://stackoverflow.com/questions/6364783/capitalize-the-first-letter-of-both-words-in-a-two-word-string.
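
The sketch below is consistent with the "BACTERIA" to "Bacteria" behaviour shown in the Examples. It assumes that each space-separated word is capitalized and that the remaining letters are lowercased; the name simple_cap_sketch is hypothetical.

```r
# Hypothetical stand-in for simpleCap; not the ComMA implementation.
simple_cap_sketch <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  # Upper-case the first letter of each word, lower-case the rest (assumed)
  paste0(toupper(substring(words, 1, 1)), tolower(substring(words, 2)),
         collapse = " ")
}

simple_cap_sketch("BACTERIA")
# [1] "Bacteria"
```

Like the original, this version handles one string at a time, so sapply is needed for a vector of strings.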

getFirstNInString returns the first n elements of vect as a single string, built with paste. If n is greater than the length of vect, the whole vector or list is returned as a string.
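
A rough sketch of this behaviour follows. How the tail marker is attached is not documented here, so appending it after one more collapse separator is an assumption, as is the name first_n_sketch.

```r
# Hypothetical stand-in for getFirstNInString; not the ComMA implementation.
first_n_sketch <- function(vect, n, collapse = ", ", tail = "...") {
  if (n >= length(vect)) {
    # Nothing is cut off, so return everything without the tail marker
    return(paste(vect, collapse = collapse))
  }
  # Assumed format: first n elements, then the separator, then the marker
  paste0(paste(head(vect, n), collapse = collapse), collapse, tail)
}

first_n_sketch(1:10, 3)
# [1] "1, 2, 3, ..."
first_n_sketch(1:3, 20)
# [1] "1, 2, 3"
```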

countCharOccurrences returns the number of occurrences of a given character char in string s. Refer to https://techoverflow.net/blog/2012/11/10/r-count-occurrences-of-character-in-string/.
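
Counting by deletion is one simple way to do this: remove the character and compare lengths. The sketch assumes a literal, single-character match (fixed = TRUE), which may differ from the ComMA code; count_char_sketch is a hypothetical name.

```r
# Hypothetical stand-in for countCharOccurrences; not the ComMA implementation.
count_char_sketch <- function(char, s) {
  # Delete every literal occurrence of char, then compare string lengths.
  # The difference equals the count only when char is a single character.
  stripped <- gsub(char, "", s, fixed = TRUE)
  nchar(s) - nchar(stripped)
}

count_char_sketch("a", "application")
# [1] 2
```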

freqUniqueValue returns a 2-column data frame of the unique values in a vector x and their frequencies. If x is a data frame, it returns the counts of unique pairs, including zero counts. Refer to http://stackoverflow.com/questions/16905425/find-duplicate-values-in-r.
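
Frequency tables of this shape can be produced with table and as.data.frame. The sketch below is an illustration only; the name freq_sketch is hypothetical, the first column name may differ from the Var1 shown in the Examples, and filtering on Freq > 0 is an assumption about what rm.zero does.

```r
# Hypothetical stand-in for freqUniqueValue; not the ComMA implementation.
freq_sketch <- function(x, rm.zero = TRUE) {
  # table() counts each unique value, or each combination of column values
  # when x is a data frame, including combinations that never occur
  freq <- as.data.frame(table(x), stringsAsFactors = FALSE)
  if (rm.zero) {
    # Assumed meaning of rm.zero: drop combinations with a zero count
    freq <- freq[freq$Freq > 0, , drop = FALSE]
  }
  freq
}

freq_sketch(c("a", "b", "a"))
freq_sketch(data.frame(x = c(1, 1, 2), y = c(3, 4, 3)), rm.zero = FALSE)
```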

findDuplicates uses freqUniqueValue to find the duplicates in a vector x.
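
Building on a frequency table, duplicates are the values counted more than once. The self-contained sketch below uses table directly instead of freqUniqueValue; the name find_dups_sketch and the choice to return a data frame of values with their counts are assumptions.

```r
# Hypothetical stand-in for findDuplicates; not the ComMA implementation.
find_dups_sketch <- function(x) {
  freq <- as.data.frame(table(x), stringsAsFactors = FALSE)
  # A value is a duplicate when it occurs more than once
  freq[freq$Freq > 1, , drop = FALSE]
}

find_dups_sketch(c("a", "b", "a"))
```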

gsubDF applies gsub to the entire data frame.
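
Applying gsub column by column is one plausible reading of this. In the sketch below, the name gsub_df_sketch and the example data frame prices are hypothetical, and rebuilding the result with as.data.frame is an assumption.

```r
# Hypothetical stand-in for gsubDF; not the ComMA implementation.
gsub_df_sketch <- function(pattern, replacement, df, stringsAsFactors = FALSE,
                           check.names = FALSE, ...) {
  # Run gsub over every column; each column comes back as character
  cols <- lapply(df, function(col) gsub(pattern, replacement, col, ...))
  as.data.frame(cols, stringsAsFactors = stringsAsFactors,
                check.names = check.names)
}

# Hypothetical example data
prices <- data.frame(a = c("1.00", "2.50"), b = c("x.00", "y"))
gsub_df_sketch("\\.00$", "", prices)
```

The pattern here escapes the dot, since an unescaped "." in a regular expression matches any character.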

pasteDF pastes strings onto every column of the data frame. This changes the column types to character.
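
The same column-by-column pattern works for paste. The sketch assumes the extra strings are pasted after each column value; paste_df_sketch and the example data frame pct are hypothetical names.

```r
# Hypothetical stand-in for pasteDF; not the ComMA implementation.
paste_df_sketch <- function(df, ..., sep = " ", collapse = NULL,
                            stringsAsFactors = FALSE, check.names = FALSE) {
  # paste() coerces each column to character before joining
  cols <- lapply(df, function(col) paste(col, ..., sep = sep,
                                         collapse = collapse))
  as.data.frame(cols, stringsAsFactors = stringsAsFactors,
                check.names = check.names)
}

# Hypothetical example data
pct <- data.frame(p = c(10, 20))
paste_df_sketch(pct, "%", sep = "")
```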

Examples

text = "   foo bar  baz 3 "
trimStartEndSpace(text)
# [1] "foo bar  baz 3"

text = "   foo bar  baz 3 "
trimSpace(text)
# [1] "foobarbaz3"

x <- "some text in a string"
substrLast(x, 6)
# [1] "string"
substrLast(x, 8)
# [1] "a string"

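# list.files takes a regular expression, not a glob; the output depends on the working directory
filenames <- list.files(".", pattern = "\\.java$", full.names = TRUE, recursive = TRUE)
substrLastSplit(filenames)
# [1] "BEASTInterfaceTest.java"       "BooleanParameterListTest.java" "IntegerParameterListTest.java"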

x <- "command took 0:1:34.67 (94.67s total)"
substrBetween(x, "\\(", "s total\\)")
# [1] "94.67"

simpleCap("BACTERIA")
# for multiple words
taxaGroups <- c("BACTERIA", "FUNGI", "PROTISTS", "ANIMALIA")
sapply(taxaGroups, simpleCap)
#BACTERIA      FUNGI   PROTISTS   ANIMALIA 
#"Bacteria"    "Fungi" "Protists" "Animalia"

getFirstNInString(1:10, 3)
getFirstNInString(1:10, 20)

countCharOccurrences("a", "application")

freqUniqueValue(c("a", "b", "a"))
#    Var1 Freq
# 1    a    2
# 2    b    1
freqUniqueValue(data.frame(x=c(1, 1, 2), y=c(3, 4, 3)), rm.zero=FALSE)
#   x y Freq
# 1 1 3    1
# 2 2 3    1
# 3 1 4    1
# 4 2 4    0

findDuplicates(c("a", "b", "a"))

# df is assumed to be an existing data frame
gsubDF(".00", "", df)

pasteDF(df, "%")

walterxie/ComMA documentation built on May 3, 2019, 11:51 p.m.