termco: Search For and Count Terms


Description

termco - Search a transcript by any number of grouping variables for categories (themes) of grouped root terms. While there are other functions in the termco family (e.g., termco_d), termco is a more powerful and flexible wrapper intended for general use.

termco_d - Search a transcript by any number of grouping variables for root terms.

term_match - Search a transcript for words that exactly match term(s).

termco2mat - Convert a termco dataframe to a matrix for use with visualization functions (e.g., heatmap.2).

Usage

termco(
  text.var,
  grouping.var = NULL,
  match.list,
  short.term = TRUE,
  ignore.case = TRUE,
  elim.old = TRUE,
  percent = TRUE,
  digits = 2,
  apostrophe.remove = FALSE,
  char.keep = NULL,
  digit.remove = NULL,
  zero.replace = 0,
  ...
)

termco_d(
  text.var,
  grouping.var = NULL,
  match.string,
  short.term = FALSE,
  ignore.case = TRUE,
  zero.replace = 0,
  percent = TRUE,
  digits = 2,
  apostrophe.remove = FALSE,
  char.keep = NULL,
  digit.remove = TRUE,
  ...
)

term_match(text.var, terms, return.list = TRUE, apostrophe.remove = FALSE)

termco2mat(
  dataframe,
  drop.wc = TRUE,
  short.term = TRUE,
  rm.zerocol = FALSE,
  no.quote = TRUE,
  transform = TRUE,
  trim.terms = TRUE
)

Arguments

text.var

The text variable.

grouping.var

The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.

match.list

A list of named character vectors.

short.term

logical. If TRUE, column names are trimmed versions of the match list; otherwise the terms are wrapped with 'term(phrase)'.

ignore.case

logical. If TRUE case is ignored.

elim.old

logical. If TRUE eliminates the columns that are combined together by the named match.list.

percent

logical. If TRUE output given as percent. If FALSE the output is proportion.

digits

Integer; number of decimal places to round when printing.

apostrophe.remove

logical. If TRUE removes apostrophes from the text before examining.

char.keep

A character vector of symbol characters (i.e., punctuation) that strip should keep. The default is to strip everything except apostrophes. termco attempts to auto detect characters to keep based on the elements in match.list.

digit.remove

logical. If TRUE strips digits from the text before counting. termco attempts to auto detect if digits should be retained based on the elements in match.list.

zero.replace

Value to replace 0 values with.

match.string

A vector of terms to search for. When using inside of term_match the term(s) must be words or partial words but do not have to be when using termco_d (i.e., they can be phrases, symbols etc.).

terms

The terms to search for in the text.var. Similar to match.list but these terms must be words or partial words rather than multiple words and symbols.

return.list

logical. If TRUE returns the output for multiple terms as a list by term rather than a vector.

dataframe

A termco (or termco_d) dataframe or object.

drop.wc

logical. If TRUE the word count column will be dropped.

rm.zerocol

logical. If TRUE any column containing all zeros will be removed from the matrix.

no.quote

logical. If TRUE the matrix will be printed without quotes if it's character.

transform

logical. If TRUE the matrix will be transformed.

trim.terms

logical. If TRUE trims the column header/names to ensure there is not a problem with spacing when using in other R functions.

...

Other arguments supplied to strip.

Value

termco & termco_d - both return a list, of class "termco", of data frames and information regarding word counts:

raw

raw word counts by grouping variable

prop

proportional word counts by grouping variable; proportional to each individual's word use

rnp

a character combination data frame of raw and proportional

zero_replace

value to replace zeros with; mostly internal use

percent

The value of percent used for plotting purposes.

digits

integer value of number of digits to display; mostly internal use

term_match - returns a list or vector of possible words that match term(s).

termco2mat - returns a matrix of term counts.
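
The idea behind term_match (collect the distinct words in the text that contain a given term, then prune unwanted ones) can be sketched in base R. This is only an illustration and not qdap's code: words_containing is a hypothetical helper, the tokenizing rule is an assumption, and setdiff stands in for the role exclude plays in the examples.

```r
# Base R sketch only; NOT qdap's implementation of term_match.
# words_containing() is a hypothetical helper: it returns the distinct
# words in `text` that contain the literal string `term`.
words_containing <- function(text, term) {
    # Assumption: lower-case the text and split on anything that is not
    # a letter or an apostrophe.
    words <- unique(unlist(strsplit(tolower(text), "[^a-z']+")))
    words <- words[nzchar(words)]
    sort(words[grepl(term, words, fixed = TRUE)])
}

txt <- c("The truth is out there", "I think this is fine")

found <- words_containing(txt, "th")
found

# Drop words that should not be counted (stand-in for exclude)
setdiff(found, "truth")
```

The resulting vector of full words is the kind of input that can then be handed to termco as a match list.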

Warning

Percentages are calculated as a ratio of counts of match.list elements to word counts. Word counts do not contain symbols or digits. Searching for symbols, digits, or small segments of full words (e.g., "to") can therefore produce percentages that total more than 100%.
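
The arithmetic behind this warning can be reproduced in base R. The sketch below is not qdap's implementation; count_fixed is a hypothetical helper and the one-line text is made up for illustration. It shows how a pattern shorter than a word can match more times than there are words.

```r
# Base R sketch only; NOT qdap's implementation.
# count_fixed() is a hypothetical helper that counts non-overlapping
# literal occurrences of `pattern` in the single string `x`.
count_fixed <- function(pattern, x) {
    hits <- gregexpr(pattern, x, fixed = TRUE)[[1]]
    if (hits[1] == -1L) 0L else length(hits)
}

txt <- "to go to too"

n_words <- length(strsplit(txt, " ", fixed = TRUE)[[1]])  # 4 words
n_hits  <- count_fixed("o", txt)                           # 5 matches of "o"

# Matches divided by words, expressed as a percent
pct <- 100 * n_hits / n_words
pct  # 125
```

Because the denominator counts words while the numerator counts substring matches, nothing bounds the ratio at 100%.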

Note

The match.list/match.string is (optionally) case and character sensitive. Spacing is an important way to grab specific words and requires careful thought. Using "read" will find the words "bread", "read", "reading", and "ready". If you want to search for just the word "read" you'd supply a vector of c(" read ", " reads", " reading", " reader"). To search for non-character arguments (i.e., numbers and symbols) additional arguments from strip must be passed.
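
The effect of spacing can be seen with plain substring counting in base R. This is a sketch and not qdap's code: count_fixed is a hypothetical helper, and the example assumes the text has already been lower-cased, stripped of punctuation, and padded with a leading and trailing space so that " read " can match at the edges.

```r
# Base R sketch only; NOT qdap's implementation.
# count_fixed() is a hypothetical helper that counts non-overlapping
# literal occurrences of `pattern` in the single string `x`.
count_fixed <- function(pattern, x) {
    hits <- gregexpr(pattern, x, fixed = TRUE)[[1]]
    if (hits[1] == -1L) 0L else length(hits)
}

# Assumption: text is already cleaned and padded with spaces.
txt <- " i read the bread recipe and i am ready to start reading "

count_fixed("read", txt)    # 4: read, bread, ready, reading
count_fixed(" read ", txt)  # 1: only the stand-alone word

# Counting a family of related forms, one pattern per form
forms <- c(" read ", " reads", " reading", " reader")
per_form <- vapply(forms, count_fixed, integer(1), x = txt)
sum(per_form)               # 2: " read " and " reading"
```

A leading space alone (as in " reading") anchors the start of a word while still allowing suffixes; spaces on both sides restrict the match to the exact word.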

See Also

termco_c, colcomb2class

Examples

## Not run: 
library(qdap)

# termco examples:

term <- c("the ", "she", " wh")
(out <- with(raj.act.1,  termco(dialogue, person, term)))

plot(out)
scores(out)
plot(scores(out))
counts(out)
plot(counts(out))
proportions(out)
plot(proportions(out))

# General form for match.list as themes
#
# ml <- list(
#     cat1 = c(),
#     cat2 = c(),
#     catn = c()
# )

ml <- list(
    cat1 = c(" the ", " a ", " an "),
    cat2 = c(" I'" ),
    "good",
    the = c("the", " the ", " the", "the")
)

(dat <- with(raj.act.1,  termco(dialogue, person, ml)))
scores(dat)  #useful for presenting in tables
counts(dat)  #prop and raw counts are useful for performing calculations
proportions(dat)
datb <- with(raj.act.1, termco(dialogue, person, ml,
    short.term = FALSE, elim.old=FALSE))
ltruncdf(datb, 20, 6)
    
(dat2 <- data.frame(dialogue=c("@bryan is bryan good @br",
    "indeed", "@ brian"), person=qcv(A, B, A)))

ml2 <- list(wrds=c("bryan", "indeed"), "@", bryan=c("bryan", "@ br", "@br"))

with(dat2, termco(dialogue, person, match.list=ml2))

with(dat2, termco(dialogue, person, match.list=ml2, percent = FALSE))

DATA$state[1] <- "12 4 rgfr  r0ffrg0"
termco(DATA$state, DATA$person, '0', digit.remove=FALSE)
DATA <- qdap::DATA  # restore the original data set after the edit above

#Using with term_match and exclude    
exclude(term_match(DATA$state, qcv(th), FALSE), "truth")
termco(DATA$state, DATA$person, exclude(term_match(DATA$state, qcv(th), 
    FALSE), "truth"))
MTCH.LST <- exclude(term_match(DATA$state, qcv(th, i)), qcv(truth, stinks))
termco(DATA$state, DATA$person, MTCH.LST)

syns <- synonyms("doubt")
syns[1]
termco(DATA$state, DATA$person, unlist(syns[1]))
synonyms("doubt", FALSE)
termco(DATA$state, DATA$person, list(doubt = synonyms("doubt", FALSE)))
termco(DATA$state, DATA$person, syns)

#termco_d examples:
termco_d(DATA$state, DATA$person, c(" the", " i'"))
termco_d(DATA$state, DATA$person, c(" the", " i'"), ignore.case=FALSE)
termco_d(DATA$state, DATA$person, c(" the ", " i'"))

# termco2mat example:
MTCH.LST <- exclude(term_match(DATA$state, qcv(a, i)), qcv(is, it, am, shall))
termco_obj <- termco(DATA$state, DATA$person, MTCH.LST)
termco2mat(termco_obj)
plot(termco_obj)
plot(termco_obj, label = TRUE)
plot(termco_obj, label = TRUE, text.color = "red")
plot(termco_obj, label = TRUE, text.color="red", lab.digits=3)

## REVERSE TERMCO (return raw words found per variable)
df <- data.frame(x=1:6,
    y = c("the fluffy little bat" , "the man was round like a ball",
        "the fluffy little bat" , "the man was round like a ball",
        "he ate the chair" , "cough, cough"),
    stringsAsFactors=FALSE)

l <- list("bat" ,"man", "ball", "heavy")
z <- counts(termco(df$y, qdapTools::id(df), l))[, -2]

counts2list(z[, -1], z[, 1])

## politeness
politeness <- c("please", "excuse me", "thank you", "you welcome", 
    "you're welcome", "i'm sorry", "forgive me", "pardon me")

with(pres_debates2012, termco(dialogue, person, politeness))
with(hamlet, termco(dialogue, person, politeness))

## Term Use Percentage per N Words
dat <- with(raj, chunker(dialogue, person, n.words = 100, rm.unequal = TRUE))
dat2 <- list2df(dat, "Dialogue", "Person")
dat2[["Duration"]] <- unlist(lapply(dat, id, pad=FALSE))
dat2 <- qdap_df(dat2, "Dialogue")

Top5 <- sapply(split(raj$dialogue, raj$person), wc, FALSE) %>%
    sort(decreasing=TRUE) %>% 
    list2df("wordcount", "person") %>%
    `[`(1:5, 2)

propdat <- dat2 %&% 
    termco(list(Person, Duration), as.list(Top25Words[1:5]), percent = FALSE) %>% 
    proportions %>%
    colsplit2df %>% 
    reshape2::melt(id=c("Person", "Duration", "word.count"), variable="Word") %>%
    dplyr::filter(Person %in% Top5)

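head(propdat)

# ggplot2 supplies the plotting functions used below;
# scales supplies the percent label formatter
library(ggplot2)
library(scales)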

ggplot(propdat, aes(y=value, x=Duration, group=Person, color=Person)) +
    geom_line(size=1.25) +
    facet_grid(Word~., scales="free_y") +
    ylab("Percent of Word Use")  +
    xlab("Per 100 Words") + 
    scale_y_continuous(labels = percent)

ggplot(propdat, aes(y=value, x=Duration, group=Word, color=Word)) +
    geom_line(size=1.25) +
    facet_grid(Person~.) +
    ylab("Percent of Word Use")  +
    xlab("Per 100 Words") + 
    scale_y_continuous(labels = percent)

ggplot(propdat, aes(y=value, x=Duration, group=Word)) +
    geom_line() +
    facet_grid(Word~Person, scales="free_y") +
    ylab("Percent of Word Use")  +
    xlab("Per 100 Words") + 
    scale_y_continuous(labels = percent) +
    ggthemes::theme_few()
    
## Discourse Markers: See...
## Schiffrin, D. (2001). Discourse markers: Language, meaning, and context. 
##    In D. Schiffrin, D. Tannen, & H. E. Hamilton (Eds.), The handbook of 
##    discourse analysis (pp. 54-75). Malden, MA: Blackwell Publishing.

discourse_markers <- list(
    response_cries = c(" oh ", " ah ", " aha ", " ouch ", " yuk "),
    back_channels = c(" uh-huh ", " uhuh ", " yeah "), 
    summons = " hey ", 
    justification = " because "
)

(markers <- with(pres_debates2012, 
    termco(dialogue, list(person, time), discourse_markers)
))
plot(markers, high="red")

with(pres_debates2012, 
    termco(dialogue, list(person, time), discourse_markers, elim.old = FALSE)
)

with(pres_debates2012, 
    dispersion_plot(dialogue, unlist(discourse_markers), person, time)
)

## End(Not run)

trinker/qdap documentation built on Sept. 30, 2020, 6:28 p.m.