word_associate: Find Associated Words
In qdap: Bridging the Gap Between Qualitative Data and Quantitative Analysis

Description

Find words associated with one or more given words or phrases. Results can be output as a network graph and/or a wordcloud.

Usage

word_associate(
  text.var,
  grouping.var = NULL,
  match.string,
  text.unit = "sentence",
  extra.terms = NULL,
  target.exclude = NULL,
  stopwords = NULL,
  network.plot = FALSE,
  wordcloud = FALSE,
  cloud.colors = c("black", "gray55"),
  title.color = "blue",
  nw.label.cex = 0.8,
  title.padj = -4.5,
  nw.label.colors = NULL,
  nw.layout = NULL,
  nw.edge.color = "gray90",
  nw.label.proportional = TRUE,
  nw.title.padj = NULL,
  nw.title.location = NULL,
  title.font = NULL,
  title.cex = NULL,
  nw.edge.curved = TRUE,
  cloud.legend = NULL,
  cloud.legend.cex = 0.8,
  cloud.legend.location = c(-0.03, 1.03),
  nw.legend = NULL,
  nw.legend.cex = 0.8,
  nw.legend.location = c(-1.54, 1.41),
  legend.override = FALSE,
  char2space = "~~",
  ...
)

Arguments

`text.var`	The text variable.
`grouping.var`	The grouping variables. Default `NULL` generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
`match.string`	A vector of terms, or a list of such vectors, to associate in the text.
`text.unit`	The text unit (either `"sentence"` or `"tot"`). This argument determines the unit within which the match string words are searched for. For example, if `"sentence"` is chosen the function pulls all text for the sentences in which the match string terms are found.
`extra.terms`	Other terms to color beyond the match string.
`target.exclude`	A vector of words to exclude from the `match.string`.
`stopwords`	Words to exclude from the analysis.
`network.plot`	logical. If `TRUE` plots a network plot of the words.
`wordcloud`	logical. If `TRUE` plots a wordcloud plot of the words.
`cloud.colors`	A vector of colors whose length equals the length of `match.string` + 1.
`title.color`	A character vector of length one corresponding to the color of the title.
`nw.label.cex`	The magnification to be used for network plot labels relative to the current setting of cex. Default is 0.8.
`title.padj`	Adjustment for the title. For strings parallel to the axes, padj = 0 means right or top alignment, and padj = 1 means left or bottom alignment.
`nw.label.colors`	A vector of colors whose length equals the length of `match.string` + 1.
`nw.layout`	Layout types supported by igraph. See `layout`.
`nw.edge.color`	A character vector of length one corresponding to the color of the plot edges.
`nw.label.proportional`	logical. If `TRUE` scales the network plots across `grouping.var` to allow plot-to-plot comparisons.
`nw.title.padj`	Adjustment for the network plot title. For strings parallel to the axes, padj = 0 means right or top alignment, and padj = 1 means left or bottom alignment.
`nw.title.location`	The side of the network plot on which the title is placed (1=bottom, 2=left, 3=top, 4=right).
`title.font`	The font family of the cloud title.
`title.cex`	Character expansion factor for the title. `NULL` and `NA` are equivalent to 1.0.
`nw.edge.curved`	logical. If `TRUE` edges will be curved rather than straight paths.
`cloud.legend`	A character vector of names corresponding to the number of vectors in `match.string`. Both `nw.legend` and `cloud.legend` can be set separately; or one may be set and by default the other will assume those legend labels. If the user does not desire this behavior use the `legend.override` argument.
`cloud.legend.cex`	Character expansion factor for the wordcloud legend. `NULL` and `NA` are equivalent to 1.0.
`cloud.legend.location`	The x and y co-ordinates to be used to position the wordcloud legend. The location may also be specified by setting x to a single keyword from the list `"bottomright"`, `"bottom"`, `"bottomleft"`, `"left"`, `"topleft"`, `"top"`, `"topright"`, `"right"` and `"center"`. This places the legend on the inside of the plot frame at the given location.
`nw.legend`	A character vector of names corresponding to the number of vectors in `match.string`. Both `nw.legend` and `cloud.legend` can be set separately; or one may be set and by default the other will assume those legend labels. If the user does not desire this behavior use the `legend.override` argument.
`nw.legend.cex`	Character expansion factor for the network plot legend. `NULL` and `NA` are equivalent to 1.0.
`nw.legend.location`	The x and y co-ordinates to be used to position the network plot legend. The location may also be specified by setting x to a single keyword from the list `"bottomright"`, `"bottom"`, `"bottomleft"`, `"left"`, `"topleft"`, `"top"`, `"topright"`, `"right"` and `"center"`. This places the legend on the inside of the plot frame at the given location.
`legend.override`	logical. By default, if legend labels are supplied to only one of `cloud.legend` or `nw.legend` and the other remains `NULL`, the `NULL` argument assumes the labels supplied to the other. If this behavior is not desired, `legend.override` should be set to `TRUE`.
`char2space`	Currently has no effect. Eventually this will allow characters to be retained, as is already possible in `trans_cloud`.
`...`	Other arguments supplied to `trans_cloud`.
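
The fallback between `cloud.legend` and `nw.legend` described above can be sketched in base R. This is only an illustration of the documented behavior, not qdap's internal code; `resolve_legends` is a hypothetical helper introduced here for the example.

```r
# Hypothetical helper (not part of qdap): mimics the documented fallback
# between cloud.legend and nw.legend.
resolve_legends <- function(cloud.legend = NULL, nw.legend = NULL,
                            legend.override = FALSE) {
  if (!legend.override) {
    # When only one legend is supplied, the other borrows its labels
    if (is.null(cloud.legend) && !is.null(nw.legend)) cloud.legend <- nw.legend
    if (is.null(nw.legend) && !is.null(cloud.legend)) nw.legend <- cloud.legend
  }
  list(cloud.legend = cloud.legend, nw.legend = nw.legend)
}

# Default: nw.legend assumes the labels given to cloud.legend
shared <- resolve_legends(cloud.legend = c("A", "B", "C"))

# With legend.override = TRUE the unset legend stays NULL
separate <- resolve_legends(cloud.legend = c("A", "B", "C"),
                            legend.override = TRUE)
```

In the first call both legends carry the same labels; in the second, only the wordcloud legend is labeled.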

Value

Returns a list:

`word frequency matrices`	Word frequency matrices for each grouping variable.
`dialogue`	A list of dataframes for each word list (each vector supplied to `match.string`) and a final dataframe of all combined text units that contain any match string.
`match.terms`	A list of vectors of word lists (each vector supplied to `match.string`).

Optionally, plots a word cloud and/or a network plot of the text units containing the `match.string` terms.
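
To make the difference between the two `text.unit` settings concrete, here is a rough base R sketch of which text ends up in `dialogue` under each setting. It assumes that `"tot"` stands for a whole turn of talk and that matching is case-insensitive; the toy data and the sentence-splitting rule are illustrative assumptions, not qdap's actual implementation.

```r
# Two turns of talk, one string per turn (toy data, not from qdap)
turns <- c("I like it. You do not.", "We will go. It is late.")
term  <- "you"

# "tot": keep every whole turn of talk that contains the term
tot_hits <- turns[grepl(term, tolower(turns), fixed = TRUE)]

# "sentence": split the turns into sentences first (splitting on whitespace
# that follows ., ? or !), then keep only the sentences containing the term
sentences     <- unlist(strsplit(turns, "(?<=[.?!])\\s+", perl = TRUE))
sentence_hits <- sentences[grepl(term, tolower(sentences), fixed = TRUE)]
```

Here `tot_hits` holds the full first turn, while `sentence_hits` holds only the sentence in which the term occurs.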

Examples

## Not run: 
ms <- c(" I ", "you")
et <- c(" it", " tell", "tru")
out1 <- word_associate(DATA2$state, DATA2$person, match.string = ms, 
    wordcloud = TRUE,  proportional = TRUE, 
    network.plot = TRUE,  nw.label.proportional = TRUE, extra.terms = et,  
    cloud.legend = c("A", "B", "C"),
    title.color = "blue", cloud.colors = c("red", "purple", "gray70"))

#======================================
#Note: You don't have to name the vectors in the lists but I do for clarity
ms <- list(
    list1 = c(" I ", " you", "not"), 
    list2 = c(" wh")          
)

et <- list(
    B = c(" the", "do", "tru"), 
    C = c(" it", " already", "we")
)

out2 <- word_associate(DATA2$state, DATA2$person, match.string = ms, 
    wordcloud = TRUE,  proportional = TRUE, 
    network.plot = TRUE,  nw.label.proportional = TRUE, extra.terms = et,  
    cloud.legend = c("A", "B", "C", "D"),
    title.color = "blue", cloud.colors = c("red", "blue", "purple", "gray70"))

out3 <- word_associate(DATA2$state, list(DATA2$day, DATA2$person), match.string = ms)

#======================================
m <- list(
    A1 = c("you", "in"), #list 1
    A2 = c(" wh")        #list 2
)

n <- list(
    B = c(" the", " on"), 
    C = c(" it", " no")
)

out4 <- word_associate(DATA2$state, list(DATA2$day, DATA2$person), 
    match.string = m)
out5 <- word_associate(raj.act.1$dialogue, list(raj.act.1$person), 
    match.string = m)
out6 <- with(mraja1spl, word_associate(dialogue, list(fam.aff, sex), 
     match.string = m))
names(out6)
lapply(out6$dialogue, htruncdf, n = 20, w = 20)

#======================================
DATA2$state2 <- space_fill(DATA2$state, c("is fun", "too fun"))

ms <- list(
    list1 = c(" I ", " you", "is fun", "too fun"), 
    list2 = c(" wh")      
)

et <- list(
    B = c(" the", " on"), 
    C = c(" it", " no")
)

out7 <- word_associate(DATA2$state2, DATA2$person, match.string = ms, 
    wordcloud = TRUE,  proportional = TRUE, 
    network.plot = TRUE,  nw.label.proportional = TRUE, extra.terms = et,  
    cloud.legend = c("A", "B", "C", "D"),
    title.color = "blue", cloud.colors = c("red", "blue", "purple", "gray70"))
    
DATA2 <- qdap::DATA2

## End(Not run)
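
Several match terms in the examples above carry leading or trailing spaces (`" I "`, `" wh"`, `"tru"`). The spacing suggests that terms are matched as substrings, with spaces acting as word boundaries. The base R sketch below illustrates that idea under this assumption; the word list and the `match_term` helper are hypothetical and are not qdap's matching code.

```r
# Toy vocabulary; each word is padded with a space on both sides so that
# spaces in a term can stand in for word boundaries (an assumption)
words  <- c("i", "it", "is", "truly", "what", "somewhat")
padded <- paste0(" ", words, " ")

# Hypothetical helper: return the words whose padded form contains the term
match_term <- function(term) words[grepl(term, padded, fixed = TRUE)]

whole_word <- match_term(" i ")  # spaces on both sides: the exact word only
starts_i   <- match_term(" i")   # leading space: words beginning with "i"
starts_wh  <- match_term(" wh")  # leading space: words beginning with "wh"
any_wh     <- match_term("wh")   # no spaces: "wh" anywhere in the word
any_tru    <- match_term("tru")  # no spaces: "tru" anywhere in the word
```

Under this reading, a term without surrounding spaces is the least restrictive, and adding a space on each side narrows the match to the whole word.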

qdap documentation built on May 31, 2023, 5:20 p.m.