discourse_connector: Extract Discourse Connectors in Context
In trinker/discon: Discourse Connectors Analysis


Extract discourse connectors in context. This is the flexible default template for modular use in specific discourse connector functions.

discourse_connector(text.var, grouping.var, n.before = 1, tot = FALSE,
  n.after = n.before, ord.inds = TRUE, markup = c("<<", ">>"),
  name = NULL, ...)

discourse_connector_logical(text.var, grouping.var, n.before = 1,
  tot = FALSE, n.after = n.before, ord.inds = TRUE, markup = c("<<",
  ">>"), name = NULL, ...)

`text.var`	The text variable.
`grouping.var`	The grouping variables. Also takes a single grouping variable or a list of 1 or more grouping variables.
`n.before`	The number of rows before the indexed occurrence.
`tot`	logical. If `TRUE`, condenses sub-units (e.g., sentences) into turns of talk for that `grouping.var`.
`n.after`	The number of rows after the indexed occurrence.
`ord.inds`	logical. If `TRUE`, the indices (`inds`) are ordered from least to greatest.
`markup`	A character vector of length two indicating the left (element 1) and right (element 2) boundary markers to use to highlight the discourse connectors. Use `c("", "")` to not mark the discourse connectors.
`name`	A string indicating the name to search for within the internal data sets, typically the function's name. Generally, for internal use.
`...`	Other arguments passed to `termco`.
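The interaction of `n.before` and `n.after` can be illustrated with a minimal base R sketch. The helper `context_rows` below is hypothetical and is not part of discon or qdap; it only assumes that the context window is clipped at the first and last rows of the transcript, which may differ from what `trans_context` actually does.

```r
## Hypothetical helper (not a discon function): the rows shown around each hit.
## As in the usage above, n.after defaults to n.before.
context_rows <- function(hits, n.rows, n.before = 1, n.after = n.before) {
    lapply(hits, function(i) {
        ## Assumption: windows are clipped to the transcript boundaries
        seq(max(1, i - n.before), min(n.rows, i + n.after))
    })
}

## Connectors found in rows 1 and 5 of a 6-row transcript
windows <- context_rows(hits = c(1, 5), n.rows = 6, n.before = 1)
windows
```

With `n.before = 1`, each window holds the matching row plus at most one row on either side.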

Returns a list of 2-3 elements:

`counts`	A `termco` object of discourse connector counts.
`Context 1`	A `trans_context` object of the discourse connectors in context. Note the name of this object is supplied by `names` element one.
`Context 2...n`	An optional (not returned if `regex` is of length one) `trans_context` object of the discourse connectors in context. Note the name of this (these) object(s) is supplied by `names` element 2...n.

discourse_connector and discourse_connector_logical require three arguments (passed through the ellipsis or supplied internally via the name argument) that are responsible for finding terms and naming them in the output. Typically, regex and terms search for the same thing, expressed either as a regular expression or in the simpler termco approach to term searching. Generally, these arguments are used internally but are documented here:

regex - A single string or a list of strings giving the regular expression(s) used to search for expressions in the transcript excerpts and mark them up.
terms - A list of terms to search for in termco and the dispersion plot.
names - A vector of names that corresponds to the number of regular expressions searched for.
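To make the relationship between `regex` and `terms` concrete, here is a rough base R sketch that searches for the same connector both ways. It is only an approximation under stated assumptions, not how termco or trans_context are implemented: the toy `text` vector, the space padding, and the helper `count_term` are all hypothetical.

```r
## Toy transcript rows (hypothetical data)
text <- c("I think you're right.", "We agree.", "Well, I'd say so.")

## regex-style search: wrap each match in the default markup
regex  <- "\\bI('[a-z]+)*\\b"
markup <- c("<<", ">>")
marked <- gsub(
    paste0("(", regex, ")"),
    paste0(markup[1], "\\1", markup[2]),
    text, perl = TRUE
)

## terms-style search: count fixed substrings in space-padded text
terms  <- c(" I ", " I'")
padded <- paste0(" ", text, " ")

## Hypothetical helper: fixed-string occurrences of one term per row
count_term <- function(term, x) {
    lengths(regmatches(x, gregexpr(term, x, fixed = TRUE)))
}
counts <- Reduce(`+`, lapply(terms, count_term, x = padded))

marked
counts
```

Both searches flag the first and third rows here. The regular expression also captures the full contracted form for markup, which the fixed terms cannot do.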

discourse_connector_logical can take 4 different functions (as arguments passed to ellipsis) that perform logical checks or alter text variables before transcript sectioning & graphics are generated from the text. Typically, these functions are used internally but are documented here:

fun1 - A function that checks the text variable and returns a logical vector. This allows for additional restrictions to be placed upon the text beyond the limited (non-regex) capabilities of termco and trans_context.
fun2 - A function that checks the grouping variable and returns a logical vector. This allows for additional restrictions to be placed upon the grouping variables that can't be addressed by termco and trans_context.
fun3 - A function that alters the text variable for the creation of transcript sections (see trans_context) and graphic visualizations of the data (including the generic plot method).
fun4 - A function that alters the text variable for the creation of graphic visualizations of the data only (including the generic plot method).
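As an illustration of the kinds of functions described above, the following base R sketch defines candidates for fun1, fun2, and fun3. The function names, the toy data, and the way the two logical vectors are combined are all assumptions made for this example; how discourse_connector_logical applies these functions internally is not documented here.

```r
## Hypothetical fun1: TRUE for rows whose text ends in a question mark
is_question <- function(text.var) grepl("\\?\\s*$", text.var)

## Hypothetical fun2: TRUE for rows spoken by selected speakers
is_candidate <- function(grouping.var) grouping.var %in% c("OBAMA", "ROMNEY")

## Hypothetical fun3: strip bracketed annotations before display
drop_brackets <- function(text.var) trimws(gsub("\\[[^]]*\\]", "", text.var))

## Toy data (not taken from pres_debates2012)
dialogue <- c("Do I agree? [laughter]", "I do.", "Why would you?")
person   <- c("OBAMA", "LEHRER", "ROMNEY")

## Assumption: a row is kept only when both checks pass
cleaned <- drop_brackets(dialogue)
keep    <- is_question(cleaned) & is_candidate(person)
keep
```

Following the description above, such functions would be supplied through the ellipsis under the names fun1 to fun4 (for example `fun1 = is_question`).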

Kalajahi, S. A. R., Abdullah, A. N., Mukundan, J., & Tannacito, D. J. (2012) Discourse connectors: An overview of the history, definition and classification of the term. World Applied Sciences Journal, 19(11), 1659-1673.

termco, trans_context

## Marker with one type (just: "I")
out1 <- with(pres_debates2012[1:200, ], discourse_connector(dialogue, person,
    names = c("I"),
    regex = "\\bI('[a-z]+)*\\b",
    terms = list(I = c(" I ", " I'"))
))

out1[[1]]
out1[[2]]
plot(out1)

## Marker with two types (both: "I" & "you")
out2 <- with(pres_debates2012[1:200, ], discourse_connector(dialogue, person,
    names = c("I", "you"),
    regex =  list(
        I = "\\bI('[a-z]+)*\\b",
        you = "(\\b[Yy]ou('[a-z]+)*\\b)"
    ),
    terms = list(
        I = c(" I ", " I'"),
        you = c(" you ", " you'")
    )
))
out2[[1]]
out2[[2]]
out2[[3]]

## Save externally; use .doc or .txt
## print(out2[[2]], file="you_I.doc")

## Key Words in Context
## Determine top 15 words
topterms <- qdap::freq_terms(
    qdap::pres_debates2012[["dialogue"]],
    top = 15,
    at.least = 5,
    stopwords = c(qdapDictionaries::contractions[[1]], qdapDictionaries::Top200Words)
)

## Marker with top 15 words
out3 <- with(pres_debates2012, discourse_connector(dialogue, person,
    names = c("top15"),
    regex =  list(
        top15 = qdapRegex::pastex(qdapRegex::group(qdapRegex::bind(topterms[[1]])))
    ),
    terms = list(
        top15 = qdap::spaste(topterms[[1]])
    )
))
out3[[1]]
out3[[2]]
plot(out3)