lingmatch | R Documentation
Offers a variety of methods to assess linguistic matching or accommodation, where matching is general similarity (sometimes called homophily), and accommodation is some form of conditional similarity (accounting for some base-rate or precedent; sometimes called alignment).
lingmatch(input = NULL, comp = mean, data = NULL, group = NULL, ...,
comp.data = NULL, comp.group = NULL, order = NULL, drop = FALSE,
all.levels = FALSE, type = "lsm")
input: Texts to be compared; a vector, document-term matrix (dtm; with terms as column names), or path to a file (.txt or .csv, with texts separated by one or more lines/rows).
comp: Defines the comparison to be made: …
data: A matrix-like object as a reference for column names, if variables are referred to in other arguments (e.g., …
group: A logical or factor-like vector the same length as …
...: Passes arguments to …
comp.data: A matrix-like object as a source for …
comp.group: The column name of the grouping variable(s) in …
order: A numeric vector the same length as …
drop: logical; if …
all.levels: logical; if …
type: A character at least partially matching 'lsm' or 'lsa'; applies default settings aligning with the standard calculations of each type: …
There are a great many points of decision in the assessment of linguistic similarity and/or
accommodation, partly inherited from the great many points of decision inherent in the numerical
representation of language. Two general types of matching are implemented here as sets of
defaults: Language/Linguistic Style Matching (LSM; Niederhoffer & Pennebaker, 2002; Ireland &
Pennebaker, 2010), and Latent Semantic Analysis/Similarity (LSA; Landauer & Dumais, 1997;
Babcock, Ta, & Ickes, 2014). See the type argument for specifics.
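For the LSM defaults, the standard per-category score (Ireland & Pennebaker, 2010) is 1 - |a - b| / (a + b + .0001), where a and b are the percentages of words in a function-word category for the two texts; the overall LSM score is the mean across categories. The following is a minimal base-R sketch of that formula, assuming each text has already been reduced to category percentages. The function lsm_score and the category values are hypothetical illustrations, not lingmatch's internal code.

```r
# Hypothetical helper illustrating the classic LSM formula; not lingmatch's implementation.
# a, b: numeric vectors of category percentages (same categories, same order)
lsm_score <- function(a, b) {
  # per-category matching; .0001 guards against division by zero
  per_category <- 1 - abs(a - b) / (a + b + .0001)
  # overall LSM is the mean across categories
  mean(per_category)
}

# hypothetical percentages for three function-word categories
speaker_a <- c(ppron = 10.2, article = 6.1, prep = 13.5)
speaker_b <- c(ppron = 8.4, article = 6.9, prep = 12.0)
lsm_score(speaker_a, speaker_b)
```

Identical profiles score exactly 1, and the score falls toward 0 as category usage diverges.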
A list with processed components of the input, information about the comparison, and results of the comparison:
dtm: A sparse matrix; the raw count-dtm, or a version of the original input if it is more processed.
processed: A matrix-like object; a processed version of the input (e.g., weighted and categorized).
comp.type: A string describing the comparison, if applicable.
comp: A vector or matrix-like object; the comparison data, if applicable.
group: A string describing the group, if applicable.
sim: Result of lma_simets.
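The sim component holds similarity scores between texts, or between texts and their comparisons. Cosine similarity is one common such metric, and the usual one for LSA-style comparisons. The sketch below computes pairwise cosine similarity between the rows of a small count matrix in base R; cosine_matrix and the toy dtm are hypothetical and are not how lma_simets is implemented.

```r
# Hypothetical helper: cosine similarity between every pair of rows
cosine_matrix <- function(m) {
  norms <- sqrt(rowSums(m^2))
  unit <- m / norms      # each row scaled to unit length (norms recycle down columns)
  unit %*% t(unit)       # dot products of unit-length rows are cosines
}

# toy count dtm: texts as rows, terms as column names
dtm <- matrix(
  c(1, 0, 2,
    1, 1, 0,
    0, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("text", 1:3), c("this", "other", "sentence"))
)
round(cosine_matrix(dtm), 3)
```

Texts 2 and 3 share no terms, so their cosine is 0; the diagonal is 1 because each text is compared with itself.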
Defining groups and comparisons can be complicated and requires dataset-specific knowledge,
so it cannot always (readily) be done automatically. Variables entered in the group argument
are treated differently depending on their position and on other arguments:
By default, groups are treated as if they define separate chunks of data in which comparisons
should be calculated. Comparison functions are applied, and pairwise comparisons are performed,
separately within each of these groups. For example, if you wanted to compare each text with the
mean of all texts in its condition, a group variable could identify and split by condition. Given
multiple grouping variables, calculations will either be done in each split (if all.levels = TRUE;
applied in sequence so that groups become smaller and smaller), or once after all splits are made
(if all.levels = FALSE). This makes for 'one to many' comparisons with either calculated or
preexisting standards (i.e., the profile of the current data, or a precalculated profile,
respectively).
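The default "split by group, then compare with a calculated standard" behavior can be illustrated with a base-R sketch that compares each text with the mean profile of its own group, using cosine similarity. The function group_mean_similarity and the toy data are hypothetical; lingmatch's own processing (weighting, categorization, choice of metrics) is not reproduced here.

```r
# Hypothetical helper: cosine similarity between each row and its group's mean profile
group_mean_similarity <- function(m, group) {
  sims <- numeric(nrow(m))
  for (g in unique(group)) {
    rows <- which(group == g)
    # the group's mean profile (includes the text being compared)
    profile <- colMeans(m[rows, , drop = FALSE])
    for (i in rows) {
      x <- m[i, ]
      sims[i] <- sum(x * profile) / sqrt(sum(x^2) * sum(profile^2))
    }
  }
  sims
}

# four texts in two conditions
m <- rbind(c(2, 0, 1), c(0, 2, 1), c(1, 1, 1), c(3, 0, 0))
condition <- c("a", "a", "b", "b")
round(group_mean_similarity(m, condition), 3)
```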
When comparison data are identified in comp, groups are assumed to apply to both input and comp
(either both in data, or separately between data and comp.data, in which case comp.group may be
needed if the same grouping variable has different names between data and comp.data). In this
case, multiple grouping variables are combined into a single factor assumed to uniquely identify
a comparison. This makes for 'one to many' comparisons with specific texts (as in the case of
manipulated prompts or text-based conditions).
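In the "specific texts" case, each input text is paired with the comparison text that shares its group label. A base-R sketch of that pairing, assuming comparison profiles are stored as rows named by group and scored with cosine similarity; matched_similarity, the prompt names, and the counts are all hypothetical.

```r
# Hypothetical helper: compare each input row with the comparison row named by its group
matched_similarity <- function(m, comp, group) {
  # look up one comparison row per input row (duplicated row names are fine here)
  targets <- comp[as.character(group), , drop = FALSE]
  rowSums(m * targets) / sqrt(rowSums(m^2) * rowSums(targets^2))
}

# two prompts (e.g., manipulated conditions) and three responses
prompts <- rbind(prompt1 = c(1, 0, 1), prompt2 = c(0, 1, 0))
responses <- rbind(c(2, 0, 2), c(1, 1, 0), c(0, 3, 0))
condition <- c("prompt1", "prompt1", "prompt2")
matched_similarity(responses, prompts, condition)
```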
If comp matches 'sequential', the last grouping variable entered is assumed to identify
something like speakers (i.e., a factor with two or more levels and multiple observations per
level). In this case, the data are assumed to be ordered (or ordered once sorted by order, if
specified). Any additional grouping variables before the last are treated as splitting groups.
This can set up for probabilistic accommodation metrics. At the moment, when sequential
comparisons are made within groups, similarity scores between speakers are averaged, resulting
in mean matching between speakers within the group.
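A simplified picture of sequential matching: walk through ordered turns, compare each turn with the immediately preceding turn whenever the speaker changes, and average those scores. This is a hypothetical stand-in (sequential_similarity and the toy turns are illustrative assumptions); lingmatch's actual sequential procedure may pair and aggregate turns differently.

```r
# Hypothetical, simplified sequential matching; not lingmatch's exact algorithm
sequential_similarity <- function(m, speaker) {
  cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
  scores <- numeric(0)
  # start at the second turn; each turn looks back one position
  for (i in seq_len(nrow(m))[-1]) {
    if (speaker[i] != speaker[i - 1]) {
      scores <- c(scores, cosine(m[i, ], m[i - 1, ]))
    }
  }
  # mean matching across speaker changes (NaN if there are none)
  mean(scores)
}

# four ordered turns from two speakers
turns <- rbind(c(1, 0, 1), c(1, 0, 1), c(0, 1, 0), c(1, 1, 0))
speaker <- c("A", "B", "B", "A")
sequential_similarity(turns, speaker)
```

Turn 3 follows another turn by the same speaker, so it is skipped; only the A-to-B and B-to-A transitions contribute.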
Babcock, M. J., Ta, V. P., & Ickes, W. (2014). Latent semantic similarity and language style matching in initial dyadic interactions. Journal of Language and Social Psychology, 33, 78-88.
Ireland, M. E., & Pennebaker, J. W. (2010). Language style matching in writing: Synchrony in essays, correspondence, and poetry. Journal of Personality and Social Psychology, 99, 549.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211.
Niederhoffer, K. G., & Pennebaker, J. W. (2002). Linguistic style matching in social interaction. Journal of Language and Social Psychology, 21, 337-360.
For a general text processing function, see lma_process().
# load the package
library(lingmatch)

# compare single strings
lingmatch("Compare this sentence.", "With this other sentence.")
# compare each entry in a character vector with...
texts <- c(
"One bit of text as an entry...",
"Maybe multiple sentences in an entry. Maybe essays or posts or a book.",
"Could be lines or a column from a read-in file..."
)
## one another
lingmatch(texts)
## the first
lingmatch(texts, 1)
## the next
lingmatch(texts, "seq")
## the set average
lingmatch(texts, mean)
## other entries in a group
lingmatch(texts, group = c("a", "a", "b"))
## one another, without stop words
lingmatch(texts, exclude = "function")
## a standard average (based on function words)
lingmatch(texts, "auto", dict = lma_dict(1:9))