liwcalike: Analyze text in a LIWC-alike fashion

Description

Analyze a set of texts to produce a dataset of percentages and other quantities describing the text, similar to the functionality supplied by the Linguistic Inquiry and Word Count standalone software distributed at http://liwc.wpengine.com.

Usage

liwcalike(x, ...)

## S3 method for class 'corpus'
liwcalike(x, ...)

## S3 method for class 'character'
liwcalike(x, dictionary = NULL, tolower = TRUE,
  verbose = TRUE, digits = 2, ...)

Arguments

x

input object, a quanteda corpus or character vector for analysis

...

options passed to tokens offering finer-grained control over how "words" are defined

dictionary

a quanteda dictionary object supplied for analysis

tolower

convert to common (lower) case before tokenizing

verbose

if TRUE print status messages during processing

digits

how many significant digits to print for percentage quantities

Value

a data.frame object containing the analytic results, one row per document supplied

Segmentation

The LIWC standalone software offers many options for segmenting text. This function does not supply segmentation options, but you can achieve the same effect by converting the input to a corpus (if it is not already one) and using corpus_reshape or corpus_segment to split the texts into smaller units, based either on user-supplied tags or on sentence or paragraph boundaries.
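
The approach described above might look like the following minimal sketch. The example texts and the small dictionary are invented for illustration, and the code assumes that both quanteda and quanteda.dictionaries are installed; it is skipped otherwise. Since liwcalike returns one row per document, reshaping to sentences first should give one row per sentence.

```r
# Hypothetical input: two short documents made up for this sketch
txt <- c(doc1 = "The lawyer wore a red shirt. Her friend wore yellow.",
         doc2 = "Nobody mentioned the colour of the car.")

# Run only if both packages are available (assumption: recent versions)
if (requireNamespace("quanteda", quietly = TRUE) &&
    requireNamespace("quanteda.dictionaries", quietly = TRUE)) {

  # Convert the character vector to a corpus, then split it so that
  # each sentence becomes its own document
  corp <- quanteda::corpus(txt)
  corp_sent <- quanteda::corpus_reshape(corp, to = "sentences")

  # Illustrative dictionary, not a real LIWC category set
  dict <- quanteda::dictionary(list(people = c("lawyer", "friend"),
                                    color = c("red", "yellow")))

  # Analyze the sentence-level corpus
  res <- quanteda.dictionaries::liwcalike(corp_sent, dictionary = dict)
  print(res)
}
```

For splitting on user-supplied tags rather than sentence boundaries, corpus_segment would take the place of corpus_reshape in the sketch.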

Examples

liwcalike(data_char_testphrases)

# examples for comparison
txt <- c("The red-shirted lawyer gave her yellow-haired, red nose ex-boyfriend $300
            out of pity:(.")
dict <- quanteda::dictionary(list(people = c("lawyer", "boyfriend"),
                                  color_fixed = "red",
                                  color_glob = c("red*", "yellow*", "green*"),
                                  mwe = "out of"))
liwcalike(txt, dict, what = "word")
liwcalike(txt, dict, what = "fasterword")
(toks <- quanteda::tokens(txt, what = "fasterword", remove_hyphens = TRUE))
length(toks[[1]])
# LIWC says 12 words

## Not run: 
# works with LIWC 2015 dictionary too
dict_liwc_2015 <- dictionary(file = "~/Dropbox/QUANTESS/dictionaries/LIWC/LIWC2015_English_Flat.dic",
                             format = "LIWC")
dat_liwc_analysis <- liwcalike(data_corpus_inaugural, dict_liwc_2015)
dat_liwc_analysis[1:6, 1:10]
##           docname Segment   WC      WPS Sixltr   Dic function article relativ motion
## 1 1789-Washington       1 1540 62.21739  24.35 253.1   52.403  9.0909 101.361 0.3483
## 2 1793-Washington       2  147 33.75000  25.17 250.3    5.065  0.9091  10.884 0.0387
## 3      1797-Adams       3 2584 62.72973  24.61 237.5   82.403 15.0649 163.946 0.3096
## 4  1801-Jefferson       4 1935 42.19512  20.36 253.2   62.143 10.0000 105.442 0.7353
## 5  1805-Jefferson       5 2381 48.13333  22.97 255.8   79.221 10.9091 151.701 0.6966
## 6    1809-Madison       6 1267 56.04762  24.78 258.2   42.987  8.3117  83.673 0.3870

## End(Not run)

kbenoit/quanteda.dictionaries documentation built on May 30, 2019, 11:40 p.m.