quanteda: Quantitative Analysis of Textual Data

A fast, flexible toolset for the management, processing, and quantitative analysis of textual data in R.

Author
Kenneth Benoit [aut, cre], Paul Nulty [aut], Kohei Watanabe [ctb], Benjamin Lauderdale [ctb], Adam Obeng [ctb], Pablo Barberá [ctb], Will Lowe [ctb]
Date of publication
2016-10-31 23:44:48
Maintainer
Kenneth Benoit <kbenoit@lse.ac.uk>
License
GPL-3
Version
0.9.8.5

Man pages

applyDictionary
apply a dictionary or thesaurus to an object
as.data.frame-dfm-method
coerce a dfm to a data.frame
cbind.dfm
Combine dfm objects by Rows or Columns
changeunits
change the document units of a corpus
collocations
Detect collocations from text
compress
compress a dfm by combining similarly named dimensions
convert
convert a dfm to a non-quanteda format
corpus
constructor for corpus objects
corpusSource-class
corpus source classes
dfm
create a document-feature matrix
dfm-class
Virtual class "dfm" for a document-feature matrix
dictionary
create a dictionary
docfreq
compute the (weighted) document frequency of a feature
docnames
get or set document names
docvars
get or set document-level variables
encodedTextFiles
a .zip file of texts containing a variety of differently...
encodedTexts
encoded texts for testing
encoding
detect the encoding of texts
exampleString
A paragraph of text for testing various text-based functions
features
extract the feature labels from a dfm
findSequences
find sequences of tokens
head.dfm
Return the first or last part of a dfm
head.tokenSequences
print a tokenSequences object
ie2010Corpus
Irish budget speeches from 2010
inaugCorpus
A corpus of US presidential inaugural addresses from...
joinTokens
join tokens function
kwic
List key words in context from a text or a corpus of texts.
LBGexample
dfm with example data from Table 1 of Laver Benoit and Garry...
lexdiv
calculate lexical diversity
metacorpus
get or set corpus metadata
metadoc
get or set document-level meta-data
mobydickText
Project Gutenberg text of Herman Melville's _Moby Dick_
ndoc
get the number of documents or features
ngrams
Create ngrams and skipgrams
nsentence
count the number of sentences
ntoken
count the number of tokens or types
phrasetotoken
convert phrases into single tokens
plot.dfm
plot features as a wordcloud
plot.kwic
plot the dispersion of key word(s)
predict.textmodel
prediction method for Naive Bayes classifier objects
print.dfm
print a dfm object
print.tokenizedTexts
print a tokenizedTexts object
print.tokenSequences
print a tokenSequences object
quanteda-package
An R package for the quantitative analysis of textual data
readability
calculate readability
removeFeatures
remove features from an object
sample
Randomly sample documents or features
scrabble
compute the Scrabble letter values of text
segment
segment texts into component elements
selectFeatures
select features from an object
selectFeaturesOLD
old version of selectFeatures.tokenizedTexts
settings
Get or set the corpus settings
show-dictionary-method
print a dictionary object
similarity
compute similarities between documents and/or features
sort.dfm
sort a dfm by one or more margins
stopwords
access built-in stopwords
subset.corpus
extract a subset of a corpus
summary.corpus
summarize a corpus or a vector of texts
syllables
count syllables in a text
tail.tokenSequences
print a tokenSequences object
textfile
read a text corpus source from a file
textmodel
fit a text model
textmodel_ca
correspondence analysis of a document-feature matrix
textmodel_fitted-class
the fitted textmodel classes
textmodel_NB
Naive Bayes classifier for texts
textmodel_wordfish
wordfish text model
textmodel_wordscores
Wordscores text model
texts
get corpus texts
tf
compute (weighted) term frequency from a dfm
tfidf
compute tf-idf weights from a dfm
tokenize
tokenize a set of texts
toLower
Convert texts to lower (or upper) case
topfeatures
list the most frequent features
trim
Trim a dfm using threshold-based or random feature selection
ukimmigTexts
Immigration-related sections of 2010 UK party manifestos
weight
weight the feature frequencies in a dfm
wordlists
word lists used in some readability indexes
wordstem
stem words
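
Several of the entries above (dfm, features, topfeatures, weight) revolve around the document-feature matrix. As a rough illustration of what such a matrix holds, here is a minimal base-R sketch. It does not call quanteda, and the names texts, tokens, vocabulary, and dfm_sketch are hypothetical; the package's own tokenizer and dfm class are considerably more capable than this naive version.

```r
# Base-R sketch of a document-feature matrix; this is NOT quanteda's implementation.
texts <- c(doc1 = "The cat sat on the mat.",
           doc2 = "The dog sat.")

# Naive tokenizer: lowercase, strip punctuation, split on whitespace.
tokens <- strsplit(gsub("[[:punct:]]", "", tolower(texts)), "\\s+")
names(tokens) <- names(texts)

# The vocabulary is the sorted set of distinct tokens across all documents.
vocabulary <- sort(unique(unlist(tokens)))

# Count each vocabulary item per document; vapply returns features x documents.
counts <- vapply(
  tokens,
  function(tok) tabulate(match(tok, vocabulary), nbins = length(vocabulary)),
  integer(length(vocabulary))
)

# Transpose so that rows are documents and columns are features.
dfm_sketch <- t(counts)
dimnames(dfm_sketch) <- list(docs = names(texts), features = vocabulary)

print(dfm_sketch)
```

Each row is a document and each column a feature, so column sums give overall feature frequencies, which is roughly the kind of summary the topfeatures entry describes.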

Files in this package

quanteda
quanteda/inst
quanteda/inst/CITATION
quanteda/inst/LICENSE.txt
quanteda/inst/extdata
quanteda/inst/extdata/encodedTextFiles.zip
quanteda/inst/doc
quanteda/inst/doc/quickstart.Rmd
quanteda/inst/doc/LitVignette.R
quanteda/inst/doc/quickstart.R
quanteda/inst/doc/quickstart.html
quanteda/inst/doc/development-plans.Rmd
quanteda/inst/doc/development-plans.html
quanteda/inst/doc/development-plans.R
quanteda/inst/doc/LitVignette.html
quanteda/inst/doc/LitVignette.Rmd
quanteda/tests
quanteda/tests/performance_tests
quanteda/tests/performance_tests/tokenizeSizes.R
quanteda/tests/data
quanteda/tests/data/glob
quanteda/tests/data/glob/4.txt
quanteda/tests/data/glob/10.txt
quanteda/tests/data/glob/1.txt
quanteda/tests/data/glob/2.txt
quanteda/tests/data/glob/subdir2
quanteda/tests/data/glob/subdir2/test.txt
quanteda/tests/data/glob/10.json
quanteda/tests/data/glob/3.txt
quanteda/tests/data/glob/subdir1
quanteda/tests/data/glob/subdir1/test.txt
quanteda/tests/data/tar
quanteda/tests/data/tar/test.tar
quanteda/tests/data/empty
quanteda/tests/data/empty/empty.nonesuch
quanteda/tests/data/empty/empty.tar.gz
quanteda/tests/data/empty/empty.pdf
quanteda/tests/data/empty/empty.json
quanteda/tests/data/empty/empty.zip
quanteda/tests/data/empty/empty.docx
quanteda/tests/data/empty/empty.csv
quanteda/tests/data/empty/empty.txt
quanteda/tests/data/empty/empty.doc
quanteda/tests/data/empty/empty.xml
quanteda/tests/data/targz
quanteda/tests/data/targz/test.tar.gz
quanteda/tests/data/tweets
quanteda/tests/data/tweets/stream.json
quanteda/tests/data/json
quanteda/tests/data/json/test3.json
quanteda/tests/data/json/test.json
quanteda/tests/data/json/test2.json
quanteda/tests/data/tab
quanteda/tests/data/tab/test2.tab
quanteda/tests/data/tab/test.tab
quanteda/tests/data/tsv
quanteda/tests/data/tsv/test2.tsv
quanteda/tests/data/tsv/test.tsv
quanteda/tests/data/zip
quanteda/tests/data/zip/test3.txt
quanteda/tests/data/zip/test.zip
quanteda/tests/data/zip/test4.txt
quanteda/tests/data/zip/test.txt
quanteda/tests/data/zip/inauguralTopLevel.zip
quanteda/tests/data/zip/test2.txt
quanteda/tests/data/tarbz
quanteda/tests/data/tarbz/test.tar.bz
quanteda/tests/data/fruits
quanteda/tests/data/fruits/banana.txt
quanteda/tests/data/fruits/orange.txt
quanteda/tests/data/fruits/1.csv
quanteda/tests/data/fruits/2.csv
quanteda/tests/data/fruits/apple.txt
quanteda/tests/data/fox
quanteda/tests/data/fox/fox.json
quanteda/tests/data/fox/fox.txt
quanteda/tests/data/csv
quanteda/tests/data/csv/test.csv
quanteda/tests/data/csv/test2.csv
quanteda/tests/data/xml
quanteda/tests/data/xml/test.xml
quanteda/tests/data/dictionaries
quanteda/tests/data/dictionaries/mary.ykd
quanteda/tests/data/dictionaries/mary.cat
quanteda/tests/data/dictionaries/mary.lc3
quanteda/tests/data/dictionaries/actually_ykd.cat
quanteda/tests/data/dictionaries/mary.dic
quanteda/tests/data/dictionaries/mary.lcd
quanteda/tests/data/docvars
quanteda/tests/data/docvars/unequal
quanteda/tests/data/docvars/unequal/1_apple_red.txt
quanteda/tests/data/docvars/unequal/2_orange.txt
quanteda/tests/data/docvars/json
quanteda/tests/data/docvars/json/1_apple.json
quanteda/tests/data/docvars/json/2_orange.json
quanteda/tests/data/docvars/two
quanteda/tests/data/docvars/two/1_apple_red.txt
quanteda/tests/data/docvars/two/2_orange_orange.txt
quanteda/tests/data/docvars/two/1_apple_red.json
quanteda/tests/data/docvars/two/2_orange_orange.json
quanteda/tests/data/docvars/one
quanteda/tests/data/docvars/one/1_apple.txt
quanteda/tests/data/docvars/one/2_orange.txt
quanteda/tests/data/docvars/dash
quanteda/tests/data/docvars/dash/2-orange.txt
quanteda/tests/data/docvars/dash/1-apple.txt
quanteda/tests/data/docvars/csv
quanteda/tests/data/docvars/csv/1_apple.csv
quanteda/tests/data/docvars/csv/2_orange.csv
quanteda/tests/testthat.R
quanteda/tests/testthat
quanteda/tests/testthat/testToLower.R
quanteda/tests/testthat/testStopwords.R
quanteda/tests/testthat/testSimilarity.R
quanteda/tests/testthat/testCollocations.R
quanteda/tests/testthat/testPlots.R
quanteda/tests/testthat/testConvert.R
quanteda/tests/testthat/testDfm.R
quanteda/tests/testthat/data
quanteda/tests/testthat/data/glob
quanteda/tests/testthat/data/glob/4.txt
quanteda/tests/testthat/data/glob/10.txt
quanteda/tests/testthat/data/glob/1.txt
quanteda/tests/testthat/data/glob/2.txt
quanteda/tests/testthat/data/glob/subdir2
quanteda/tests/testthat/data/glob/subdir2/test.txt
quanteda/tests/testthat/data/glob/10.json
quanteda/tests/testthat/data/glob/3.txt
quanteda/tests/testthat/data/glob/subdir1
quanteda/tests/testthat/data/glob/subdir1/test.txt
quanteda/tests/testthat/data/tar
quanteda/tests/testthat/data/tar/test.tar
quanteda/tests/testthat/data/fruits2.csv
quanteda/tests/testthat/data/fruits1.txt
quanteda/tests/testthat/data/json
quanteda/tests/testthat/data/json/lines.json
quanteda/tests/testthat/data/json/valid.json
quanteda/tests/testthat/data/json/tweets-api-compact.json
quanteda/tests/testthat/data/json/tweets-api-pretty.json
quanteda/tests/testthat/data/json/tweets-lines.json
quanteda/tests/testthat/data/yoshi.ykd
quanteda/tests/testthat/data/fruits1.csv
quanteda/tests/testthat/testUtil.R
quanteda/tests/testthat/testSelectFeatures.R
quanteda/tests/testthat/testNgrams.R
quanteda/tests/testthat/testKwic.R
quanteda/tests/testthat/testDictionaries.R
quanteda/tests/testthat/testStem.R
quanteda/tests/testthat/test_tokenizer.R
quanteda/tests/testthat/testTextfile.R
quanteda/tests/testthat/testTextfile2.R
quanteda/tests/testthat/testCorpus.R
quanteda/src
quanteda/src/utility.cpp
quanteda/src/Makevars
quanteda/src/tokens_select.cpp
quanteda/src/ngrams.cpp
quanteda/src/tokens_join.cpp
quanteda/src/sequences.cpp
quanteda/src/Makevars.win
quanteda/src/RcppExports.cpp
quanteda/src/wordfish.cpp
quanteda/NAMESPACE
quanteda/demo
quanteda/demo/quanteda.R
quanteda/demo/00Index
quanteda/NEWS.md
quanteda/data
quanteda/data/wordlists.RData
quanteda/data/inaugCorpus.RData
quanteda/data/ukimmigTexts.RData
quanteda/data/ie2010Corpus.RData
quanteda/data/LBGexample.RData
quanteda/data/exampleString.RData
quanteda/data/mobydickText.RData
quanteda/data/encodedTexts.RData
quanteda/data/inaugTexts.RData
quanteda/data/englishSyllables.RData
quanteda/data/datalist
quanteda/data/stopwords.RData
quanteda/R
quanteda/R/converters.R
quanteda/R/collocations.R
quanteda/R/selectFeatures-old.R
quanteda/R/dfm-classes.R
quanteda/R/stopwords.R
quanteda/R/tokenize.R
quanteda/R/encoding.R
quanteda/R/resample.R
quanteda/R/kwic.R
quanteda/R/quanteda.R
quanteda/R/textmodel-ca.R
quanteda/R/describe-texts.R
quanteda/R/phrases.R
quanteda/R/corpus.R
quanteda/R/dfm-weighting.R
quanteda/R/readability.R
quanteda/R/plots.R
quanteda/R/dataDocs.R
quanteda/R/joinTokens.R
quanteda/R/selectFeatures.R
quanteda/R/toLower.R
quanteda/R/tokenize_outtakes.R
quanteda/R/textmodel-wordfish.R
quanteda/R/settings.R
quanteda/R/RcppExports.R
quanteda/R/textmodel-wordscores.R
quanteda/R/textmodel-NB.R
quanteda/R/findSequences.R
quanteda/R/textmodel-generics.R
quanteda/R/util.R
quanteda/R/syllables.R
quanteda/R/similarity.R
quanteda/R/lexdiv.R
quanteda/R/dictionaries.R
quanteda/R/wordstem.R
quanteda/R/textfile.R
quanteda/R/dfm-main.R
quanteda/R/zzz.R
quanteda/R/ngrams.R
quanteda/R/dfm-methods.R
quanteda/vignettes
quanteda/vignettes/quickstart.Rmd
quanteda/vignettes/images
quanteda/vignettes/images/unnamed-chunk-14-1.png
quanteda/vignettes/images/unnamed-chunk-38-1.png
quanteda/vignettes/images/unnamed-chunk-28-1.png
quanteda/vignettes/images/unnamed-chunk-35-1.png
quanteda/vignettes/images/prescluster.png
quanteda/vignettes/images/unnamed-chunk-27-1.png
quanteda/vignettes/development-plans.Rmd
quanteda/vignettes/mystyle.css
quanteda/vignettes/quickstart.md
quanteda/vignettes/LitVignette.Rmd
quanteda/README.md
quanteda/MD5
quanteda/build
quanteda/build/vignette.rds
quanteda/DESCRIPTION
quanteda/man
quanteda/man/segment.Rd
quanteda/man/textmodel_wordfish.Rd
quanteda/man/show-dictionary-method.Rd
quanteda/man/wordlists.Rd
quanteda/man/textmodel_wordscores.Rd
quanteda/man/docnames.Rd
quanteda/man/changeunits.Rd
quanteda/man/print.dfm.Rd
quanteda/man/weight.Rd
quanteda/man/tfidf.Rd
quanteda/man/inaugCorpus.Rd
quanteda/man/print.tokenizedTexts.Rd
quanteda/man/selectFeaturesOLD.Rd
quanteda/man/features.Rd
quanteda/man/metacorpus.Rd
quanteda/man/sort.dfm.Rd
quanteda/man/similarity.Rd
quanteda/man/textmodel.Rd
quanteda/man/textmodel_fitted-class.Rd
quanteda/man/lexdiv.Rd
quanteda/man/LBGexample.Rd
quanteda/man/ntoken.Rd
quanteda/man/textmodel_ca.Rd
quanteda/man/convert.Rd
quanteda/man/dfm.Rd
quanteda/man/subset.corpus.Rd
quanteda/man/topfeatures.Rd
quanteda/man/kwic.Rd
quanteda/man/readability.Rd
quanteda/man/encoding.Rd
quanteda/man/as.data.frame-dfm-method.Rd
quanteda/man/textfile.Rd
quanteda/man/texts.Rd
quanteda/man/wordstem.Rd
quanteda/man/dictionary.Rd
quanteda/man/ie2010Corpus.Rd
quanteda/man/corpusSource-class.Rd
quanteda/man/cbind.dfm.Rd
quanteda/man/corpus.Rd
quanteda/man/dfm-class.Rd
quanteda/man/encodedTexts.Rd
quanteda/man/phrasetotoken.Rd
quanteda/man/ngrams.Rd
quanteda/man/plot.kwic.Rd
quanteda/man/ukimmigTexts.Rd
quanteda/man/nsentence.Rd
quanteda/man/plot.dfm.Rd
quanteda/man/tail.tokenSequences.Rd
quanteda/man/textmodel_NB.Rd
quanteda/man/head.dfm.Rd
quanteda/man/joinTokens.Rd
quanteda/man/summary.corpus.Rd
quanteda/man/metadoc.Rd
quanteda/man/trim.Rd
quanteda/man/mobydickText.Rd
quanteda/man/syllables.Rd
quanteda/man/settings.Rd
quanteda/man/tf.Rd
quanteda/man/tokenize.Rd
quanteda/man/docvars.Rd
quanteda/man/ndoc.Rd
quanteda/man/removeFeatures.Rd
quanteda/man/collocations.Rd
quanteda/man/docfreq.Rd
quanteda/man/predict.textmodel.Rd
quanteda/man/exampleString.Rd
quanteda/man/applyDictionary.Rd
quanteda/man/scrabble.Rd
quanteda/man/toLower.Rd
quanteda/man/sample.Rd
quanteda/man/stopwords.Rd
quanteda/man/encodedTextFiles.Rd
quanteda/man/quanteda-package.Rd
quanteda/man/head.tokenSequences.Rd
quanteda/man/print.tokenSequences.Rd
quanteda/man/compress.Rd
quanteda/man/findSequences.Rd
quanteda/man/selectFeatures.Rd