quanteda: Quantitative Analysis of Textual Data

A fast, flexible toolset for the management, processing, and quantitative analysis of textual data in R.

Author: Kenneth Benoit [aut, cre], Paul Nulty [ctb], Kohei Watanabe [ctb], Adam Obeng [ctb], Haiyan Wang [ctb], Benjamin Lauderdale [ctb], Will Lowe [ctb]
Date of publication: 2017-01-10 10:17:43
Maintainer: Kenneth Benoit <kbenoit@lse.ac.uk>
License: GPL-3
Version: 0.9.9-3
URL: http://github.com/kbenoit/quanteda

Man pages

applyDictionary: apply a dictionary or thesaurus to an object

as.corpus: coerce a compressed corpus to a standard corpus

as.corpus.corpuszip: coerce a compressed corpus to a standard corpus

as.list.dist: coerce a dist object into a list

as.matrix.dfm: coerce a dfm to a matrix or data.frame

as.tokens: coercion and checking functions for tokens objects

cbind.dfm: Combine dfm objects by Rows or Columns

changeunits: deprecated name for corpus_reshape

char_tolower: convert the case of character objects

collocations: detect collocations from text

compress: compress a dfm by combining similarly named dimensions

convert: convert a dfm to a non-quanteda format

convert-wrappers: convenience wrappers for dfm convert

corpus: construct a corpus object

corpus-class: base method extensions for corpus objects

corpus_reshape: change the document units of a corpus

corpus_sample: randomly sample documents from a corpus

corpus_segment: segment texts into component elements

corpus_subset: extract a subset of a corpus

corpuszip: construct a compressed corpus object

data_char_mobydick: text of Herman Melville's Moby Dick

data_char_sampletext: a paragraph of text for testing various text-based functions

data_char_ukimmig2010: immigration-related sections of 2010 UK party manifestos

data_corpus_inaugural: US presidential inaugural address texts

data_corpus_irishbudget2010: Irish budget speeches from 2010

data-deprecated: datasets with deprecated or defunct names

data_dfm_LBGexample: dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

data-internal: internal data sets

deprecated-textstat: deprecated textstat names

dfm: create a document-feature matrix

dfm2lsa: convert a dfm to an lsa "textmatrix"

dfm-class: Virtual class "dfm" for a document-feature matrix

dfm_compress: compress a dfm or fcm by combining identical dimension...

dfm_lookup: apply a dictionary to a dfm

dfm_sample: randomly sample documents or features from a dfm

dfm_select: select features from a dfm or fcm

dfm_sort: sort a dfm by frequency of one or more margins

dfm_tolower: convert the case of the features of a dfm and combine

dfm_trim: trim a dfm using frequency threshold-based feature selection

dfm_weight: weight the feature frequencies in a dfm

dictionary: create a dictionary

dictionary-class: print a dictionary object

docfreq: compute the (weighted) document frequency of a feature

docnames: get or set document names

docvars: get or set document-level variables

fcm: create a feature co-occurrence matrix

fcm-class: Virtual class "fcm" for a feature co-occurrence matrix

fcm_sort: sort an fcm in alphabetical order of the features

featnames: get the feature labels from a dfm

features: deprecated function name for featnames

head.dfm: Return the first or last part of a dfm

is.collocations: check if an object is collocations type

is.dfm: coercion and checking functions for dfm objects

is.dictionary: check if an object is a dictionary

joinTokens: join tokens function

kwic: locate keywords-in-context

kwic_old: locate keywords-in-context (older)

kwic_split: split kwic results

metacorpus: get or set corpus metadata

metadoc: get or set document-level meta-data

ndoc: count the number of documents or features

ngrams: deprecated function name for forming ngrams and skipgrams

nscrabble: count the Scrabble letter values of text

nsentence: count the number of sentences

nsyllable: count syllables in a text

ntoken: count the number of tokens or types

phrasetotoken: convert phrases into single tokens

plot-deprecated: deprecated plotting functions

predict.textmodel: prediction method for Naive Bayes classifier objects

print.dfm: print a dfm object

quanteda-package: An R package for the quantitative analysis of textual data

reassign_attributes: copy the attributes from one S3 object to another

removeFeatures: remove features from an object

sample: randomly sample documents or features

scrabble: deprecated name for nscrabble

segment: deprecated function

selectFeatures: select features from an object

selectFeaturesOLD: old version of selectFeatures.tokenizedTexts

sequence2list: convert sequences to a simple list

sequences: find variable-length collocations with filtering

settings: Get or set the corpus settings

similarity: compute similarities between documents and/or features

sort.dfm: sort a dfm by one or more margins

sparsity: compute the sparsity of a document-feature matrix

stopwords: access built-in stopwords

subset.corpus: deprecated name for corpus_subset

summary.corpus: summarize a corpus or a vector of texts

syllables: deprecated name for nsyllable

textfile: old function to read texts from files

textmodel: fit a text model

textmodel_ca: correspondence analysis of a document-feature matrix

textmodel_fitted-class: the fitted textmodel classes

textmodel-internal: internal functions for textmodel objects

textmodel_NB: Naive Bayes classifier for texts

textmodel_wordfish: wordfish text model

textmodel_wordscores: Wordscores text model

textmodel_wordshoal: wordshoal text model

textplot_scale1d: plot a fitted wordfish model

textplot_wordcloud: plot features as a wordcloud

textplot_xray: plot the dispersion of key word(s)

texts: get or assign corpus texts

textstat_lexdiv: calculate lexical diversity

textstat_readability: calculate readability

textstat_simil: Distance matrix between documents and/or features

tf: compute (weighted) term frequency from a dfm

tfidf: compute tf-idf weights from a dfm

tokenize: tokenize a set of texts

tokens: tokenize a set of texts

tokens_compound: convert token sequences into compound tokens

tokens_hash: Function to hash list-of-character tokens

tokens_lookup: apply a dictionary to a tokens object

tokens_ngrams: create ngrams and skipgrams from tokens

tokens_select: select or remove tokens from a tokens object

tokens_tolower: convert the case of tokens

tokens_wordstem: stem the terms in an object

toLower: Convert texts to lower (or upper) case

topfeatures: list the most frequent features

trim: deprecated name for dfm_trim

valuetype: pattern matching using valuetype

vector2list: convert a vector to a list

View: View methods for quanteda

weight: weight or smooth a dfm

wordstem: stem words

Files in this package

quanteda
quanteda/inst
quanteda/inst/CITATION
quanteda/inst/LICENSE.txt
quanteda/inst/doc
quanteda/inst/doc/quickstart.Rmd
quanteda/inst/doc/design.html
quanteda/inst/doc/LitVignette.R
quanteda/inst/doc/plotting.R
quanteda/inst/doc/quickstart.R
quanteda/inst/doc/quickstart.html
quanteda/inst/doc/design.Rmd
quanteda/inst/doc/plotting.html
quanteda/inst/doc/design.R
quanteda/inst/doc/plotting.Rmd
quanteda/inst/doc/LitVignette.html
quanteda/inst/doc/LitVignette.Rmd
quanteda/tests
quanteda/tests/data
quanteda/tests/data/dictionaries
quanteda/tests/data/dictionaries/mary.ykd
quanteda/tests/data/dictionaries/mary.cat
quanteda/tests/data/dictionaries/mary.lc3
quanteda/tests/data/dictionaries/actually_ykd.cat
quanteda/tests/data/dictionaries/mary.dic
quanteda/tests/data/dictionaries/yoshi.ykd
quanteda/tests/data/dictionaries/mary.lcd
quanteda/tests/testthat.R
quanteda/tests/testthat
quanteda/tests/testthat/test-indexing.R
quanteda/tests/testthat/test-utils.R
quanteda/tests/testthat/Rplots.pdf
quanteda/tests/testthat/test_segment.R
quanteda/tests/testthat/test-textstat_readability.R
quanteda/tests/testthat/testCollocations.R
quanteda/tests/testthat/test-regex2fixed.R
quanteda/tests/testthat/test-corpus.R
quanteda/tests/testthat/test-corpus_reshape.R
quanteda/tests/testthat/test-tolower.R
quanteda/tests/testthat/test-fcm.R
quanteda/tests/testthat/test-convert.R
quanteda/tests/testthat/test-tokens_compound.R
quanteda/tests/testthat/test-corpuszip.R
quanteda/tests/testthat/test-tokens.R
quanteda/tests/testthat/test-dfm_lookup.R
quanteda/tests/testthat/test-corpus-compress.R
quanteda/tests/testthat/test-plots.R
quanteda/tests/testthat/test-tokens_ngrams.R
quanteda/tests/testthat/test_tokenizer.R
quanteda/tests/testthat/test-stopwords.R
quanteda/tests/testthat/test-textstat_dist.R
quanteda/tests/testthat/test_similarity.R
quanteda/tests/testthat/test-dfm.R
quanteda/tests/testthat/test-kwic.R
quanteda/tests/testthat/test-tokens_select.R
quanteda/tests/testthat/test-wordstem.R
quanteda/tests/testthat/test-tokens_lookup.R
quanteda/src
quanteda/src/utility.cpp
quanteda/src/Makevars
quanteda/src/fcm.cpp
quanteda/src/tokens_select_mt.cpp
quanteda/src/dist_parallel.cpp
quanteda/src/quanteda.h
quanteda/src/sequences_mt.cpp
quanteda/src/tokens_lookup_mt.cpp
quanteda/src/tokens_detect_mt.cpp
quanteda/src/tokens_ngrams_mt.cpp
quanteda/src/dev.h
quanteda/src/Makevars.win
quanteda/src/RcppExports.cpp
quanteda/src/wordfish.cpp
quanteda/src/tokens_replace_mt.cpp
quanteda/NAMESPACE
quanteda/demo
quanteda/demo/quanteda.R
quanteda/demo/00Index
quanteda/NEWS.md
quanteda/data
quanteda/data/data_char_sampletext.RData
quanteda/data/inaugCorpus.RData
quanteda/data/ukimmigTexts.RData
quanteda/data/data_char_stopwords.RData
quanteda/data/ie2010Corpus.RData
quanteda/data/data_corpus_irishbudget2010.RData
quanteda/data/inaugTexts.RData
quanteda/data/data_char_mobydick.RData
quanteda/data/data_corpus_inaugural.RData
quanteda/data/datalist
quanteda/data/data_int_syllables.RData
quanteda/data/data_char_inaugural.RData
quanteda/data/data_char_ukimmig2010.RData
quanteda/data/data_dfm_LBGexample.RData
quanteda/data/data_char_wordlists.RData
quanteda/R
quanteda/R/convert.R
quanteda/R/regex2fixed.R
quanteda/R/corpus-methods-base.R
quanteda/R/nfunctions.R
quanteda/R/utils.R
quanteda/R/dfm_select.R
quanteda/R/collocations.R
quanteda/R/kwic2.R
quanteda/R/selectFeatures-old.R
quanteda/R/dfm-classes.R
quanteda/R/joinTokens-deprecated.R
quanteda/R/readtext-methods.R
quanteda/R/stopwords.R
quanteda/R/textstat_dist.R
quanteda/R/textstat-deprecated.R
quanteda/R/tokenize.R
quanteda/R/dfm_compress.R
quanteda/R/plots-deprecated.R
quanteda/R/dfm_trim.R
quanteda/R/resample.R
quanteda/R/character-methods.R
quanteda/R/data-deprecated.R
quanteda/R/textstat-readability.R
quanteda/R/quanteda.R
quanteda/R/textmodel-ca.R
quanteda/R/View.R
quanteda/R/corpus-deprecated.R
quanteda/R/phrases.R
quanteda/R/corpus.R
quanteda/R/textmodel-internal.R
quanteda/R/nsyllable.R
quanteda/R/kwic_old.R
quanteda/R/dfm-deprecated.R
quanteda/R/tokens.R
quanteda/R/tokens_lookup.R
quanteda/R/corpus-methods-quanteda.R
quanteda/R/dfm_lookup.R
quanteda/R/data-documentation.R
quanteda/R/plots.R
quanteda/R/corpus_reshape.R
quanteda/R/tokens_ngrams.R
quanteda/R/selectFeatures.R
quanteda/R/dfm_weight.R
quanteda/R/toLower.R
quanteda/R/tokenize_outtakes.R
quanteda/R/textmodel-wordfish.R
quanteda/R/textstat-lexdiv.R
quanteda/R/settings.R
quanteda/R/RcppExports.R
quanteda/R/textmodel-wordscores.R
quanteda/R/fcm-methods.R
quanteda/R/nscrabble.R
quanteda/R/corpus_sample.R
quanteda/R/tokens_select.R
quanteda/R/sequences.R
quanteda/R/textmodel-NB.R
quanteda/R/textmodel-generics.R
quanteda/R/textmodel-wordshoal.R
quanteda/R/corpus_segment.R
quanteda/R/similarity.R
quanteda/R/dictionaries-deprecated.R
quanteda/R/dictionaries.R
quanteda/R/wordstem.R
quanteda/R/textstat_simil.R
quanteda/R/corpuszip.R
quanteda/R/dfm-subsetting.R
quanteda/R/fcm.R
quanteda/R/corpus_subset.R
quanteda/R/zzz.R
quanteda/R/tolower-misc.R
quanteda/R/dfm_sample.R
quanteda/R/valuetype.R
quanteda/R/dfm-methods.R
quanteda/R/tokens_compound.R
quanteda/R/dfm.R
quanteda/vignettes
quanteda/vignettes/quickstart.Rmd
quanteda/vignettes/images
quanteda/vignettes/images/unnamed-chunk-14-1.png
quanteda/vignettes/images/unnamed-chunk-38-1.png
quanteda/vignettes/images/unnamed-chunk-28-1.png
quanteda/vignettes/images/unnamed-chunk-35-1.png
quanteda/vignettes/images/prescluster.png
quanteda/vignettes/images/unnamed-chunk-27-1.png
quanteda/vignettes/design.Rmd
quanteda/vignettes/plotting.Rmd
quanteda/vignettes/mystyle.css
quanteda/vignettes/LitVignette.Rmd
quanteda/README.md
quanteda/MD5
quanteda/build
quanteda/build/vignette.rds
quanteda/DESCRIPTION
quanteda/man
quanteda/man/dfm_select.Rd
quanteda/man/tokens.Rd
quanteda/man/nsyllable.Rd
quanteda/man/segment.Rd
quanteda/man/textmodel_wordfish.Rd
quanteda/man/textmodel_wordscores.Rd
quanteda/man/is.dfm.Rd
quanteda/man/docnames.Rd
quanteda/man/changeunits.Rd
quanteda/man/textmodel-internal.Rd
quanteda/man/print.dfm.Rd
quanteda/man/weight.Rd
quanteda/man/tfidf.Rd
quanteda/man/sparsity.Rd
quanteda/man/nscrabble.Rd
quanteda/man/char_tolower.Rd
quanteda/man/tokens_ngrams.Rd
quanteda/man/data_char_ukimmig2010.Rd
quanteda/man/selectFeaturesOLD.Rd
quanteda/man/features.Rd
quanteda/man/data-internal.Rd
quanteda/man/metacorpus.Rd
quanteda/man/sort.dfm.Rd
quanteda/man/similarity.Rd
quanteda/man/textmodel.Rd
quanteda/man/tokens_compound.Rd
quanteda/man/data_dfm_LBGexample.Rd
quanteda/man/textmodel_fitted-class.Rd
quanteda/man/is.dictionary.Rd
quanteda/man/corpuszip.Rd
quanteda/man/dfm_trim.Rd
quanteda/man/ntoken.Rd
quanteda/man/textmodel_ca.Rd
quanteda/man/data_char_sampletext.Rd
quanteda/man/dfm_sample.Rd
quanteda/man/vector2list.Rd
quanteda/man/kwic_old.Rd
quanteda/man/convert.Rd
quanteda/man/textplot_wordcloud.Rd
quanteda/man/sequences.Rd
quanteda/man/dfm.Rd
quanteda/man/subset.corpus.Rd
quanteda/man/convert-wrappers.Rd
quanteda/man/View.Rd
quanteda/man/topfeatures.Rd
quanteda/man/kwic.Rd
quanteda/man/textplot_scale1d.Rd
quanteda/man/corpus_sample.Rd
quanteda/man/fcm-class.Rd
quanteda/man/dfm_tolower.Rd
quanteda/man/tokens_tolower.Rd
quanteda/man/corpus_reshape.Rd
quanteda/man/kwic_split.Rd
quanteda/man/data-deprecated.Rd
quanteda/man/tokens_select.Rd
quanteda/man/plot-deprecated.Rd
quanteda/man/textfile.Rd
quanteda/man/texts.Rd
quanteda/man/data_corpus_inaugural.Rd
quanteda/man/dfm2lsa.Rd
quanteda/man/valuetype.Rd
quanteda/man/textmodel_wordshoal.Rd
quanteda/man/wordstem.Rd
quanteda/man/dictionary.Rd
quanteda/man/featnames.Rd
quanteda/man/cbind.dfm.Rd
quanteda/man/textstat_simil.Rd
quanteda/man/corpus.Rd
quanteda/man/dfm-class.Rd
quanteda/man/corpus_segment.Rd
quanteda/man/phrasetotoken.Rd
quanteda/man/as.matrix.dfm.Rd
quanteda/man/ngrams.Rd
quanteda/man/corpus_subset.Rd
quanteda/man/corpus-class.Rd
quanteda/man/dictionary-class.Rd
quanteda/man/nsentence.Rd
quanteda/man/textmodel_NB.Rd
quanteda/man/head.dfm.Rd
quanteda/man/sequence2list.Rd
quanteda/man/textplot_xray.Rd
quanteda/man/data_char_mobydick.Rd
quanteda/man/joinTokens.Rd
quanteda/man/summary.corpus.Rd
quanteda/man/deprecated-textstat.Rd
quanteda/man/metadoc.Rd
quanteda/man/trim.Rd
quanteda/man/dfm_lookup.Rd
quanteda/man/syllables.Rd
quanteda/man/dfm_compress.Rd
quanteda/man/textstat_lexdiv.Rd
quanteda/man/settings.Rd
quanteda/man/is.collocations.Rd
quanteda/man/tokens_hash.Rd
quanteda/man/tf.Rd
quanteda/man/tokenize.Rd
quanteda/man/docvars.Rd
quanteda/man/fcm.Rd
quanteda/man/ndoc.Rd
quanteda/man/as.corpus.corpuszip.Rd
quanteda/man/removeFeatures.Rd
quanteda/man/dfm_weight.Rd
quanteda/man/collocations.Rd
quanteda/man/reassign_attributes.Rd
quanteda/man/tokens_wordstem.Rd
quanteda/man/docfreq.Rd
quanteda/man/dfm_sort.Rd
quanteda/man/as.list.dist.Rd
quanteda/man/predict.textmodel.Rd
quanteda/man/fcm_sort.Rd
quanteda/man/applyDictionary.Rd
quanteda/man/scrabble.Rd
quanteda/man/toLower.Rd
quanteda/man/sample.Rd
quanteda/man/as.tokens.Rd
quanteda/man/stopwords.Rd
quanteda/man/data_corpus_irishbudget2010.Rd
quanteda/man/textstat_readability.Rd
quanteda/man/as.corpus.Rd
quanteda/man/quanteda-package.Rd
quanteda/man/compress.Rd
quanteda/man/selectFeatures.Rd
quanteda/man/tokens_lookup.Rd
