quanteda: Quantitative Analysis of Textual Data

A fast, flexible framework for the management, processing, and quantitative analysis of textual data in R.

Install the latest version of this package by entering the following in R:
install.packages("quanteda")
Author: Kenneth Benoit [aut, cre, cph], Kohei Watanabe [ctb], Paul Nulty [ctb], Adam Obeng [ctb], Haiyan Wang [ctb], Benjamin Lauderdale [ctb], Will Lowe [ctb]
Date of publication: 2017-04-20 09:07:57 UTC
Maintainer: Kenneth Benoit <kbenoit@lse.ac.uk>

Man pages

applyDictionary: apply a dictionary or thesaurus to an object

as.corpus: coerce a compressed corpus to a standard corpus

as.corpus.corpuszip: coerce a compressed corpus to a standard corpus

as.dist.dist: coerce a dist into a dist

as.list.dist: coerce a dist object into a list

as.list.dist_selection: coerce a dist_selection object into a list

as.matrix.dfm: coerce a dfm to a matrix or data.frame

as.matrix.dist_selection: coerce a dist_selection object to a matrix

as.matrix.simil: Coerce a simil object into a matrix

as.tokens: coercion and checking functions for tokens objects

as.yaml: convert quanteda dictionary objects to the YAML format

attributes-set: R-like alternative to reassign_attributes()

bootstrap_dfm: bootstrap a dfm

cbind.dfm: Combine dfm objects by Rows or Columns

changeunits: deprecated name for corpus_reshape

char_tolower: convert the case of character objects

coef.textmodel: extract text model coefficients

collocations: detect collocations from text

collocations2: detect collocations from text

compress: compress a dfm by combining similarly named dimensions

convert: convert a dfm to a non-quanteda format

convert-wrappers: convenience wrappers for dfm convert

corpus: construct a corpus object

corpus-class: base method extensions for corpus objects

corpus_reshape: recast the document units of a corpus

corpus_sample: randomly sample documents from a corpus

corpus_segment: segment texts into component elements

corpus_subset: extract a subset of a corpus

corpus_trimsentences: remove sentences based on their token lengths or a pattern...

data_char_sampletext: a paragraph of text for testing various text-based functions

data_char_ukimmig2010: immigration-related sections of 2010 UK party manifestos

data_corpus_inaugural: US presidential inaugural address texts

data_corpus_irishbudget2010: Irish budget speeches from 2010

data-deprecated: datasets with deprecated or defunct names

data_dfm_LBGexample: dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

data-internal: internal data sets

deprecate_argument: issue warning for deprecated function arguments

deprecated-textstat: deprecated textstat names

dfm: create a document-feature matrix

dfm2lsa: convert a dfm to an lsa "textmatrix"

dfm-class: Virtual class "dfm" for a document-feature matrix

dfm_compress: compress a dfm or fcm by combining identical dimension...

dfm-internal: internal functions for dfm objects

dfm_lookup: apply a dictionary to a dfm

dfm_sample: randomly sample documents or features from a dfm

dfm_select: select features from a dfm or fcm

dfm_sort: sort a dfm by frequency of one or more margins

dfm_tolower: convert the case of the features of a dfm and combine

dfm_trim: trim a dfm using frequency threshold-based feature selection

dfm_weight: weight the feature frequencies in a dfm

dictionary: create a dictionary

dictionary-class: print a dictionary object

docfreq: compute the (weighted) document frequency of a feature

docnames: get or set document names

docvars: get or set document-level variables

fcm: create a feature co-occurrence matrix

fcm-class: Virtual class "fcm" for a feature co-occurrence matrix

fcm_sort: sort an fcm in alphabetical order of the features

featnames: get the feature labels from a dfm

features: deprecated function name for featnames

features2list: convert various input as features to a simple list

features2vector: convert various input as features to a vector

head.dfm: return the first or last part of a dfm

is.dfm: coercion and checking functions for dfm objects

is.dictionary: check if an object is a dictionary

joinTokens: join tokens function

keyness: compute keyness (internal functions)

kwic: locate keywords-in-context

metacorpus: get or set corpus metadata

metadoc: get or set document-level meta-data

ndoc: count the number of documents or features

ngrams: deprecated function name for forming ngrams and skipgrams

nscrabble: count the Scrabble letter values of text

nsentence: count the number of sentences

nsyllable: count syllables in a text

ntoken: count the number of tokens or types

phrasetotoken: convert phrases into single tokens

plot-deprecated: deprecated plotting functions

predict.textmodel: prediction method for Naive Bayes classifier objects

print.dfm: print a dfm object

print.dist_selection: print a dist_selection object

quanteda_options: get or set package options for quanteda

quanteda-package: An R package for the quantitative analysis of textual data

removeFeatures: remove features from an object

sample: randomly sample documents or features

scrabble: deprecated name for nscrabble

segment: deprecated function

selectFeatures: select features from an object

selectFeaturesOLD: old version of selectFeatures.tokenizedTexts

sequences: find variable-length collocations with filtering

settings: Get or set the corpus settings

similarity: compute similarities between documents and/or features

sort.dfm: sort a dfm by one or more margins

sparsity: compute the sparsity of a document-feature matrix

stopwords: access built-in stopwords

subset.corpus: deprecated name for corpus_subset

summary.corpus: summarize a corpus or a vector of texts

syllables: deprecated name for nsyllable

textfile: old function to read texts from files

textmodel: fit a text model

textmodel_ca: correspondence analysis of a document-feature matrix

textmodel_fitted-class: the fitted textmodel classes

textmodel-internal: internal functions for textmodel objects

textmodel_NB: Naive Bayes classifier for texts

textmodel_wordfish: wordfish text model

textmodel_wordscores: Wordscores text model

textmodel_wordshoal: wordshoal text model

textplot_scale1d: plot a fitted scaling model

textplot_wordcloud: plot features as a wordcloud

textplot_xray: plot the dispersion of key word(s)

texts: get or assign corpus texts

textstat_collocations: calculate collocation statistics

textstat_keyness: calculate keyness statistics

textstat_lexdiv: calculate lexical diversity

textstat_readability: calculate readability

textstat_simil: Similarity and distance computation between documents or...

tf: compute (weighted) term frequency from a dfm

tfidf: compute tf-idf weights from a dfm

tokenize: tokenize a set of texts

tokens: tokenize a set of texts

tokens_compound: convert token sequences into compound tokens

tokens_hash: Function to hash list-of-character tokens

tokens_hashed_recompile: recompile a hashed tokens object

tokens_lookup: apply a dictionary to a tokens object

tokens_ngrams: create ngrams and skipgrams from tokens

tokens_select: select or remove tokens from a tokens object

tokens_tolower: convert the case of tokens

tokens_wordstem: stem the terms in an object

toLower: Convert texts to lower (or upper) case

topfeatures: list the most frequent features

trim: deprecated name for dfm_trim

valuetype: pattern matching using valuetype

View: View methods for quanteda

weight: weight or smooth a dfm

wordstem: stem words


applyDictionary Man page
applyDictionary.dfm Man page
applyDictionary.tokenizedTexts Man page
applyDictionary.tokens Man page
as.character.corpus Man page
as.character.tokens Man page
as.corpus Man page
as.corpus.corpuszip Man page
as.data.frame.dfm Man page
as.dfm Man page
as.dist.dist Man page
as.DocumentTermMatrix Man page
as.list,dictionary-method Man page
as.list.dist Man page
as.list.dist_selection Man page
as.list.tokens Man page
as.matrix.dfm Man page
as.matrix.dist_selection Man page
as.matrix.simil Man page
as.matrix.similMatrix Man page
as.tokenizedTexts Man page
as.tokenizedTexts.list Man page
as.tokenizedTexts.tokens Man page
as.tokens Man page
as.tokens.collocations Man page
as.tokens.kwic Man page
as.tokens.list Man page
as.wfm Man page
as.yaml Man page
attributes<- Man page
bootstrap_dfm Man page
cbind.dfm Man page
c.corpus Man page
changeunits Man page
char_ngrams Man page
char_segment Man page
char_tolower Man page
char_toupper Man page
char_trimsentences Man page
char_wordstem Man page
coefficients,textmodel_ca_fitted-method Man page
coefficients,textmodel_wordfish_fitted-method Man page
coefficients,textmodel_wordscores_fitted-method Man page
coefficients,textmodel_wordscores_predicted-method Man page
coef.textmodel Man page
coef,textmodel_ca_fitted-method Man page
coef,textmodel_wordfish_fitted-method Man page
coef,textmodel_wordscores_fitted-method Man page
coef,textmodel_wordscores_predicted-method Man page
collocations Man page
collocations2 Man page
colMeans,dfmSparse-method Man page
colSums,dfmSparse-method Man page
Compare,dfmSparse,numeric-method Man page
compress Man page
compress.dfm Man page
convert Man page
convert-wrappers Man page
corpus Man page
[.corpus Man page
[[<-.corpus Man page
[[.corpus Man page
+.corpus Man page
corpus-class Man page
corpus_reshape Man page
corpus_sample Man page
corpus_segment Man page
corpus_subset Man page
corpus_trimsentences Man page
c.tokens Man page
data_char_inaugural Man page
data_char_mobydick Man page
data_char_sampletext Man page
data_char_stopwords Man page
data_char_ukimmig2010 Man page
data_char_wordlists Man page
data_corpus_inaugural Man page
data_corpus_irishbudget2010 Man page
data-deprecated Man page
data_dfm_LBGexample Man page
data-internal Man page
data_int_syllables Man page
deprecate_argument Man page
deprecated-textstat Man page
dfm Man page
dfm2ldaformat Man page
dfm2lsa Man page
dfm-class Man page
dfm_compress Man page
dfmDense-class Man page
+,dfmDense,numeric-method Man page
[,dfm,index,index,logical-method Man page
[,dfm,index,index,missing-method Man page
[,dfm,index,missing,logical-method Man page
[,dfm,index,missing,missing-method Man page
dfm-internal Man page
dfm_lookup Man page
[,dfm,missing,index,logical-method Man page
[,dfm,missing,index,missing-method Man page
[,dfm,missing,missing,logical-method Man page
[,dfm,missing,missing,missing-method Man page
dfm_remove Man page
dfm_sample Man page
dfm_select Man page
dfm_smooth Man page
dfm_sort Man page
dfmSparse-class Man page
+,dfmSparse,numeric-method Man page
dfm_tolower Man page
dfm_toupper Man page
dfm_trim Man page
dfm_weight Man page
dfm_wordstem Man page
dictionary Man page
dictionary-class Man page
[,dictionary,index,ANY,ANY-method Man page
[[,dictionary,index-method Man page
docfreq Man page
docnames Man page
docnames<- Man page
docvars Man page
docvars<- Man page
exampleString Man page
fcm Man page
fcm-class Man page
fcm_compress Man page
fcm_remove Man page
fcm_select Man page
fcm_sort Man page
fcm_tolower Man page
fcm_toupper Man page
featnames Man page
features Man page
features2list Man page
features2vector Man page
head.dfm Man page
ie2010Corpus Man page
inaugCorpus Man page
inaugTexts Man page
is.collocations Man page
is.corpus Man page
is.corpuszip Man page
is.dfm Man page
is.dictionary Man page
is.fcm Man page
is.kwic Man page
is.sequences Man page
is.tokenizedTexts Man page
is.tokens Man page
joinTokens Man page
keyness Man page
keyness_chi2_dt Man page
keyness_chi2_stats Man page
keyness_exact Man page
keyness_lr Man page
kwic Man page
LBGexample Man page
lexdiv Man page
metacorpus Man page
metacorpus<- Man page
metadoc Man page
metadoc<- Man page
ndoc Man page
nfeature Man page
ngrams Man page
ngrams.default Man page
ngrams.tokenizedTexts Man page
nscrabble Man page
nsentence Man page
nsyllable Man page
ntoken Man page
ntype Man page
+,numeric,dfmDense-method Man page
+,numeric,dfmSparse-method Man page
phrasetotoken Man page
phrasetotoken,character,character-method Man page
phrasetotoken,corpus,ANY-method Man page
phrasetotoken,textORtokens,collocations-method Man page
phrasetotoken,textORtokens,dictionary-method Man page
phrasetotoken,tokenizedTexts,character-method Man page
plot-deprecated Man page
plot.dfm Man page
plot.kwic Man page
plot.textmodel_wordfish_fitted Man page
predict.textmodel_NB_fitted Man page
predict.textmodel_wordscores_fitted Man page
print.corpus Man page
print.dfm Man page
print,dfm-method Man page
print.dist_selection Man page
print,fcm-method Man page
print.textmodel_wordfish_fitted Man page
print.textmodel_wordscores_fitted Man page
print.textmodel_wordshoal_fitted Man page
quanteda Man page
quantedaformat2dtm Man page
quanteda_options Man page
quanteda-package Man page
rbind.dfm Man page
readability Man page
removeFeatures Man page
rowMeans,dfmSparse-method Man page
rowSums,dfmSparse-method Man page
sample Man page
sample.corpus Man page
sample.default Man page
sample.dfm Man page
scrabble Man page
segment Man page
segment.character Man page
segment.corpus Man page
selectFeatures Man page
selectFeatures.collocations Man page
selectFeatures.dfm Man page
selectFeaturesOLD Man page
selectFeaturesOLD.tokenizedTexts Man page
selectFeatures.tokenizedTexts Man page
selectFeatures.tokens Man page
sequences Man page
settings Man page
settings<- Man page
settings.corpus Man page
settings.default Man page
settings.dfm Man page
show,dfm-method Man page
show,dictionary-method Man page
show,fcm-method Man page
show,textmodel_wordfish_fitted-method Man page
show,textmodel_wordfish_predicted-method Man page
show,textmodel_wordscores_fitted-method Man page
show,textmodel_wordscores_predicted-method Man page
show,textmodel_wordshoal_fitted-method Man page
show,textmodel_wordshoal_predicted-method Man page
similarity Man page
similarity,dfm-method Man page
skipgrams Man page
smoother Man page
sort.dfm Man page
sparsity Man page
stopwords Man page
str.corpus Man page
subset.corpus Man page
summary.character Man page
summary.corpus Man page
syllables Man page
tail.dfm Man page
t,dfmDense-method Man page
t,dfmSparse-method Man page
textfile Man page
textmodel Man page
textmodel_ca Man page
textmodel_ca_fitted-class Man page
textmodel,dfm,ANY,missing,character-method Man page
textmodel_fitted-class Man page
textmodel,formula,missing,dfm,character-method Man page
textmodel-internal Man page
textmodel_NB Man page
textmodel_wordfish Man page
textmodel_wordfish_fitted-class Man page
textmodel_wordfish_predicted-class Man page
textmodel_wordscores Man page
textmodel_wordscores_fitted-class Man page
textmodel_wordscores_predicted-class Man page
textmodel_wordshoal Man page
textmodel_wordshoal_fitted-class Man page
textmodel_wordshoal_predicted-class Man page
textplot_scale1d Man page
textplot_wordcloud Man page
textplot_xray Man page
texts Man page
texts<- Man page
textstat_collocations Man page
textstat_dist Man page
textstat_keyness Man page
textstat_lexdiv Man page
textstat_readability Man page
textstat_simil Man page
tf Man page
tfidf Man page
tokenise Man page
tokenize Man page
tokenize.character Man page
tokenize.corpus Man page
tokens Man page
+.tokens Man page
tokens_compound Man page
tokens_hash Man page
tokens_hashed_recompile Man page
tokens_lookup Man page
tokens_ngrams Man page
tokens_remove Man page
tokens_select Man page
tokens_skipgrams Man page
tokens_tolower Man page
tokens_toupper Man page
tokens_wordstem Man page
toLower Man page
toLower.character Man page
toLower.corpus Man page
toLower.NULL Man page
toLower.tokenizedTexts Man page
toLower.tokens Man page
topfeatures Man page
toUpper Man page
toUpper.character Man page
toUpper.corpus Man page
toUpper.NULL Man page
toUpper.tokenizedTexts Man page
toUpper.tokens Man page
trim Man page
trimdfm Man page
trim.dfm Man page
ukimmigTexts Man page
valuetype Man page
View Man page
View.default Man page
View.dfmSparse Man page
View.kwic Man page
weight Man page
wordstem Man page
wordstem.character Man page
wordstem.dfm Man page
wordstem.tokenizedTexts Man page
wordstem.tokens Man page


tests/testthat/test-indexing.R tests/testthat/test-utils.R
tests/testthat/test-textstat_readability.R tests/testthat/test_dfm-compress.R tests/testthat/test-corpus_segment.R tests/testthat/test-textmodel_NB.R tests/testthat/test-nfunctions.R tests/testthat/test-textstat_simil.R tests/testthat/test-bootstrap.R tests/testthat/test-regex2fixed.R tests/testthat/test-corpus.R tests/testthat/test-as.dfm.R tests/testthat/test-fcm_methods.R tests/testthat/test-corpus_reshape.R tests/testthat/test-textstat_collocations.R tests/testthat/test-tokens_hashed_recompile.R tests/testthat/test-tolower.R tests/testthat/test-fcm.R tests/testthat/test-quanteda_options.R tests/testthat/test-corpus_trimsentences.R tests/testthat/test-convert.R tests/testthat/test-tokens_compound.R tests/testthat/test-textmodel_ca.R tests/testthat/test-dfm_select.R tests/testthat/test-texts.R tests/testthat/test-textstat_keyness.R tests/testthat/test-tokens.R tests/testthat/test-dfm_lookup.R tests/testthat/test-corpus-compress.R tests/testthat/test-textmodel_wordscores.R tests/testthat/test-plots.R tests/testthat/test-dictionaries.R tests/testthat/test-tokens_ngrams.R tests/testthat/test_tokenizer.R tests/testthat/test-collocations.R tests/testthat/test-stopwords.R tests/testthat/test-corpus_sample.R tests/testthat/test-selectFeatures.R tests/testthat/test-textmodel_wordfish.R tests/testthat/test-textstat_dist.R tests/testthat/test-docvars.R tests/testthat/test-phrases.R tests/testthat/test_similarity.R tests/testthat/test-dfm.R tests/testthat/test-sequences.R tests/testthat/test-kwic.R tests/testthat/test-tokens_select.R tests/testthat/test-wordstem.R tests/testthat/test-tokens_lookup.R
R/convert.R R/regex2fixed.R R/corpus-methods-base.R R/nfunctions.R R/utils.R R/dfm_select.R R/collocations.R R/textmodel_wordscores.R R/textstat_keyness.R R/textplot_xray.R R/selectFeatures-old.R R/dfm-classes.R R/joinTokens-deprecated.R R/readtext-methods.R R/stopwords.R R/textstat_dist.R R/textstat-deprecated.R R/tokenize.R R/bootstrap_dfm.R R/dfm_compress.R R/plots-deprecated.R R/textplot_scale1d.R R/dfm_trim.R R/resample.R R/character-methods.R R/data-deprecated.R R/kwic.R R/quanteda.R R/docvars.R R/View.R R/corpus-deprecated.R R/phrases.R R/textstat_readability.R R/corpus.R R/textmodel-internal.R R/nsyllable.R R/dfm-deprecated.R R/tokens.R R/tokens_lookup.R R/corpus-methods-quanteda.R R/dfm_lookup.R R/data-documentation.R R/corpus_reshape.R R/tokens_ngrams.R R/selectFeatures.R R/dfm_weight.R R/toLower.R R/tokenize_outtakes.R R/textstat_lexdiv.R R/textmodel_wordfish.R R/settings.R R/RcppExports.R R/fcm-methods.R R/nscrabble.R R/corpus_trimsentences.R R/textmodel_wordshoal.R R/quanteda_options.R R/corpus_sample.R R/tokens_select.R R/sequences.R R/textmodel-generics.R R/textstat_collocations.R R/corpus_segment.R R/textmodel_NB.R R/textplot_wordcloud.R R/dfm-print.R R/collocations2.R R/similarity.R R/dictionaries-deprecated.R R/dictionaries.R R/wordstem.R R/textstat_simil.R R/corpuszip.R R/dfm-subsetting.R R/fcm.R R/corpus_subset.R R/zzz.R R/tolower-misc.R R/dfm_sample.R R/valuetype.R R/dfm-methods.R R/tokens_compound.R R/textmodel_ca.R R/dfm.R
man/dfm_select.Rd man/tokens.Rd man/nsyllable.Rd man/segment.Rd man/textmodel_wordfish.Rd man/textmodel_wordscores.Rd man/is.dfm.Rd man/docnames.Rd man/changeunits.Rd man/textmodel-internal.Rd man/textstat_keyness.Rd man/print.dfm.Rd man/weight.Rd man/tfidf.Rd man/sparsity.Rd man/nscrabble.Rd
man/char_tolower.Rd man/dfm-internal.Rd man/tokens_ngrams.Rd man/data_char_ukimmig2010.Rd man/selectFeaturesOLD.Rd man/features.Rd man/data-internal.Rd man/metacorpus.Rd man/sort.dfm.Rd man/similarity.Rd man/corpus_trimsentences.Rd man/textmodel.Rd man/tokens_compound.Rd man/data_dfm_LBGexample.Rd man/textstat_collocations.Rd man/textmodel_fitted-class.Rd man/is.dictionary.Rd man/dfm_trim.Rd man/ntoken.Rd man/textmodel_ca.Rd man/as.matrix.simil.Rd man/data_char_sampletext.Rd man/dfm_sample.Rd man/convert.Rd man/textplot_wordcloud.Rd man/sequences.Rd man/dfm.Rd man/subset.corpus.Rd man/convert-wrappers.Rd man/View.Rd man/topfeatures.Rd man/as.list.dist_selection.Rd man/kwic.Rd man/textplot_scale1d.Rd man/corpus_sample.Rd man/fcm-class.Rd man/dfm_tolower.Rd man/tokens_tolower.Rd man/corpus_reshape.Rd man/data-deprecated.Rd man/tokens_select.Rd man/plot-deprecated.Rd man/textfile.Rd man/texts.Rd man/data_corpus_inaugural.Rd man/dfm2lsa.Rd man/valuetype.Rd man/textmodel_wordshoal.Rd man/wordstem.Rd man/dictionary.Rd man/attributes-set.Rd man/features2vector.Rd man/quanteda_options.Rd man/featnames.Rd man/cbind.dfm.Rd man/textstat_simil.Rd man/corpus.Rd man/dfm-class.Rd man/corpus_segment.Rd man/phrasetotoken.Rd man/as.matrix.dfm.Rd man/ngrams.Rd man/corpus_subset.Rd man/corpus-class.Rd man/dictionary-class.Rd man/bootstrap_dfm.Rd man/nsentence.Rd man/textmodel_NB.Rd man/head.dfm.Rd man/features2list.Rd man/textplot_xray.Rd man/joinTokens.Rd man/collocations2.Rd man/summary.corpus.Rd man/deprecated-textstat.Rd man/metadoc.Rd man/trim.Rd man/as.matrix.dist_selection.Rd man/dfm_lookup.Rd man/syllables.Rd man/dfm_compress.Rd man/textstat_lexdiv.Rd man/settings.Rd man/tokens_hash.Rd man/tf.Rd man/tokenize.Rd man/docvars.Rd man/fcm.Rd man/ndoc.Rd man/as.corpus.corpuszip.Rd man/removeFeatures.Rd man/dfm_weight.Rd man/as.dist.dist.Rd man/collocations.Rd man/tokens_wordstem.Rd man/docfreq.Rd man/dfm_sort.Rd man/as.list.dist.Rd man/predict.textmodel.Rd man/fcm_sort.Rd 
man/keyness.Rd man/applyDictionary.Rd man/deprecate_argument.Rd man/scrabble.Rd man/toLower.Rd man/print.dist_selection.Rd man/tokens_hashed_recompile.Rd man/as.yaml.Rd man/sample.Rd man/as.tokens.Rd man/coef.textmodel.Rd man/stopwords.Rd man/data_corpus_irishbudget2010.Rd man/textstat_readability.Rd man/as.corpus.Rd man/quanteda-package.Rd man/compress.Rd man/selectFeatures.Rd man/tokens_lookup.Rd
