quanteda: Quantitative Analysis of Textual Data

A fast, flexible framework for the management, processing, and quantitative analysis of textual data in R.

Author: Kenneth Benoit [aut, cre, cph], Kohei Watanabe [ctb], Paul Nulty [ctb], Adam Obeng [ctb], Haiyan Wang [ctb], Benjamin Lauderdale [ctb], Will Lowe [ctb]
Date of publication: 2017-04-20 09:07:57 UTC
Maintainer: Kenneth Benoit <kbenoit@lse.ac.uk>

Man pages

applyDictionary: apply a dictionary or thesaurus to an object

as.corpus: coerce a compressed corpus to a standard corpus

as.corpus.corpuszip: coerce a compressed corpus to a standard corpus

as.dist.dist: coerce a dist into a dist

as.list.dist: coerce a dist object into a list

as.list.dist_selection: coerce a dist_selection object into a list

as.matrix.dfm: coerce a dfm to a matrix or data.frame

as.matrix.dist_selection: coerce a dist_selection object to a matrix

as.matrix.simil: Coerce a simil object into a matrix

as.tokens: coercion and checking functions for tokens objects

as.yaml: convert quanteda dictionary objects to the YAML format

attributes-set: R-like alternative to reassign_attributes()

bootstrap_dfm: bootstrap a dfm

cbind.dfm: Combine dfm objects by Rows or Columns

changeunits: deprecated name for corpus_reshape

char_tolower: convert the case of character objects

coef.textmodel: extract text model coefficients

collocations: detect collocations from text

collocations2: detect collocations from text

compress: compress a dfm by combining similarly named dimensions

convert: convert a dfm to a non-quanteda format

convert-wrappers: convenience wrappers for dfm convert

corpus: construct a corpus object

corpus-class: base method extensions for corpus objects

corpus_reshape: recast the document units of a corpus

corpus_sample: randomly sample documents from a corpus

corpus_segment: segment texts into component elements

corpus_subset: extract a subset of a corpus

corpus_trimsentences: remove sentences based on their token lengths or a pattern...

data_char_sampletext: a paragraph of text for testing various text-based functions

data_char_ukimmig2010: immigration-related sections of 2010 UK party manifestos

data_corpus_inaugural: US presidential inaugural address texts

data_corpus_irishbudget2010: Irish budget speeches from 2010

data-deprecated: datasets with deprecated or defunct names

data_dfm_LBGexample: dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

data-internal: internal data sets

deprecate_argument: issue warning for deprecated function arguments

deprecated-textstat: deprecated textstat names

dfm: create a document-feature matrix

dfm2lsa: convert a dfm to an lsa "textmatrix"

dfm-class: Virtual class "dfm" for a document-feature matrix

dfm_compress: compress a dfm or fcm by combining identical dimension...

dfm-internal: internal functions for dfm objects

dfm_lookup: apply a dictionary to a dfm

dfm_sample: randomly sample documents or features from a dfm

dfm_select: select features from a dfm or fcm

dfm_sort: sort a dfm by frequency of one or more margins

dfm_tolower: convert the case of the features of a dfm and combine

dfm_trim: trim a dfm using frequency threshold-based feature selection

dfm_weight: weight the feature frequencies in a dfm

dictionary: create a dictionary

dictionary-class: print a dictionary object

docfreq: compute the (weighted) document frequency of a feature

docnames: get or set document names

docvars: get or set document-level variables

fcm: create a feature co-occurrence matrix

fcm-class: Virtual class "fcm" for a feature co-occurrence matrix

fcm_sort: sort an fcm in alphabetical order of the features

featnames: get the feature labels from a dfm

features: deprecated function name for featnames

features2list: convert various input as features to a simple list

features2vector: convert various input as features to a vector

head.dfm: return the first or last part of a dfm

is.dfm: coercion and checking functions for dfm objects

is.dictionary: check if an object is a dictionary

joinTokens: join tokens function

keyness: compute keyness (internal functions)

kwic: locate keywords-in-context

metacorpus: get or set corpus metadata

metadoc: get or set document-level meta-data

ndoc: count the number of documents or features

ngrams: deprecated function name for forming ngrams and skipgrams

nscrabble: count the Scrabble letter values of text

nsentence: count the number of sentences

nsyllable: count syllables in a text

ntoken: count the number of tokens or types

phrasetotoken: convert phrases into single tokens

plot-deprecated: deprecated plotting functions

predict.textmodel: prediction method for Naive Bayes classifier objects

print.dfm: print a dfm object

print.dist_selection: print a dist_selection object

quanteda_options: get or set package options for quanteda

quanteda-package: An R package for the quantitative analysis of textual data

removeFeatures: remove features from an object

sample: randomly sample documents or features

scrabble: deprecated name for nscrabble

segment: deprecated function

selectFeatures: select features from an object

selectFeaturesOLD: old version of selectFeatures.tokenizedTexts

sequences: find variable-length collocations with filtering

settings: Get or set the corpus settings

similarity: compute similarities between documents and/or features

sort.dfm: sort a dfm by one or more margins

sparsity: compute the sparsity of a document-feature matrix

stopwords: access built-in stopwords

subset.corpus: deprecated name for corpus_subset

summary.corpus: summarize a corpus or a vector of texts

syllables: deprecated name for nsyllable

textfile: old function to read texts from files

textmodel: fit a text model

textmodel_ca: correspondence analysis of a document-feature matrix

textmodel_fitted-class: the fitted textmodel classes

textmodel-internal: internal functions for textmodel objects

textmodel_NB: Naive Bayes classifier for texts

textmodel_wordfish: wordfish text model

textmodel_wordscores: Wordscores text model

textmodel_wordshoal: wordshoal text model

textplot_scale1d: plot a fitted scaling model

textplot_wordcloud: plot features as a wordcloud

textplot_xray: plot the dispersion of key word(s)

texts: get or assign corpus texts

textstat_collocations: calculate collocation statistics

textstat_keyness: calculate keyness statistics

textstat_lexdiv: calculate lexical diversity

textstat_readability: calculate readability

textstat_simil: Similarity and distance computation between documents or...

tf: compute (weighted) term frequency from a dfm

tfidf: compute tf-idf weights from a dfm

tokenize: tokenize a set of texts

tokens: tokenize a set of texts

tokens_compound: convert token sequences into compound tokens

tokens_hash: Function to hash list-of-character tokens

tokens_hashed_recompile: recompile a hashed tokens object

tokens_lookup: apply a dictionary to a tokens object

tokens_ngrams: create ngrams and skipgrams from tokens

tokens_select: select or remove tokens from a tokens object

tokens_tolower: convert the case of tokens

tokens_wordstem: stem the terms in an object

toLower: Convert texts to lower (or upper) case

topfeatures: list the most frequent features

trim: deprecated name for dfm_trim

valuetype: pattern matching using valuetype

View: View methods for quanteda

weight: weight or smooth a dfm

wordstem: stem words


