Man pages for corpustools
Managing, Querying and Analyzing Tokenized Text

add_multitoken_label             Choose and add multitoken strings based on multitoken...
agg_label                        Helper function for aggregate_rsyntax
aggregate_rsyntax                Aggregate rsyntax annotations
agg_tcorpus                      Aggregate the tokens data
as.tcorpus                       Force an object to be a tCorpus class
as.tcorpus.default               Force an object to be a tCorpus class
as.tcorpus.tCorpus               Force an object to be a tCorpus class
backbone_filter                  Extract the backbone of a network.
browse_hits                      View hits in a browser
browse_texts                     Create and view a full text browser
calc_chi2                        Vectorized computation of chi^2 statistic for a 2x2 crosstab...
compare_corpus                   Compare tCorpus vocabulary to that of another (reference)...
compare_documents                Calculate the similarity of documents
compare_subset                   Compare vocabulary of a subset of a tCorpus to the rest of...
corenlp_tokens                   coreNLP example sentences
count_tcorpus                    Count results of search hits, or of a given feature in tokens
create_tcorpus                   Create a tCorpus
docfreq_filter                   Support function for subset method
dtm_compare                      Compare two document term matrices
dtm_wordcloud                    Plot a word cloud from a dtm
ego_semnet                       Create an ego network
export_span_annotations          Export span annotations
feature_associations             Get common nearby features given a query or query hits
feature_stats                    Feature statistics
fold_rsyntax                     Fold rsyntax annotations
freq_filter                      Support function for subset method
get_dtm                          Create a document term matrix.
get_global_i                     Compute global feature positions
get_kwic                         Get keyword-in-context (KWIC) strings
get_stopwords                    Get a character vector of stopwords
laplace                          Laplace (i.e. add constant) smoothing
melt_quanteda_dict               Convert a quanteda dictionary to a long data.table format
merge_tcorpora                   Merge tCorpus objects
plot.contextHits                 S3 plot for contextHits class
plot.featureAssociations         Visualize feature associations
plot.featureHits                 S3 plot for featureHits class
plot_semnet                      Visualize a semnet network
plot.vocabularyComparison        Visualize vocabularyComparison
plot_words                       Plot a wordcloud with words ordered and coloured according to...
preprocess_tokens                Preprocess tokens in a character vector
print.contextHits                S3 print for contextHits class
print.featureHits                S3 print for featureHits class
print.tCorpus                    S3 print for tCorpus class
refresh_tcorpus                  Refresh a tCorpus object using the current version of...
require_package                  Check if package with given version exists
search_contexts                  Search for documents or sentences using Boolean queries
search_dictionary                Dictionary lookup
search_features                  Find tokens using a Lucene-like search query
semnet                           Create a semantic network based on the co-occurrence of tokens...
semnet_window                    Create a semantic network based on the co-occurrence of tokens...
set_network_attributes           Set some default network attributes for pretty plotting
sgt                              Simple Good Turing smoothing
show_udpipe_models               Show the names of udpipe models
sotu_texts                       State of the Union addresses
stopwords_list                   Basic stopword lists
subset_query                     Subset tCorpus token data using a query
subset.tCorpus                   S3 subset for tCorpus class
summary.contextHits              S3 summary for contextHits class
summary.featureHits              S3 summary for featureHits class
summary.tCorpus                  Summary of a tCorpus object
tCorpus                          tCorpus: a corpus class for tokenized texts
tCorpus-cash-annotate_rsyntax    Annotate tokens based on rsyntax queries
tCorpus-cash-code_dictionary     Dictionary lookup
tCorpus-cash-code_features       Code features in a tCorpus based on a search string
tCorpus-cash-context             Get a context vector
tCorpus-cash-deduplicate         Deduplicate documents
tCorpus-cash-delete_columns      Delete column from the data and meta data
tCorpus-cash-feats_to_columns    Cast the "feats" column in UDpipe tokens to columns
tCorpus-cash-feature_subset      Filter features
tCorpus-cash-fold_rsyntax        Fold rsyntax annotations
tCorpus-cash-get                 Access the data from a tCorpus
tCorpus-cash-lda_fit             Estimate an LDA topic model
tCorpus-cash-merge               Merge the token and meta data.tables of a tCorpus with...
tCorpus-cash-preprocess          Preprocess feature
tCorpus-cash-replace_dictionary  Replace tokens with dictionary match
tCorpus-cash-search_recode       Recode features in a tCorpus based on a search string
tCorpus-cash-set                 Modify the token and meta data.tables of a tCorpus
tCorpus-cash-set_levels          Change levels of factor columns
tCorpus-cash-set_name            Change column names of data and meta data
tCorpus-cash-subset              Subset a tCorpus
tCorpus-cash-subset_query        Subset tCorpus token data using a query
tCorpus-cash-udpipe_clauses      Add columns indicating who did what
tCorpus-cash-udpipe_quotes       Add columns indicating who said what
tCorpus_compare                  Corpus comparison
tCorpus_create                   Creating a tCorpus
tCorpus_data                     Methods and functions for viewing, modifying and subsetting...
tCorpus_docsim                   Document similarity
tCorpus_features                 Preprocessing, subsetting and analyzing features
tCorpus_modify_by_reference      Modify tCorpus by reference
tCorpus_querying                 Use Boolean queries to analyze the tCorpus
tCorpus_semnet                   Feature co-occurrence based semantic network analysis
tCorpus_topmod                   Topic modeling
tc_plot_tree                     Visualize a dependency tree
tc_sotu_udpipe                   A tCorpus with a small sample of sotu paragraphs parsed with...
tokens_to_tcorpus                Create a tCorpus based on tokens (i.e. preprocessed texts)
tokenWindowOccurence             Gives the window in which a term occurred in a matrix.
top_features                     Show top features
transform_rsyntax                Apply rsyntax transformations
udpipe_clause_tqueries           Get a list of tqueries for extracting who did what
udpipe_quote_tqueries            Get a list of tqueries for extracting quotes
udpipe_simplify                  Simplify tokenIndex created with the udpipe parser
udpipe_spanquote_tqueries        Get a list of tqueries for finding candidates for span...
udpipe_tcorpus                   Create a tCorpus using udpipe
untokenize                       Reconstruct original texts
corpustools documentation built on May 31, 2023, 8:45 p.m.