Man pages for udpipe
Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

as_conllu: Convert a data.frame to CONLL-U format
as_cooccurrence: Convert a matrix to a co-occurrence data.frame
as.data.frame.udpipe_connlu: Convert the result of udpipe_annotate to a tidy data frame
as_fasttext: Combine labels and text as used in fasttext
as.matrix.cooccurrence: Convert the result of cooccurrence to a sparse matrix
as_phrasemachine: Convert Parts of Speech tags to one-letter tags which can be...
as_word2vec: Convert a matrix of word vectors to word2vec format
brussels_listings: Brussels AirBnB address locations available at...
brussels_reviews: Reviews of AirBnB customers on Brussels address locations...
brussels_reviews_anno: Reviews of the AirBnB customers which are tokenised, POS...
brussels_reviews_w2v_embeddings_lemma_nl: An example matrix of word embeddings
cbind_dependencies: Add the dependency parsing information to an annotated...
cbind_morphological: Add morphological features to an annotated dataset
cooccurrence: Create a cooccurrence data.frame
document_term_frequencies: Aggregate a data.frame to the document/term level by...
document_term_frequencies_statistics: Add Term Frequency, Inverse Document Frequency and Okapi BM25...
document_term_matrix: Create a document/term matrix
dtm_align: Reorder a Document-Term-Matrix alongside a vector or...
dtm_bind: Combine 2 document term matrices either by rows or by columns
dtm_chisq: Compare term usage across 2 document groups using the...
dtm_colsums: Column sums and Row sums for document term matrices
dtm_conform: Make sure a document term matrix has exactly the specified...
dtm_cor: Pearson Correlation for Sparse Matrices
dtm_remove_lowfreq: Remove terms occurring with low frequency from a...
dtm_remove_sparseterms: Remove terms with high sparsity from a Document-Term-Matrix
dtm_remove_terms: Remove terms from a Document-Term-Matrix and keep only...
dtm_remove_tfidf: Remove terms from a Document-Term-Matrix and documents with...
dtm_reverse: Inverse operation of the document_term_matrix function
dtm_sample: Random samples and permutations from a Document-Term-Matrix
dtm_svd_similarity: Semantic Similarity to a Singular Value Decomposition
dtm_tfidf: Term Frequency - Inverse Document Frequency calculation
keywords_collocation: Extract collocations - a sequence of terms which follow each...
keywords_phrases: Extract phrases - a sequence of terms which follow each other...
keywords_rake: Keyword identification using Rapid Automatic Keyword...
paste.data.frame: Concatenate text of each group of data together
predict.LDA: Predict method for an object of class LDA_VEM or class...
strsplit.data.frame: Obtain a tokenised data frame by splitting text alongside a...
syntaxpatterns: Experimental and undocumented querying of syntax patterns
syntaxrelation: Experimental and undocumented querying of syntax...
txt_collapse: Collapse a character vector while removing missing data.
txt_contains: Check if text contains a certain pattern
txt_context: Based on a vector with a word sequence, get n-grams (looking...
txt_count: Count the number of times a pattern is occurring in text
txt_freq: Frequency statistics of elements in a vector
txt_grepl: Look up multiple patterns and indicate their presence in...
txt_highlight: Highlight words in a character vector
txt_next: Get the n-th next element of a vector
txt_nextgram: Based on a vector with a word sequence, get n-grams (looking...
txt_overlap: Get the overlap between 2 vectors
txt_paste: Concatenate strings with options how to handle missing data
txt_previous: Get the n-th previous element of a vector
txt_previousgram: Based on a vector with a word sequence, get n-grams (looking...
txt_recode: Recode text to other categories
txt_recode_ngram: Recode words with compound multi-word expressions
txt_sample: Boilerplate function to sample one element from a vector.
txt_sentiment: Perform dictionary-based sentiment analysis on a tokenised...
txt_show: Boilerplate function to cat only 1 element of a character...
txt_tagsequence: Identify a contiguous sequence of tags as being 1 entity
udpipe: Tokenising, Lemmatising, Tagging and Dependency Parsing of...
udpipe_accuracy: Evaluate the accuracy of your UDPipe model on holdout data
udpipe_annotate: Tokenising, Lemmatising, Tagging and Dependency Parsing...
udpipe_annotation_params: List with training options set by the UDPipe community when...
udpipe_download_model: Download a UDPipe model provided by the UDPipe community for...
udpipe_load_model: Load a UDPipe model
udpipe_read_conllu: Read in a CONLL-U file as a data.frame
udpipe_train: Train a UDPipe model
unique_identifier: Create a unique identifier for each combination of fields in...
unlist_tokens: Create a data.frame from a list of tokens
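
As a rough illustration of how a few of the entries above fit together, the following sketch builds a document/term matrix from the bundled brussels_reviews_anno dataset, so no model download is needed. It assumes the udpipe package is installed; the choice of noun tags and the variable names x, dtf, dtm and top are illustrative only, not part of the package.

```r
library(udpipe)

# Bundled example data: AirBnB reviews already tokenised and POS tagged,
# one row per token
data(brussels_reviews_anno)

# Keep only tokens tagged as nouns (illustrative choice of xpos tags)
x <- subset(brussels_reviews_anno, xpos %in% c("NN", "NNP", "NNS"))

# Aggregate to one row per document/lemma pair with its frequency
dtf <- document_term_frequencies(x[, c("doc_id", "lemma")])

# Turn the frequencies into a sparse document/term matrix
dtm <- document_term_matrix(dtf)

# Frequency table of the lemmas across all documents
top <- txt_freq(x$lemma)
head(top)
```

Here document_term_frequencies is called with its defaults, which treat the first column as the document identifier and the second as the term.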
udpipe documentation built on Jan. 6, 2023, 5:06 p.m.