tCorpus: a corpus class for tokenized texts

tCorpus R Documentation

Description

The tCorpus is a class for managing tokenized texts, stored as a data.frame in which each row represents a token, and columns contain the positions and features of these tokens.
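
As a rough illustration of this layout, the sketch below builds a small token table by hand in base R. It does not use corpustools. The column names doc_id, token_id and token are assumptions chosen for the example and need not match the names the tCorpus uses internally, and the whitespace split is only a stand-in for real tokenization.

```r
# Two toy documents (hypothetical data)
texts <- c(d1 = "Cats chase mice", d2 = "Mice eat cheese")

# Naive whitespace tokenization; a real tokenizer would do more
tokens_per_doc <- strsplit(texts, " ", fixed = TRUE)
n_tokens <- lengths(tokens_per_doc, use.names = FALSE)

# One row per token: the document, the position within it, and the token
tokens <- data.frame(
  doc_id   = rep(names(tokens_per_doc), n_tokens),
  token_id = sequence(n_tokens),
  token    = unlist(tokens_per_doc, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Features are additional columns, e.g. a lowercased form of each token
tokens$token_lower <- tolower(tokens$token)

print(tokens)
```

Each row is one token, the first two columns give its position, and any further column is a feature of that token.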

Methods and Functions

The corpustools package uses both functions and methods for working with the tCorpus.

Methods are used for all operations that modify the tCorpus itself, such as subsetting or adding columns. This allows the data to be modified by reference, that is, in place and without assigning the result to a new object. Methods are accessed using the dollar sign after the tCorpus object. For example, if the tCorpus is named tc, the subset method can be called as tc$subset(...).

Functions are used for all operations that return a certain output, such as search results or a semantic network. These are called in the usual R style, with the tCorpus passed as an argument. For example, if the tCorpus is named tc, a semantic network can be created with semnet(tc, ...).
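
The difference between the two calling styles can be sketched in base R. The toy object below is an environment, which R handles by reference. It only imitates the behaviour described above and is not the tCorpus implementation. The names new_toy_corpus and count_tokens are hypothetical and do not exist in corpustools.

```r
# Constructor for a toy corpus object (hypothetical, not part of corpustools)
new_toy_corpus <- function(tokens) {
  self <- new.env()
  self$tokens <- tokens
  # Method: modifies the object in place and returns it invisibly
  self$subset <- function(keep) {
    force(keep)
    self$tokens <- self$tokens[keep, , drop = FALSE]
    invisible(self)
  }
  self
}

# Function: takes the object as an argument and returns a new value
count_tokens <- function(tc) {
  table(tc$tokens$doc_id)
}

tc <- new_toy_corpus(data.frame(
  doc_id = c("d1", "d1", "d2"),
  token  = c("cats", "chase", "mice"),
  stringsAsFactors = FALSE
))

# Method style: no assignment needed, tc itself changes
tc$subset(tc$tokens$doc_id == "d1")
n_left <- nrow(tc$tokens)

# Function style: tc is left untouched, the result is a separate value
counts <- count_tokens(tc)
```

After the method call, tc holds only the rows of document d1 even though nothing was assigned back to tc, whereas count_tokens leaves tc as it is and hands back its result.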

Overview of methods and functions

The primary goal of the tCorpus is to facilitate various corpus analysis techniques. The documentation for currently implemented techniques can be reached through the following links.

Create a tCorpus: Functions for creating a tCorpus object
Manage tCorpus data: Methods for viewing, modifying and subsetting tCorpus data
Features: Preprocessing, subsetting and analyzing features
Using search strings: Use Boolean queries to analyze the tCorpus
Co-occurrence networks: Feature co-occurrence based semantic network analysis
Corpus comparison: Compare corpora
Topic modeling: Create and visualize topic models
Document similarity: Calculate document similarity

kasperwelbers/tcorpus documentation built on May 10, 2023, 5:10 p.m.