tCorpus | R Documentation |
The tCorpus is a class for managing tokenized texts, stored as a data.frame in which each row represents a token, and columns contain the positions and features of these tokens.
The corpustools package uses both functions and methods for working with the tCorpus.
Methods are used for all operations that modify the tCorpus itself, such as subsetting or adding columns. This allows the data to be modified by reference, without copying the corpus. Methods are accessed with the dollar sign after the tCorpus object. For example, if the tCorpus is named tc, the subset method can be called as tc$subset(...).
Functions are used for all operations that return a certain output, such as search results or a semantic network. These are used in the common R style that you know and love. For example, if the tCorpus is named tc, a semantic network can be created with semnet(tc, ...).
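The difference between methods and functions can be illustrated with a minimal sketch. It assumes that the corpustools package is installed, that create_tcorpus accepts a plain character vector, and that the token data has a column named token; the example texts and the object names n_before, n_after and g are made up for illustration, and argument names may differ between package versions.

```r
library(corpustools)

# Toy input: two short documents, invented for this illustration
texts <- c("The cat sat on the mat.", "The dog chased the cat.")
tc <- create_tcorpus(texts)

# The token data: one row per token, with position columns and features
head(tc$tokens)
n_before <- nrow(tc$tokens)

# Method: modifies tc itself (by reference), so the result is not assigned
tc$subset(subset = token %in% c("cat", "dog"))
n_after <- nrow(tc$tokens)

# Function: returns its output as a new object and leaves tc as it is
g <- semnet(tc, feature = "token")
```

After the call to tc$subset, tc holds only the selected tokens even though nothing was assigned, whereas the result of semnet has to be stored in a variable such as g to be used further.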
The primary goal of the tCorpus is to facilitate various corpus analysis techniques. The documentation for currently implemented techniques can be reached through the following links.
Create a tCorpus | Functions for creating a tCorpus object |
Manage tCorpus data | Methods for viewing, modifying and subsetting tCorpus data |
Features | Preprocessing, subsetting and analyzing features |
Using search strings | Use Boolean queries to analyze the tCorpus |
Co-occurrence networks | Feature co-occurrence based semantic network analysis |
Corpus comparison | Compare corpora |
Topic modeling | Create and visualize topic models |
Document similarity | Calculate document similarity |