polmineR: Toolkit for Corpus Analysis

Library for corpus analysis using the Corpus Workbench as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create partitions and to carry out basic statistical operations (counts, co-occurrences, etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. The corresponding data structures (document-term matrices, term co-occurrence matrices, etc.) can be created from the indexed corpora.

Install the latest version of this package by entering the following in R:
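
Assuming the CRAN release is the one wanted, the standard base-R call is sketched here. The Corpus Workbench back end mentioned above may need to be available on the system separately; this sketch does not cover that step.

```r
# Assumption: the package is installed from CRAN.
# install.packages() is part of base R (utils package).
install.packages("polmineR")

# Load the package to confirm that the installation succeeded.
library(polmineR)
```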

Author: Andreas Blaette
Date of publication: 2017-03-24 06:25:38 UTC
Maintainer: Andreas Blaette <andreas.blaette@uni-due.de>

Man pages

as.DocumentTermMatrix: as.TermDocumentMatrix / as.DocumentTermMatrix

as.markdown: Generate markdown from a partition.

as.sparseMatrix: Type conversion - get sparseMatrix.

as.speeches-method: Split partition into speeches

blapply: apply a function over a list or bundle

browse: display in browser

bundle-class: bundle class

chisquare-method: perform chi-square test

contextBundle-class: S4 contextBundle class

context-class: S4 context class

context-method: Analyze context of a node word.

cooccurrences: Get cooccurrence statistics.

cooccurrencesBundle-class: S4 cooccurrencesBundle class

cooccurrences-class: cooccurrences

cooccurrencesReshaped: Methods for manipulating cooccurrencesReshaped-class-objects

Corpus-class: Corpus class.

corpus-method: Get corpus.

count-method: Get counts.

cpos-method: Get corpus positions for (CQP) queries.

CQI: Interfaces for accessing the CWB

cqp: CQP queries

cqpserver: start CQP server

decode: Decode corpus.

dispersion-class: dispersion class

dispersion-method: Dispersion of a query or multiple queries

divide: divide an object into equally sized parts

dotplot-method: dotplot

encode-method: Encode CWB Corpus.

encoding: get/set encoding slot of an object

encodings: adjust encoding

enrich-method: enrich an object

features-class: Feature selection by comparison (S4 class).

features-method: Get features by comparison.

flatten: flatten a nested list

frequencies: Frequency breakdown of the variation of query results

getEncoding-method: get the encoding of a corpus

getObjects: Get objects of a certain class.

getSlot: Get slot from object.

getTerms-method: get terms available in a corpus or partition

getTokenStream-method: Get Token Stream Based on Corpus Positions.

hits: Get Hits.

html-method: restore fulltext as html

install.corpus: Install packaged corpus from repository.

kwic: KWIC output / concordances

kwic-class: kwic (S4 class)

mail-method: Mail result.

means-method: calculate means

meta-method: metainformation

ngrams: Get N-Grams

noise: detect noise

partition: Initialize a partition.

partitionBundle-class: partitionBundle class

partitionBundle-method: Generate a bundle of partitions

partition-class: partition class

pAttribute-method: get pAttribute

pAttributes: Get p-attributes.

polmineR-generics: generic methods defined in the polmineR-package

polmineR-package: polmineR-package

read-method: Display and read full text

Regions-class: Regions of a CWB corpus.

registry: Reload using new CORPUS_REGISTRY.

RegistryFile-class: Read, parse and modify registry file.

sAttributes-method: Get s-attributes.

scatterplot-method: word scatterplot

size-method: Get number of tokens.

split-partition-method: split partition into partitionBundle

tempcorpus: S4 class to capture core information on a temporary CWB...

templates: Get and set templates.

TermDocumentMatrix: Methods for TermDocumentMatrix / DocumentTermMatrix

terms-partition-method: get terms available in a corpus

textstat-class: S4 textstat class

textstatistics: text statistics

TokenStream-class: Class for token stream operations.

trim-method: trim an object

tTest: perform t-test

use: Use packaged corpus.

view: browse an object using View()

weigh-method: weigh a matrix


