polmineR: Toolkit for Corpus Analysis

Library for corpus analysis using the Corpus Workbench (CWB) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create partitions (subcorpora) and to carry out basic statistical operations on them (counts, co-occurrences, etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. The corresponding data structures (document-term matrices, term co-occurrence matrices, etc.) can be created from the indexed corpora.
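To make the data structures named above concrete, here is a minimal sketch in base R only. It does not call polmineR or the CWB; the toy token vectors, the names `partitions`, `dtm`, and the helper `cooccurring()` are assumptions made up for illustration, and the package's own methods (listed below) may behave differently.

```r
# Toy data: each list element stands in for one partition (hypothetical tokens).
partitions <- list(
  speech_a = c("the", "budget", "debate", "the", "budget"),
  speech_b = c("the", "climate", "debate")
)

# Count: token frequencies within a single partition.
counts_a <- table(partitions$speech_a)

# Document-term matrix: one row per partition, one column per term.
vocab <- sort(unique(unlist(partitions)))
dtm <- t(sapply(partitions, function(tokens) {
  as.integer(table(factor(tokens, levels = vocab)))
}))
colnames(dtm) <- vocab

# Co-occurrences: tokens found within `window` positions of a node word.
# Illustrative helper, not a polmineR function.
cooccurring <- function(tokens, node, window = 1L) {
  hits <- which(tokens == node)
  neighbours <- unlist(lapply(hits, function(i) {
    span <- max(1L, i - window):min(length(tokens), i + window)
    tokens[setdiff(span, i)]
  }))
  table(neighbours)
}
cooc_budget <- cooccurring(partitions$speech_a, "budget")
```

Printing `dtm` shows the partitions as rows and the vocabulary as columns; `cooc_budget` tabulates the immediate neighbours of "budget" in the first toy partition. A real workflow would compute these from an indexed corpus rather than from in-memory vectors.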

Install the latest version of this package by entering the following in R:
install.packages("polmineR")
Author: Andreas Blaette
Date of publication: 2017-03-24 06:25:38 UTC
Maintainer: Andreas Blaette <andreas.blaette@uni-due.de>
License: GPL-3
Version: 0.7.2
https://www.github.com/PolMine/polmineR

Man pages

as.DocumentTermMatrix: as.TermDocumentMatrix / as.DocumentTermMatrix

as.markdown: Generate markdown from a partition.

as.sparseMatrix: Type conversion - get sparseMatrix.

as.speeches-method: Split partition into speeches

blapply: apply a function over a list or bundle

browse: display in browser

bundle-class: bundle class

chisquare-method: perform chisquare-test

contextBundle-class: S4 contextBundle class

context-class: S4 context class

context-method: Analyze context of a node word.

cooccurrences: Get cooccurrence statistics.

cooccurrencesBundle-class: S4 cooccurrencesBundle class

cooccurrences-class: cooccurrences

cooccurrencesReshaped: Methods for manipulating cooccurrencesReshaped-class-objects

Corpus-class: Corpus class.

corpus-method: Get corpus.

count-method: Get counts.

cpos-method: Get corpus positions for (CQP) queries.

CQI: Interfaces for accessing the CWB

cqp: CQP queries

cqpserver: start CQP server

decode: Decode corpus.

dispersion-class: dispersion class

dispersion-method: Dispersion of a query or multiple queries

divide: divide an object into equally sized parts

dotplot-method: dotplot

encode-method: Encode CWB Corpus.

encoding: get/set encoding slot of an object

encodings: adjust encoding

enrich-method: enrich an object

features-class: Feature selection by comparison (S4 class).

features-method: Get features by comparison.

flatten: flatten a nested list

frequencies: Frequency breakdown of the variation of query results

getEncoding-method: get the encoding of a corpus

getObjects: Get objects of a certain class.

getSlot: Get slot from object.

getTerms-method: get terms available in a corpus or partition

getTokenStream-method: Get Token Stream Based on Corpus Positions.

hits: Get Hits.

html-method: restore fulltext as html

install.corpus: Install packaged corpus from repository.

kwic: KWIC output / concordances

kwic-class: kwic (S4 class)

mail-method: Mail result.

means-method: calculate means

meta-method: metainformation

ngrams: Get N-Grams

noise: detect noise

partition: Initialize a partition.

partitionBundle-class: partitionBundle class

partitionBundle-method: Generate a bundle of partitions

partition-class: partition class

pAttribute-method: get pAttribute

pAttributes: Get p-attributes.

polmineR-generics: generic methods defined in the polmineR-package

polmineR-package: polmineR-package

read-method: Display and read full text

Regions-class: Regions of a CWB corpus.

registry: Reload using new CORPUS_REGISTRY.

RegistryFile-class: Read, parse and modify registry file.

sAttributes-method: Get s-attributes.

scatterplot-method: word scatterplot

size-method: Get number of tokens.

split-partition-method: split partition into partitionBundle

tempcorpus: S4 class to capture core information on a temporary CWB...

templates: Get and set templates.

TermDocumentMatrix: Methods for TermDocumentMatrix / DocumentTermMatrix

terms-partition-method: get terms available in a corpus

textstat-class: S4 textstat class

textstatistics: text statistics

TokenStream-class: Class for token stream operations.

trim-method: trim an object

tTest: perform t-test

use: Use packaged corpus.

view: browse an object using View()

weigh-method: weigh a matrix

Functions

adjustEncoding Man page
adjustEncoding,character-method Man page
aggregate,partition-method Man page
as.bundle Man page
as.bundle,list-method Man page
as.bundle,textstat-method Man page
as.cqp Man page
as.data.frame,kwic-method Man page
as.data.frame,partition-method Man page
as.data.frame,textstat-method Man page
as.DataTables,context-method Man page
as.DataTables,textstat-method Man page
as.data.table,textstat-method Man page
as.DocumentTermMatrix Man page
as.DocumentTermMatrix,bundle-method Man page
as.DocumentTermMatrix,character-method Man page
as.DocumentTermMatrix,partitionBundle-method Man page
as.igraph,cooccurrences-method Man page
as.markdown Man page
as.markdown,numeric-method Man page
as.markdown,partition-method Man page
as.markdown,plprPartition-method Man page
as.matrix,bundle-method Man page
as.matrix,contextBundle-method Man page
as.matrix,partitionBundle-method Man page
as.partitionBundle Man page
as.partitionBundle,list-method Man page
as.partitionBundle,partition-method Man page
as.sparseMatrix Man page
as.sparseMatrix,bundle-method Man page
as.sparseMatrix,cooccurrences-method Man page
as.sparseMatrix,simple_triplet_matrix-method Man page
as.sparseMatrix,TermDocumentMatrix-method Man page
as.speeches Man page
as.speeches,partition-method Man page
as.TermContextBundle,contextBundle-method Man page
as.TermDocumentMatrix Man page
as.TermDocumentMatrix,bundle-method Man page
as.TermDocumentMatrix,character-method Man page
as.TermDocumentMatrix,partitionBundle-method Man page
as.utf8 Man page
as.utf8,character-method Man page
barplot,partitionBundle-method Man page
blapply Man page
blapply,bundle-method Man page
blapply,list-method Man page
blapply,vector-method Man page
browse Man page
browse<- Man page
browse,context-method Man page
browse,html-method Man page
browse,kwic-method Man page
browse,partition-method Man page
browse,pressPartition-method Man page
browse,textstat-method Man page
+,bundle,bundle-method Man page
bundle-class Man page
[[,bundle-method Man page
+,bundle,textstat-method Man page
chisquare Man page
chisquare,context-method Man page
chisquare,textstat-method Man page
colnames,textstat-method Man page
context Man page
[,context,ANY,ANY,ANY-method Man page
[,contextBundle,ANY,ANY,ANY-method Man page
contextBundle-class Man page
[,contextBundle-method Man page
[[,contextBundle-method Man page
context,character-method Man page
context-class Man page
context,contextBundle-method Man page
context,cooccurrences-method Man page
[,context-method Man page
[[,context-method Man page
context,partitionBundle-method Man page
context,partition-method Man page
cooccurrences Man page
[,cooccurrences,ANY,ANY,ANY-method Man page
cooccurrencesBundle Man page
cooccurrencesBundle-class Man page
cooccurrences,character-method Man page
cooccurrences-class Man page
cooccurrences,context-method Man page
cooccurrences,Corpus-method Man page
[,cooccurrences-method Man page
cooccurrences,partition-method Man page
cooccurrencesReshaped Man page
cooccurrencesReshaped-class Man page
corpus Man page
Corpus Man page
corpus,bundle-method Man page
corpus,missing-method Man page
corpus,partition-method Man page
count Man page
count,character-method Man page
count,context-method Man page
count,dispersion-method Man page
count-method Man page
count,partitionBundle-method Man page
count,partition-method Man page
count,vector-method Man page
cpos Man page
cpos,character-method Man page
cpos,matrix-method Man page
cpos,partition-method Man page
cpos,tempcorpus-method Man page
CQI Man page
CQI.cqpserver Man page
CQI.perl Man page
CQI.Rcpp Man page
CQI.rcqp Man page
CQI.super Man page
cqpserver Man page
decode Man page
decode,character-method Man page
decode,Corpus-method Man page
dim,textstat-method Man page
dispersion Man page
dispersion,character-method Man page
dispersion-class Man page
dispersion,hits-method Man page
dispersion,partition-method Man page
dissect Man page
dissect,partition-method Man page
divide Man page
divide,matrix-method Man page
divide,vector-method Man page
dotplot Man page
dotplot,features-method Man page
dotplot,partition-method Man page
dotplot,textstat-method Man page
encode Man page
encode,data.frame-method Man page
encode,data.table-method Man page
encoding Man page
encoding<- Man page
encoding,partitionBundle-method Man page
encoding,partition-method Man page
enrich Man page
enrich,kwic-method Man page
enrich-method Man page
enrich,partitionBundle-method Man page
enrich,partition-method Man page
export Man page
export,partition-method Man page
features Man page
featuresBundle-class Man page
features-class Man page
featuresCooccurrences-class Man page
featuresNgrams-class Man page
features,ngrams-method Man page
features,partitionBundle-method Man page
features,partition-method Man page
flatten Man page
freq Man page
freq,dispersion-method Man page
freq,partition-method Man page
frequencies Man page
frequencies-method Man page
frequencies,partitionBundle-method Man page
frequencies,partition-method Man page
getEncoding Man page
getEncoding,character-method Man page
getObjects Man page
getSlot Man page
getTemplate Man page
getTemplate,character-method Man page
getTemplate,missing-method Man page
getTemplate,partition-method Man page
getTerms Man page
getTerms,character-method Man page
getTokenStream Man page
getTokenStream,character-method Man page
getTokenStream,matrix-method Man page
getTokenStream,numeric-method Man page
getTokenStream,partition-method Man page
getTokenStream,Regions-method Man page
head,context-method Man page
head,textstat-method Man page
hist,partition-method Man page
hits Man page
hits,character-method Man page
hits-class Man page
hits,partitionBundle-method Man page
hits,partition-method Man page
html Man page
html,character-method Man page
html,kwic-method Man page
html,partitionBundle-method Man page
html,partition-method Man page
install.corpus Man page
is.cqp Man page
kwic Man page
[,kwic,ANY,ANY,ANY-method Man page
kwic,character-method Man page
kwic-class Man page
kwic,context-method Man page
[,kwic-method Man page
kwic,partition-method Man page
length,bundle-method Man page
length,partition-method Man page
ll Man page
ll,context-method Man page
ll,cooccurrences-method Man page
ll,features-method Man page
mail Man page
mail,context-method Man page
mail,data.frame-method Man page
mail,dispersion-method Man page
mail,features-method Man page
mail,kwic-method Man page
mail-method Man page
mail,partition-method Man page
means Man page
means,DocumentTermMatrix-method Man page
merge,cooccurrencesReshaped-method Man page
merge,partitionBundle-method Man page
meta Man page
meta,character-method Man page
meta,partition-method Man page
name Man page
name<- Man page
name<-,partition,character-method Man page
name,partition-method Man page
names<-,bundle,character-method Man page
names,bundle-method Man page
names,partitionBundle-method Man page
names,textstat-method Man page
ngrams Man page
ngrams-class Man page
ngrams,partitionBundle-method Man page
ngrams,partition-method Man page
noise Man page
noise,character-method Man page
noise,DocumentTermMatrix-method Man page
noise,TermDocumentMatrix-method Man page
noise,textstat-method Man page
nrow,textstat-method Man page
partition Man page
[,partition,ANY,ANY,ANY-method Man page
partitionBundle Man page
[,partitionBundle,ANY,ANY,ANY-method Man page
+,partitionBundle,ANY-method Man page
partitionBundle,character-method Man page
partitionBundle-class Man page
partitionBundle,context-method Man page
partitionBundle,environment-method Man page
[,partitionBundle-method Man page
[[,partitionBundle-method Man page
+,partitionBundle-method Man page
+,partitionBundle,partitionBundle-method Man page
+,partitionBundle,partition-method Man page
partitionBundle,partition-method Man page
partition,character-method Man page
partition-class Man page
partition,environment-method Man page
partition,list-method Man page
[,partition-method Man page
[[,partition-method Man page
partition,partition-method Man page
pAttribute Man page
pAttributes Man page
pAttributes,character-method Man page
pAttributes,partition-method Man page
pAttribute,textstat-method Man page
plprPartition-class Man page
pmi Man page
pmi,context-method Man page
polmineR-generics Man page
polmineR-package Man page
pressPartition-class Man page
punctuation Man page
read Man page
read,data.table-method Man page
read,hits-method Man page
read,kwic-method Man page
read,partitionBundle-method Man page
read,partition-method Man page
read,Regions-method Man page
Regions-class Man page
RegistryFile Man page
RegistryFile-class Man page
resetRegistry Man page
round,textstat-method Man page
rownames,textstat-method Man page
sample,bundle-method Man page
sample,hits-method Man page
sAttributes Man page
sAttributes,character-method Man page
sAttributes,partitionBundle-method Man page
sAttributes,partition-method Man page
scatterplot Man page
scatterplot,data.frame-method Man page
scatterplot-method Man page
setTemplate Man page
setTemplate,character-method Man page
setTemplate,missing-method Man page
show,contextBundle-method Man page
show,context-method Man page
show,cooccurrences-method Man page
show,dispersion-method Man page
show,features-method Man page
show,html-method Man page
show,kwic-method Man page
show,partitionBundle-method Man page
show,partition-method Man page
show,textstat-method Man page
size Man page
size,character-method Man page
size,DocumentTermMatrix-method Man page
size,partition-method Man page
size,TermDocumentMatrix-method Man page
sort,textstat-method Man page
split Man page
split,partition Man page
split,partition-method Man page
startServer Man page
subset,textstat-method Man page
summary,contextBundle-method Man page
summary,context-method Man page
summary,cooccurrences-method Man page
summary,featuresBundle-method Man page
summary,features-method Man page
summary,partitionBundle-method Man page
tail,textstat-method Man page
t,dispersion-method Man page
tempcorpus Man page
tempcorpus-class Man page
TermDocumentMatrix Man page
terms-partition-method Man page
terms,partition-method Man page
[,textstat,ANY,ANY,ANY-method Man page
textstat-class Man page
[[,textstat-method Man page
+,textstat,textstat-method Man page
TokenStream-class Man page
trim Man page
trim,cooccurrences-method Man page
trim,dispersion-method Man page
trim,DocumentTermMatrix-method Man page
trim-method Man page
trim,TermDocumentMatrix-method Man page
tTest Man page
tTest,context-method Man page
unique,bundle-method Man page
use Man page
view Man page
view,context-method Man page
view,cooccurrences-method Man page
view,cooccurrencesReshaped-method Man page
view,features-method Man page
view,kwic-method Man page
view,partition-method Man page
view,textstat-method Man page
weigh Man page
weigh,DocumentTermMatrix-method Man page
weigh,TermDocumentMatrix-method Man page

Files

inst
inst/Rscript
inst/Rscript/getTopics.R
inst/CITATION
inst/perl
inst/perl/corpus_size.pl
inst/graffle
inst/graffle/polmineRclassDiagram.graffle
inst/css
inst/css/markdown7.css
inst/css/tooltips.css
inst/cgi
inst/cgi/kwic2fulltext.R
inst/doc
inst/doc/vignette.Rmd
inst/doc/vignette.R
inst/doc/vignette.html
inst/init
inst/init/cqpserver.init
NAMESPACE
NEWS
R
R/as.speeches_method.R
R/noise_method.R
R/scatterplot_method.R
R/Textstat.R
R/use_method.R
R/pAttributes_method.R
R/partition_methods.R
R/utils.R
R/weigh_method.R
R/freq_method.R
R/read_method.R
R/templates_method.R
R/registry.R
R/decode_method.R
R/kwic_method.R
R/ngrams_method.R
R/contextBundle_class.R
R/Corpus_class.R
R/CQI.cqpserver.R
R/trim_method.R
R/html_method.R
R/rustyard.R
R/partitionBundle_methods.R
R/highlight_method.R
R/tempcorpus.R
R/as.DocumentTermMatrix_method.R
R/mail_method.R
R/pmi_method.R
R/features_method.R
R/cooccurrences_class.R
R/kwic_class.R
R/blapply_method.R
R/enrich_method.R
R/dispersion_method.R
R/size_method.R
R/generics.R
R/partition_method.R
R/textstat_class.R
R/chisquare_method.R
R/cooccurrences_method.R
R/corpus_method.R
R/Regions_class.R
R/Partition.R
R/ll_method.R
R/browse_method.R
R/getTokenStream_method.R
R/context_method.R
R/sAttributes_method.R
R/CQI.R
R/divide_method.R
R/cpos_method.R
R/getTerms_method.R
R/as.markdown_method.R
R/hits_class.R
R/sAttributes2cpos_method.R
R/plot_method.R
R/install.corpus_function.R
R/meta_method.R
R/getEncoding_method.R
R/encode_method.R
R/context_class.R
R/CQI.rcqp.R
R/count_method.R
R/TokenStream_class.R
R/partition_class.R
R/dispersion_class.R
R/polmineR_package.R
R/partitionBundle_method.R
R/as.sparseMatrix_method.R
R/features_class.R
R/CQI.perl.R
R/bundle_class.R
R/RegistryFile.R
R/html_methods.R
R/partitionBundle_class.R
R/terms_method.R
R/frequencies_method.R
R/TermDocumentMatrix_methods.R
R/view_method.R
R/tTest.R
R/CQI.Rcpp.R
R/means_method.R
R/aggregate_method.R
R/dotplot_method.R
R/adjustEncoding_method.R
R/zzz.R
R/encoding_method.R
vignettes
vignettes/vignette.Rmd
README.md
MD5
build
build/vignette.rds
DESCRIPTION
configure
man
man/TermDocumentMatrix.Rd
man/cqpserver.Rd
man/textstatistics.Rd
man/read-method.Rd
man/cooccurrencesReshaped.Rd
man/contextBundle-class.Rd
man/as.speeches-method.Rd
man/partitionBundle-class.Rd
man/as.DocumentTermMatrix.Rd
man/dotplot-method.Rd
man/textstat-class.Rd
man/context-class.Rd
man/chisquare-method.Rd
man/size-method.Rd
man/divide.Rd
man/partition-class.Rd
man/getTokenStream-method.Rd
man/count-method.Rd
man/Regions-class.Rd
man/frequencies.Rd
man/CQI.Rd
man/polmineR-generics.Rd
man/cooccurrences-class.Rd
man/TokenStream-class.Rd
man/install.corpus.Rd
man/terms-partition-method.Rd
man/split-partition-method.Rd
man/decode.Rd
man/as.sparseMatrix.Rd
man/polmineR-package.Rd
man/kwic.Rd
man/flatten.Rd
man/meta-method.Rd
man/getObjects.Rd
man/encoding.Rd
man/Corpus-class.Rd
man/encodings.Rd
man/as.markdown.Rd
man/features-class.Rd
man/bundle-class.Rd
man/use.Rd
man/cqp.Rd
man/templates.Rd
man/getTerms-method.Rd
man/cooccurrences.Rd
man/ngrams.Rd
man/registry.Rd
man/trim-method.Rd
man/scatterplot-method.Rd
man/dispersion-class.Rd
man/pAttribute-method.Rd
man/context-method.Rd
man/RegistryFile-class.Rd
man/pAttributes.Rd
man/cpos-method.Rd
man/getSlot.Rd
man/sAttributes-method.Rd
man/html-method.Rd
man/partition.Rd
man/getEncoding-method.Rd
man/weigh-method.Rd
man/encode-method.Rd
man/enrich-method.Rd
man/means-method.Rd
man/partitionBundle-method.Rd
man/noise.Rd
man/kwic-class.Rd
man/dispersion-method.Rd
man/blapply.Rd
man/tTest.Rd
man/features-method.Rd
man/browse.Rd
man/hits.Rd
man/tempcorpus.Rd
man/corpus-method.Rd
man/cooccurrencesBundle-class.Rd
man/mail-method.Rd
man/view.Rd
