as.speeches-method: Split partition into speeches

Description

Splits a partition into speeches. The method is designed for the PolMine corpora of plenary protocols.

Usage

## S4 method for signature 'partition'
as.speeches(.Object, sAttributeDates, sAttributeNames,
  gap = 500, mc = FALSE, verbose = TRUE, progress = TRUE)

Arguments

.Object

a partition object

sAttributeDates

the s-attribute that provides the dates of sessions

sAttributeNames

the s-attribute that provides the names of speakers

gap

number of tokens between strucs used to decide whether they belong to the same speech or to separate speeches, defaults to 500

mc

whether to use multicore, defaults to FALSE

verbose

logical, whether to print messages, defaults to TRUE

progress

logical, whether to show progress, defaults to TRUE
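
To make the role of the gap argument more concrete, the following is a minimal sketch in base R of gap-based grouping. It is not the package's implementation: the helper group_regions and the example corpus positions are hypothetical, the regions are assumed to belong to a single speaker on a single date, and whether the real method compares with > or >= is an assumption.

```r
# Hypothetical helper, not part of polmineR: assigns a speech id to each
# corpus region based on the token distance to the preceding region.
group_regions <- function(regions, gap = 500) {
  # regions: two-column integer matrix of (start, end) corpus positions,
  # assumed to be sorted by start position
  if (nrow(regions) == 0L) return(integer(0))
  # number of tokens between the end of one region and the start of the next
  distance <- regions[-1L, 1L] - regions[-nrow(regions), 2L] - 1L
  # a new speech starts wherever that distance exceeds the gap (assumption)
  cumsum(c(TRUE, distance > gap))
}

# Made-up corpus positions for illustration only
regions <- matrix(
  c(0L,    99L,    # speaker starts
    120L,  300L,   # speaker resumes after a short interruption
    5000L, 5200L), # same speaker much later in the session
  ncol = 2, byrow = TRUE
)
speech_id <- group_regions(regions, gap = 500)
```

Under these assumptions, the first two regions receive the same speech id because only 20 tokens separate them, while the third region is far enough away to count as a new speech.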

Value

a partitionBundle object in which each partition is one speech

Examples

## Not run: 
  use("polmineR.sampleCorpus")  # package name passed as a character string
  bt <- partition("PLPRBTTXT", text_year="2009")
  speeches <- as.speeches(bt, sAttributeDates="text_date", sAttributeNames="text_name")
  
  # step-by-step, not the fastest way
  speeches <- enrich(speeches, pAttribute="word")
  tdm <- as.TermDocumentMatrix(speeches, col="count")
  
  # fast option (counts performed when assembling the sparse matrix)
  tdm <- as.TermDocumentMatrix(speeches, pAttribute="word")
  termsToDropList <- noise(tdm)
  whatToDrop <- c("stopwords", "specialChars", "numbers", "minNchar")
  termsToDrop <- unlist(lapply(whatToDrop, function(x) termsToDropList[[x]]))
  tdm <- trim(tdm, termsToDrop = termsToDrop)

## End(Not run)

nrauscher/corpus documentation built on May 23, 2019, 9:34 p.m.