as.speeches-method: Split partition into speeches
In nrauscher/corpus: Toolkit for Corpus Analysis


A method designed for the PolMine corpora of plenary protocols. It splits a partition into individual speeches.


## S4 method for signature 'partition'
as.speeches(.Object, sAttributeDates, sAttributeNames,
  gap = 500, mc = FALSE, verbose = TRUE, progress = TRUE)

`.Object`	a partition object
`sAttributeDates`	the s-attribute that provides the dates of sessions
`sAttributeNames`	the s-attribute that provides the names of speakers
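`gap`	number of tokens between strucs used as the threshold for identifying separate speeches, defaults to 500
`mc`	logical, whether to use multicore processing, defaults to FALSE
`verbose`	logical, whether to print messages, defaults to TRUE
`progress`	logical, whether to show a progress bar, defaults to TRUE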
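
The role of `gap` can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper `split_by_gap()` and the toy region matrix are hypothetical, and the sketch assumes that regions belonging to one speaker on one date are given as sorted start/end corpus positions, and that a new speech begins whenever more than `gap` tokens lie between two consecutive regions.

```r
# Hypothetical helper, NOT part of the package: assigns a speech id to each region.
# `regions` is a two-column matrix (start, end) of corpus positions, sorted by start.
split_by_gap <- function(regions, gap = 500) {
  if (nrow(regions) == 0) return(integer(0))
  # number of tokens lying between the end of one region and the start of the next
  distance <- regions[-1, 1] - regions[-nrow(regions), 2] - 1
  # assumption: a new speech starts whenever the distance exceeds the threshold
  cumsum(c(TRUE, distance > gap))
}

# Toy regions for one speaker on one date (made-up corpus positions)
regions <- matrix(
  c(   0,   99,
     150,  300,   # 50 tokens after the first region
    1200, 1400,   # 899 tokens after the second region
    1450, 1500),  # 49 tokens after the third region
  ncol = 2, byrow = TRUE
)

speech_id <- split_by_gap(regions, gap = 500)
print(speech_id)
```

Under these assumptions, the first two regions share one speech id and the last two share another, because only the second distance is larger than `gap`. A smaller `gap` yields more, shorter speeches; a larger one merges regions interrupted by interjections.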

A partitionBundle object, with one partition per speech.

## Not run: 
  use("polmineR.sampleCorpus")
  bt <- partition("PLPRBTTXT", text_year="2009")
  speeches <- as.speeches(bt, sAttributeDates="text_date", sAttributeNames="text_name")
  
  # step-by-step, not the fastest way
  speeches <- enrich(speeches, pAttribute="word")
  tdm <- as.TermDocumentMatrix(speeches, col="count")
  
  # fast option (counts performed when assembling the sparse matrix)
  tdm <- as.TermDocumentMatrix(speeches, pAttribute="word")
  termsToDropList <- noise(tdm)
  whatToDrop <- c("stopwords", "specialChars", "numbers", "minNchar")
  termsToDrop <- unlist(lapply(whatToDrop, function(x) termsToDropList[[x]]))
  tdm <- trim(tdm, termsToDrop = termsToDrop)

## End(Not run)