ngrams: Get N-Grams


Description

Count n-grams of either words (tokens) or characters.
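As a conceptual illustration of what counting word n-grams means, the following base R sketch slides a window of size n over a token vector and tabulates the resulting sequences. The helper `count_word_ngrams()` is hypothetical and written only for this illustration; it is not the package's implementation and does not operate on partition objects.

```r
# Hypothetical helper (not part of the package): count word n-grams
# in a plain character vector of tokens.
count_word_ngrams <- function(tokens, n = 2) {
  # Fewer than n tokens: there is no n-gram to count
  if (length(tokens) < n) return(setNames(integer(0), character(0)))
  # Start positions of every window of n consecutive tokens
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
  # Tabulate and return a named integer vector, most frequent first
  counts <- table(grams)
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

tokens <- c("the", "cat", "saw", "the", "cat")
count_word_ngrams(tokens, n = 2)
```

In this toy input, the bigram "the cat" occurs twice and the other two bigrams once each.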

Usage

ngrams(.Object, ...)

## S4 method for signature 'partition'
ngrams(.Object, n = 2, pAttribute = "word",
  char = NULL, progress = FALSE, ...)

## S4 method for signature 'partitionBundle'
ngrams(.Object, n = 2, char = NULL,
  pAttribute = "word", mc = FALSE, progress = FALSE, ...)

Arguments

.Object

object of class partition or partitionBundle

...

further parameters

n

number of tokens or characters per n-gram (default: 2)

pAttribute

the p-attribute(s) to use; more than one may be supplied as a character vector, e.g. c("word", "pos")

char

if NULL (the default), tokens are counted; otherwise characters are counted, keeping only those characters supplied in this character vector

progress

logical, whether to report progress (default: FALSE)

mc

logical, whether to use multicore
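To make the effect of the char argument more concrete, here is a base R sketch of character n-gram counting that keeps only the characters listed in a character vector. The helper `count_char_ngrams()` and its defaults are assumptions made for this illustration; the package's actual method works on a partition and may differ in detail.

```r
# Hypothetical helper (not part of the package): count character n-grams,
# keeping only the characters listed in `char`.
count_char_ngrams <- function(tokens, n = 2, char = letters) {
  # Concatenate all tokens and split the result into single characters
  chars <- unlist(strsplit(paste(tokens, collapse = ""), ""))
  # Drop every character that is not explicitly allowed
  chars <- chars[chars %in% char]
  if (length(chars) < n) return(setNames(integer(0), character(0)))
  # Start positions of every window of n consecutive characters
  starts <- seq_len(length(chars) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(chars[i:(i + n - 1)], collapse = ""),
    character(1)
  )
  counts <- table(grams)
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Punctuation is dropped because "!" is not in `char`
count_char_ngrams(c("ab", "ab!"), n = 2, char = c("a", "b"))
```

Note that in this sketch the filtering happens before the windows are built, so characters on either side of a dropped character become adjacent.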

Examples

## Not run: 
  use("polmineR.sampleCorpus")
  P <- partition("PLPRBTTXT", text_date = "2009-10-27")

  # simple case: count word bigrams
  ngramObject <- ngrams(P, n = 2, pAttribute = "word", char = NULL)

  # a more complex scenario: get the most frequent ADJA/NN combinations
  ngramObject <- ngrams(P, n = 2, pAttribute = c("word", "pos"), char = NULL)
  ngramObject2 <- subset(
    ngramObject,
    ngramObject[["1_pos"]] == "ADJA" & ngramObject[["2_pos"]] == "NN"
  )
  # drop the pos columns by reference (data.table; ':=' needs no 'with = FALSE')
  ngramObject2@stat[, c("1_pos", "2_pos") := NULL]
  ngramObject3 <- sort(ngramObject2, by = "count")
  head(ngramObject3)

## End(Not run)

nrauscher/corpus documentation built on May 23, 2019, 9:34 p.m.