pmi | R Documentation |

Calculate Pointwise Mutual Information as an information-theoretic approach to find collocations.

```
pmi(.Object, ...)
## S4 method for signature 'context'
pmi(.Object)
## S4 method for signature 'Cooccurrences'
pmi(.Object)
## S4 method for signature 'ngrams'
pmi(.Object, observed, p_attribute = p_attributes(.Object)[1])
```

`.Object` |
An object. |

`...` |
Arguments methods may require. |

`observed` |
A |

`p_attribute` |
The positional attribute which shall be considered. Relevant only if ngrams have been calculated for more than one p-attribute. |

Pointwise mutual information (PMI) is calculated as follows (see Manning/Schuetze 1999):

`I(x,y) = log\frac{p(x,y)}{p(x)p(y)}`

The formula is based on maximum likelihood estimates: given the number
of observations of token x, `o_{x}`, the number of observations
of token y, `o_{y}`, and the size of the corpus N, the
probabilities of the tokens x and y, and of the co-occurrence of x and y,
are as follows:

`p(x) = \frac{o_{x}}{N}`

`p(y) = \frac{o_{y}}{N}`

The term p(x,y) is estimated from the number of observed co-occurrences of x and y, again relative to the corpus size N.

Note that the computation uses the logarithm to base 2, not the natural logarithm used in some examples elsewhere (e.g. https://en.wikipedia.org/wiki/Pointwise_mutual_information).
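
As a worked illustration of the formula, the following base R sketch plugs hypothetical counts into the maximum likelihood estimates. The counts and the helper name `pmi_from_counts` are assumptions made for this example and are not part of polmineR. The sketch presumes that p(x,y) is estimated as the co-occurrence count divided by N, as in the manual computation in the examples for this page.

```r
# Hypothetical toy counts (not taken from any real corpus)
N    <- 1024  # size of the corpus
o_x  <- 32    # observations of token x
o_y  <- 16    # observations of token y
o_xy <- 8     # observed co-occurrences of x and y

# Illustrative helper, not a polmineR function
pmi_from_counts <- function(o_xy, o_x, o_y, N) {
  p_xy <- o_xy / N  # estimate of p(x,y)
  p_x  <- o_x / N   # estimate of p(x)
  p_y  <- o_y / N   # estimate of p(y)
  log2(p_xy / (p_x * p_y))
}

pmi_value <- pmi_from_counts(o_xy, o_x, o_y, N)
pmi_value  # 4

# Algebraically equivalent shortcut: the N terms partly cancel
pmi_shortcut <- log2((o_xy * N) / (o_x * o_y))
```

With these counts, x and y co-occur 16 times more often than expected under independence, which gives a PMI of log2(16) = 4.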

Manning, Christopher D.; Schuetze, Hinrich (1999): *Foundations of Statistical Natural Language
Processing*. MIT Press: Cambridge, Mass., pp. 178-183.

Other statistical methods: `chisquare()`, `ll()`, `t_test()`

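```
library(polmineR)
use("polmineR")  # activate the corpora included in the polmineR package

# PMI for the cooccurrences of the query "oil"
y <- cooccurrences("REUTERS", query = "oil", method = "pmi")

# Compute PMI by hand from the same counts, following the formula above
N <- size(y)[["partition"]]
I <- log2((y[["count_coi"]] / N) / ((count(y) / N) * (y[["count_partition"]] / N)))

# PMI for ngrams (bigrams of the p-attribute "word")
use(pkg = "RcppCWB", corpus = "REUTERS")
dt <- decode(
  "REUTERS",
  p_attribute = "word",
  s_attribute = character(),
  to = "data.table",
  verbose = FALSE
)
n <- ngrams(dt, n = 2L, p_attribute = "word")
obs <- count("REUTERS", p_attribute = "word")
phrases <- pmi(n, observed = obs)
```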
