SurfaceColloc: A small data set of surface collocations from the English...

SurfaceColloc    R Documentation

A small data set of surface collocations from the English Wikipedia

Description

This data set demonstrates how co-occurrence and marginal frequencies can be provided for collocation analysis with am.score. It contains surface co-occurrence counts for 7 English nouns as nodes and 7 selected collocates. The counts are based on a collocational span of two tokens to the left and right of the node (L2/R2) in the WP500 corpus. Marginal frequencies for the nodes are the overall corpus frequencies of the nouns, so the expected co-occurrence frequency must be adjusted for the total span size of 4 tokens.
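The span adjustment can be illustrated with a small base R sketch. It assumes the common convention for surface co-occurrence, in which the expected frequency is the usual independence estimate multiplied by the span size; whether am.score applies exactly this formula internally is an assumption here. All counts below are hypothetical and are not taken from SurfaceColloc.

```r
# Hypothetical counts for one node-collocate pair (NOT values from SurfaceColloc)
f1   <- 1000   # marginal frequency of the node in the corpus
f2   <- 5000   # marginal frequency of the collocate in the corpus
N    <- 1e6    # sample size (total number of tokens)
f    <- 80     # observed co-occurrence frequency within the L2/R2 span
span <- 4      # total span size: 2 tokens left + 2 tokens right

# Expected frequency without the adjustment (a span of a single token)
E_unadjusted <- f1 * f2 / N

# Expected frequency adjusted for the span size (assumed convention)
E_adjusted <- span * f1 * f2 / N

# Pointwise mutual information based on the adjusted expectation
MI <- log2(f / E_adjusted)

print(c(E_unadjusted = E_unadjusted, E_adjusted = E_adjusted, MI = MI))
```

Without the adjustment, the expected frequency would be underestimated by a factor of 4 and association scores would be inflated accordingly.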

Usage


SurfaceColloc

Format

A list with the following components:

cooc:

A data frame with 34 rows and the following columns:

  • w1: node word (noun)

  • w2: collocate

  • f: co-occurrence frequency within L2/R2 span

f1:

Labelled integer vector of length 7 specifying the marginal frequencies of the node nouns.

f2:

Labelled integer vector of length 7 specifying the marginal frequencies of the collocates.

N:

Sample size, i.e. the total number of tokens in the WP500 corpus.
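Because f1 and f2 are labelled vectors, the marginal frequencies for each row of cooc can be looked up by name. The following sketch uses a hypothetical toy list with the same component layout (cooc, f1, f2, N) so that it runs without the package; the words and counts are invented, and the expected frequency uses the assumed span-scaled independence estimate.

```r
# Toy stand-in with the same structure as SurfaceColloc (all values invented)
toy <- list(
  cooc = data.frame(
    w1 = c("dog", "dog", "cat"),
    w2 = c("bark", "small", "small"),
    f  = c(30L, 12L, 9L),
    stringsAsFactors = FALSE
  ),
  f1 = c(dog = 400L, cat = 300L),     # labelled marginals of the nodes
  f2 = c(bark = 50L, small = 2000L),  # labelled marginals of the collocates
  N  = 100000L                        # sample size
)

tbl <- toy$cooc

# Look up marginal frequencies by label; as.character() guards against factors
tbl$f1 <- unname(toy$f1[as.character(tbl$w1)])
tbl$f2 <- unname(toy$f2[as.character(tbl$w2)])

# Expected co-occurrence frequency, scaled by a total span size of 4 tokens
tbl$E <- 4 * tbl$f1 * tbl$f2 / toy$N

print(tbl)
```

With the real data, the same lookups would be applied to SurfaceColloc$cooc, SurfaceColloc$f1 and SurfaceColloc$f2.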

Author(s)

Stephanie Evert (https://purl.org/stephanie.evert)

See Also

am.score

Examples

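head(SurfaceColloc$cooc, 10)  # first 10 node-collocate pairs with co-occurrence counts
SurfaceColloc$f1              # marginal frequencies of the node nouns
SurfaceColloc$f2              # marginal frequencies of the collocates
SurfaceColloc$N               # sample size (number of tokens in WP500)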

corpora documentation built on June 10, 2025, 3:01 a.m.