BNCdomains: Distribution of domains in the British National Corpus (BNC)


Description

This data set gives the number of documents and tokens in each of the 18 domains represented in the British National Corpus, World Edition (BNC). See Aston & Burnard (1998) for more information about the BNC and the domain classification, or go to http://www.natcorp.ox.ac.uk/.

Usage

BNCdomains

Format

A data frame with 19 rows (one per domain, plus one row for the document without a domain label) and the following columns:

domain: name of the respective domain in the BNC

documents: number of documents from this domain

tokens: total number of tokens in all documents from this domain
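
The three columns are enough to derive simple per-domain summaries, such as the average document length and each domain's share of the corpus. The following is a minimal sketch, not part of the package: the helper name summarise_domains is hypothetical, and the real data is only touched if the SIGIL package happens to be installed.

```r
# Hypothetical helper: per-domain summaries for a data frame shaped like
# BNCdomains (columns: domain, documents, tokens).
summarise_domains <- function(df) {
  stopifnot(all(c("domain", "documents", "tokens") %in% names(df)))
  data.frame(
    domain = df$domain,
    # average number of tokens per document in each domain
    tokens_per_document = df$tokens / df$documents,
    # proportion of all tokens that fall into each domain
    token_share = df$tokens / sum(df$tokens),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to the real data only if SIGIL is available.
if (requireNamespace("SIGIL", quietly = TRUE)) {
  data(BNCdomains, package = "SIGIL")
  print(head(summarise_domains(BNCdomains)))
}
```

The helper works on any data frame with the same three columns, so it can be tried on a small hand-made table before running it on the full data set.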

Details

For one document in the BNC, the domain classification is missing. This document is represented by the code Unlabeled in the data set.
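
Analyses that compare domains may want to set this row aside first. The sketch below assumes that the domain column contains the literal string Unlabeled, as described above; the helper name drop_unlabeled is hypothetical and not part of the package.

```r
# Hypothetical helper: remove the row for the document whose domain
# classification is missing. The label is assumed to be "Unlabeled".
drop_unlabeled <- function(df, label = "Unlabeled") {
  # as.character() makes the comparison work for factor and character columns
  df[as.character(df$domain) != label, , drop = FALSE]
}

# Apply the helper to the real data only if SIGIL is available.
if (requireNamespace("SIGIL", quietly = TRUE)) {
  data(BNCdomains, package = "SIGIL")
  labeled <- drop_unlabeled(BNCdomains)
  print(nrow(labeled))
}
```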

Author(s)

Marco Baroni <baroni@sslmit.unibo.it>

References

Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.


SIGIL documentation built on May 2, 2019, 6:20 p.m.