BNCmeta: Metadata for the British National Corpus (XML edition)
In corpora: Statistics and Data Sets for Corpus Frequency Data

Description

This data set provides complete metadata for all 4048 texts of the British National Corpus (XML edition). See Aston & Burnard (1998) for more information about the BNC, or go to http://www.natcorp.ox.ac.uk/.

The data were extracted automatically from the original BNC source files. Some transformations were applied so that all attribute names and their values appear in human-readable form. The Perl scripts used in the extraction procedure are available from https://cwb.sourceforge.io/install.php#other.

Usage


BNCmeta
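
As a minimal sketch of how the data set might be accessed, the snippet below loads it explicitly and inspects its shape. It assumes the corpora package is already installed; the explicit data() call is a cautious choice made for this example and may not be needed if the object is available by name after attaching the package.

```r
# Assumption: the corpora package is installed (e.g. from CRAN).
# data() places the BNCmeta data frame in the current environment.
data("BNCmeta", package = "corpora")

dim(BNCmeta)        # number of rows (texts) and columns
names(BNCmeta)      # column names as listed under Format
head(BNCmeta$id)    # first few BNC document IDs
```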

Format

A data frame with 4048 rows and the columns listed below. Unless specified otherwise, columns are coded as factors.

id:: BNC document ID; character vector
title:: Title of the document; character vector
n_words:: Number of words in the document; integer vector
n_tokens:: Total number of tokens (including punctuation and deleted material); integer vector
n_w:: Number of w-units (words); integer vector
n_c:: Number of c-units (punctuation); integer vector
n_s:: Number of s-units (sentences); integer vector
publication_date:: Publication date
text_type:: Text type
context:: Spoken context
respondent_age:: Age-group of respondent
respondent_class:: Social class of respondent (NRS social grades)
respondent_sex:: Sex of respondent
interaction_type:: Interaction type
region:: Region
author_age:: Author age-group
author_domicile:: Domicile of author
author_sex:: Sex of author
author_type:: Author type
audience_age:: Audience age
domain:: Written domain
difficulty:: Written difficulty
medium:: Written medium
publication_place:: Publication place
sampling_type:: Sampling type
circulation:: Estimated circulation size
audience_sex:: Audience sex
availability:: Availability
mode:: Text mode (written/spoken)
derived_type:: Text class
genre:: David Lee's genre classification
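
Because most columns are factors and the size columns are integers, the metadata lends itself to simple tabulation. The following sketch, again assuming the corpora package is installed, counts texts per mode, sums n_words per mode, and cross-tabulates two of the written-text attributes. The variable names mode_counts, words_by_mode and sex_by_domain are illustrative only.

```r
# Assumption: the corpora package is installed.
data("BNCmeta", package = "corpora")

# Number of texts in each mode (written/spoken)
mode_counts <- table(BNCmeta$mode)
mode_counts

# Total number of words per mode, using the n_words column
words_by_mode <- aggregate(n_words ~ mode, data = BNCmeta, FUN = sum)
words_by_mode

# Cross-tabulation of author sex by written domain;
# texts with missing values in either column are dropped by xtabs()
sex_by_domain <- xtabs(~ author_sex + domain, data = BNCmeta)
sex_by_domain
```

Since these metadata categories do not apply to every text, it is worth checking each column for missing values before interpreting such counts.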

Author(s)

Stephanie Evert (https://purl.org/stephanie.evert)

References

Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.

corpora documentation built on June 10, 2025, 3:01 a.m.