| BNCmeta | R Documentation |
This data set provides complete metadata for all 4048 texts of the British National Corpus (XML edition). See Aston & Burnard (1998) for more information about the BNC, or go to http://www.natcorp.ox.ac.uk/.
The data were extracted automatically from the original BNC source files, with some transformations applied so that all attribute names and their values appear in human-readable form. The Perl scripts used for the extraction are available from https://cwb.sourceforge.io/install.php#other.
BNCmeta
A data frame with 4048 rows and the columns listed below. Unless specified otherwise, columns are coded as factors.
id: BNC document ID; character vector
title: Title of the document; character vector
n_words: Number of words in the document; integer vector
n_tokens: Total number of tokens (including punctuation and deleted material); integer vector
n_w: Number of w-units (words); integer vector
n_c: Number of c-units (punctuation); integer vector
n_s: Number of s-units (sentences); integer vector
publication_date: Publication date
text_type: Text type
context: Spoken context
respondent_age: Age-group of respondent
respondent_class: Social class of respondent (NRS social grades)
respondent_sex: Sex of respondent
interaction_type: Interaction type
region: Region
author_age: Author age-group
author_domicile: Domicile of author
author_sex: Sex of author
author_type: Author type
audience_age: Audience age
domain: Written domain
difficulty: Written difficulty
medium: Written medium
publication_place: Publication place
sampling_type: Sampling type
circulation: Estimated circulation size
audience_sex: Audience sex
availability: Availability
mode: Text mode (written/spoken)
derived_type: Text class
genre: David Lee's genre classification
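The columns above lend themselves to simple per-category summaries. The following is a minimal sketch, not part of the original documentation: it defines a hypothetical helper, summarise_by_mode(), and runs it on a small toy data frame whose values are invented for illustration and are not BNC figures. Only the column names (id, n_words, n_s, mode) are taken from the format description. The commented lines at the end assume that the data set is distributed with the corpora package.

```r
# Toy stand-in that mimics a few of the documented columns.
# All values are invented for illustration; they are NOT taken from the BNC.
toy_meta <- data.frame(
  id      = c("X01", "X02", "X03", "X04"),
  n_words = c(1200L, 800L, 500L, 1500L),
  n_s     = c(60L, 40L, 50L, 75L),
  mode    = factor(c("written", "written", "spoken", "spoken")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of texts, total word count and mean
# sentence length (words per s-unit) for each text mode.
summarise_by_mode <- function(meta) {
  words <- tapply(meta$n_words, meta$mode, sum)  # total words per mode
  sents <- tapply(meta$n_s, meta$mode, sum)      # total s-units per mode
  data.frame(
    mode        = names(words),
    n_texts     = as.vector(table(meta$mode)[names(words)]),
    n_words     = as.vector(words),
    words_per_s = as.vector(words / sents),
    stringsAsFactors = FALSE
  )
}

res <- summarise_by_mode(toy_meta)
print(res)

# With the real data (assumption: the corpora package is installed):
# data(BNCmeta, package = "corpora")
# summarise_by_mode(BNCmeta)
```

Because most metadata columns are coded as factors, the same tapply() pattern should work with other grouping columns such as domain or genre. Categories that do not apply to a given text (for example, written-only attributes for spoken texts) may need to be filtered out first.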
Stephanie Evert (https://purl.org/stephanie.evert)
Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.