This data set specifies the number of passive and active verb phrases for each text in the extended Brown Family of corpora (Brown, LOB, Frown, FLOB, BLOB), covering edited written American and British English from the 1930s, 1960s and 1990s (see Xiao 2008, 395–397).
Verb phrase counts and the passive/active voice classification are based on a fully automatic analysis of the texts, using the Pro3Gres parser (Schneider et al. 2004).
A data frame with 2499 rows and the following 11 columns:
A unique ID for each text (also used as row name)
Corpus, a factor with five levels
Genre, a factor with fifteen levels A, ..., R (Brown section codes)
Genre labels, a factor with fifteen levels
Date of publication, a factor with three levels (one each for the 1930s, 1960s and 1990s)
Language variety / region, a factor with levels AmE (U.S.) and BrE (UK)
Number of word tokens, an integer vector
Number of active verb phrases, an integer vector
Number of passive verb phrases, an integer vector
Total number of verb phrases, an integer vector
Percentage of passive verb phrases in the text, a numeric vector
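The relationship between the count columns can be sketched as follows. This is a minimal illustration on a toy data frame, not the actual data: the column names (act, pass, verbs, p.pass) and all values are assumptions, as is the reading that the total is simply active plus passive verb phrases.

```r
# Toy stand-in for three texts; column names and counts are
# hypothetical and chosen only to illustrate the arithmetic.
toy <- data.frame(
  id   = c("text_1", "text_2", "text_3"),
  act  = c(180L, 150L, 90L),
  pass = c(20L, 50L, 10L),
  stringsAsFactors = FALSE
)

# Total number of verb phrases, assumed to be active + passive
toy$verbs <- toy$act + toy$pass

# Percentage of passive verb phrases in each text
toy$p.pass <- 100 * toy$pass / toy$verbs

print(toy)
```

If the real table follows the same convention, recomputing the percentage from the counts in this way should reproduce the stored percentage column up to rounding.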
No frequency data could be obtained for text N02 in the Frown corpus. This entry has been omitted from the table.
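The omission is consistent with the row count, assuming each of the five corpora contributes 500 texts: 5 × 500 − 1 = 2499. A gap of this kind could be checked with a small helper such as the one sketched below. The function name texts_per_corpus, the column name corpus and the toy data are assumptions made for illustration.

```r
# Hypothetical helper: count texts per corpus and report how many
# are missing relative to an expected number of texts per corpus.
texts_per_corpus <- function(df, expected = 500L) {
  counts <- table(df$corpus)
  data.frame(
    corpus  = names(counts),
    texts   = as.integer(counts),
    missing = expected - as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy example with three expected texts per corpus and one gap
toy <- data.frame(
  id     = c("A01", "A02", "A03", "A01", "A02"),
  corpus = c("Brown", "Brown", "Brown", "Frown", "Frown"),
  stringsAsFactors = FALSE
)
res <- texts_per_corpus(toy, expected = 3L)
print(res)
```

Applied to the full table with the default of 500 expected texts, the helper would be expected to report one missing text for Frown and none for the other corpora, provided the assumptions above hold.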
Frequency information for this data set was kindly provided by Gerold Schneider, University of Zurich (http://www.cl.uzh.ch/de/people/team/compling/gschneid.html).
Stefan Evert <email@example.com>
Schneider, Gerold; Rinaldi, Fabio; Dowdall, James (2004). Fast, deep-linguistic statistical dependency parsing. In G.-J. M. Kruijff and D. Duchier (eds.), Proceedings of the COLING 2004 Workshop on Recent Advances in Dependency Grammar, pages 33–40, Geneva, Switzerland. https://files.ifi.uzh.ch/cl/gschneid/parser/
Xiao, Richard (2008). Well-known and influential corpora. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, chapter 20, pages 383–457. Mouton de Gruyter, Berlin.