biber — R Documentation

View source: R/parse_functions.R

Description

Takes data that has been part-of-speech tagged and dependency parsed and extracts counts of features that have been used in Douglas Biber's research since the late 1980s.
Usage

biber(
  tokens,
  measure = c("MATTR", "TTR", "CTTR", "MSTTR", "none"),
  normalize = TRUE
)

## S3 method for class 'spacyr_parsed'
biber(
  tokens,
  measure = c("MATTR", "TTR", "CTTR", "MSTTR", "none"),
  normalize = TRUE
)

## S3 method for class 'udpipe_connlu'
biber(
  tokens,
  measure = c("MATTR", "TTR", "CTTR", "MSTTR", "none"),
  normalize = TRUE
)
Arguments

tokens
    A data set of tokens created by spacyr::spacy_parse() or udpipe::udpipe_annotate().

measure
    Measure to use for the type-token ratio: one of "MATTR", "TTR", "CTTR", or "MSTTR", or "none" to skip calculating a type-token ratio.

normalize
    If TRUE, count features are normalized to the rate per 1,000 tokens; if FALSE, raw counts are returned.
Details

Refer to spacyr::spacy_parse() or udpipe::udpipe_annotate() for details on parsing texts. These must be configured to do part-of-speech and dependency parsing. For spacyr::spacy_parse(), use the dependency = TRUE, tag = TRUE, and pos = TRUE arguments; for udpipe::udpipe_annotate(), set the tagger and parser arguments to "default".
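The parsing step above can be sketched as follows. This is a minimal, untested sketch: `texts` is a hypothetical named character vector of documents, and the udpipe model download requires network access.

```r
# Option 1: spaCy via spacyr (returns an object of class "spacyr_parsed")
library(spacyr)
spacy_initialize()
parsed_spacy <- spacy_parse(
  texts,
  dependency = TRUE,  # dependency parsing, required for clause-level features
  tag = TRUE,         # detailed part-of-speech tags
  pos = TRUE          # universal part-of-speech tags
)

# Option 2: udpipe (udpipe_annotate() returns class "udpipe_connlu")
library(udpipe)
model_info <- udpipe_download_model(language = "english")
model <- udpipe_load_model(model_info$file_model)
parsed_udpipe <- udpipe_annotate(
  model,
  x = texts,
  tagger = "default",
  parser = "default"
)
```

Either result can then be passed directly to biber(), which dispatches on the class of the parsed object.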
Feature extraction relies on a dictionary (included as dict) and word lists (word_lists) to match specific features; see their documentation and values for details on the exact patterns and words matched by each. The function identifies other features based on local cues, which are approximations. Because they rely on probabilistic taggers provided by spaCy or udpipe, the accuracy of the resulting counts depends on the accuracy of those models. Texts with irregular spellings, non-normative punctuation, and the like will therefore likely produce unreliable outputs, unless the taggers are tuned specifically for those purposes.
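The patterns behind each dictionary-based feature can be inspected directly. A sketch, assuming the pseudobibeR package is attached and that dict and word_lists behave as named collections (only the entry names shown in this page are assumed):

```r
library(pseudobibeR)

# List all dictionary-based features and examine one entry
names(dict)
dict$f_04_place_adverbials  # patterns matched for place adverbials
dict$f_45_conjuncts         # patterns matched for conjuncts

# Word lists used for the remaining lexical matches
names(word_lists)
```

Checking these entries against your corpus is a quick way to see why a particular feature count is higher or lower than expected.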
The following features are detected. Square brackets in example sentences indicate the location of the feature.
Verbs in the past tense.
Verbs in the perfect aspect, indicated by "have" as an auxiliary verb (e.g., I [have] written this sentence.)
Verbs in the present tense.
Place adverbials (e.g., above, beside, outdoors; see list in dict$f_04_place_adverbials)
Time adverbials (e.g., early, instantly, soon; see dict$f_05_time_adverbials)
First-person pronouns; see dict$f_06_first_person_pronouns
Second-person pronouns; see dict$f_07_second_person_pronouns
Third-person personal pronouns (excluding it); see dict$f_08_third_person_pronouns
Pronoun it, its, or itself
Pronouns being used to replace a noun (e.g. [That] is an example sentence.)
Indefinite pronouns (e.g., anybody, nothing, someone; see dict$f_11_indefinite_pronouns)
Pro-verb do
Direct wh- questions (e.g., When are you leaving?)
Nominalizations (nouns ending in -tion, -ment, -ness, -ity)
Gerunds (participial forms functioning as nouns)
Total other nouns
Agentless passives (e.g., The task [was done].)
by- passives (e.g., The task [was done by Steve].)
be as main verb
Existential there (e.g., [There] is a feature in this sentence.)
that verb complements (e.g., I said [that he went].)
that adjective complements (e.g., I'm glad [that you like it].)
wh- clauses (e.g., I believed [what he told me].)
Infinitives
Present participial adverbial clauses (e.g., [Stuffing his mouth with cookies], Joe ran out the door.)
Past participial adverbial clauses (e.g., [Built in a single week], the house would stand for fifty years.)
Past participial postnominal (reduced relative) clauses (e.g., the solution [produced by this process])
Present participial postnominal (reduced relative) clauses (e.g., the event [causing this decline])
that relative clauses on subject position (e.g., the dog [that bit me])
that relative clauses on object position (e.g., the dog [that I saw])
wh- relatives on subject position (e.g., the man [who likes popcorn])
wh- relatives on object position (e.g., the man [who Sally likes])
Pied-piping relative clauses (e.g., the manner [in which he was told])
Sentence relatives (e.g., Bob likes fried mangoes, [which is the most disgusting thing I've ever heard of].)
Causative adverbial subordinator (because)
Concessive adverbial subordinators (although, though)
Conditional adverbial subordinators (if, unless)
Other adverbial subordinators (e.g., since, while, whereas)
Total prepositional phrases
Attributive adjectives (e.g., the [big] horse)
Predicative adjectives (e.g., The horse is [big].)
Total adverbs
Type-token ratio (including punctuation), using the statistic chosen in measure, or TTR if there are fewer than 200 tokens in the smallest document.
Average word length (across tokens, excluding punctuation)
Conjuncts (e.g., consequently, furthermore, however; see dict$f_45_conjuncts)
Downtoners (e.g., barely, nearly, slightly; see dict$f_46_downtoners)
Hedges (e.g., at about, something like, almost; see dict$f_47_hedges)
Amplifiers (e.g., absolutely, extremely, perfectly; see dict$f_48_amplifiers)
Emphatics (e.g., a lot, for sure, really; see dict$f_49_emphatics)
Discourse particles (e.g., sentence-initial well, now, anyway; see dict$f_50_discourse_particles)
Demonstratives (that, this, these, or those used as determiners, e.g. [That] is the feature)
Possibility modals (can, may, might, could)
Necessity modals (ought, should, must)
Predictive modals (will, would, shall)
Public verbs (e.g., assert, declare, mention; see dict$f_55_verb_public)
Private verbs (e.g., assume, believe, doubt, know; see dict$f_56_verb_private)
Suasive verbs (e.g., command, insist, propose; see dict$f_57_verb_suasive)
seem and appear
Contractions
Subordinator that deletion (e.g., I think [he went].)
Stranded prepositions (e.g., the candidate that I was thinking [of])
Split infinitives (e.g., He wants [to convincingly prove] that ...)
Split auxiliaries (e.g., They [were apparently shown] to ...)
Phrasal co-ordination (N and N; Adj and Adj; V and V; Adv and Adv)
Independent clause co-ordination (clause-initial and)
Synthetic negation (e.g., No answer is good enough for Jones.)
Analytic negation (e.g., That isn't good enough.)
Value

A data.frame of features containing one row per document and one column per feature. If normalize is TRUE, count features are normalized to the rate per 1,000 tokens.
References

Biber, Douglas (1985). "Investigating macroscopic textual variation through multifeature/multidimensional analyses." Linguistics 23(2), 337–360. doi:10.1515/ling.1985.23.2.337

Biber, Douglas (1988). Variation across Speech and Writing. Cambridge University Press.

Biber, Douglas (1995). Dimensions of Register Variation: A Cross-Linguistic Comparison. Cambridge University Press.

Covington, M. A., & McFall, J. D. (2010). "Cutting the Gordian Knot: The Moving-Average Type–Token Ratio (MATTR)." Journal of Quantitative Linguistics 17(2), 94–100. doi:10.1080/09296171003643098
See Also

dict, word_lists
Examples

# Parse the example documents provided with the package
biber(udpipe_samples)

biber(spacy_samples)
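The optional arguments from the usage above can be exercised on the same bundled samples; a sketch:

```r
# Raw counts instead of the default rate per 1,000 tokens
biber(udpipe_samples, normalize = FALSE)

# Plain TTR instead of the moving-average default, or skip the
# type-token ratio entirely
biber(spacy_samples, measure = "TTR")
biber(spacy_samples, measure = "none")
```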