getCQL: Query CQL

getCQL R Documentation

Query CQL

Description

Querying by "CQL" (Corpus Query Language) lets us search for patterns in the selected transcripts. We construct a CQL query by specifying a search pattern of words, lemmas, and parts of speech.

The "cqlArr" parameter specifies a pattern to search for in text. The pattern is built up by appending components that are one of three types:

  • Exact word match (type="word").

  • Match any form of a word (type="lemma").

  • Match a part of speech (type="pos").

Along with type, each component has an "item" (the word, lemma, or part-of-speech code to match) and a "freq" value specifying how many times the item should appear at that location:

  • Appear once at that location (freq="once").

  • Appear zero or more times at that location (freq="zeroPlus").

  • Appear one or more times at that location (freq="onePlus").

We append these components (each a list of type, item, and freq) together, in order, to search for patterns across corpora.
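The component rules above can be sketched in R. The helper `cqlComponent()` below is hypothetical and is not part of TBDBr; it only assembles the list structure that "cqlArr" expects and rejects type or freq values outside the ones listed above.

```r
# Hypothetical helper (NOT part of TBDBr): build one CQL component and
# check that type and freq are among the values documented above.
cqlComponent <- function(type, item, freq = "once") {
  type <- match.arg(type, c("word", "lemma", "pos"))
  freq <- match.arg(freq, c("once", "zeroPlus", "onePlus"))
  stopifnot(is.character(item), length(item) == 1)
  list(type = type, item = item, freq = freq)
}

# A pattern is an ordered list of components: any form of "go", then "home".
pattern <- list(
  cqlComponent("lemma", "go"),
  cqlComponent("word", "home")
)
str(pattern)
```

The resulting `pattern` has the same shape as the hand-written lists in the examples that follow, so it could be passed as the "cqlArr" argument.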

Some examples:

  • To find all instances of exactly "go home": cqlArr=list( list(type="word", item="go", freq="once"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "go home"

  • To find all instances of any form of "go" followed by "home", we use type="lemma" for "go": cqlArr=list( list(type="lemma", item="go", freq="once"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "go home" "goes home" "went home" "going home"

  • To find all instances of a subject pronoun, followed by any form of "go", followed by one or more adverbs, followed by "home": cqlArr=list( list(type="pos", item="pro:sub", freq="once"), list(type="lemma", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "they went back home" "they go back home" "he went back home" "we went back home" others...
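As a reading aid for longer patterns, the sketch below prints a "cqlArr" in a compact, regex-like notation, with "*" standing for zeroPlus and "+" for onePlus. The function `describePattern()` and the notation are assumptions made for illustration only; they are not part of TBDBr and are not claimed to be the format sent to TalkBankDB.

```r
# Hypothetical helper (NOT part of TBDBr): render a cqlArr as a compact
# string so the effect of each freq value is easy to see.
describePattern <- function(cqlArr) {
  suffix <- c(once = "", zeroPlus = "*", onePlus = "+")
  parts <- vapply(cqlArr, function(comp) {
    paste0("[", comp$type, "=\"", comp$item, "\"]", suffix[[comp$freq]])
  }, character(1))
  paste(parts, collapse = " ")
}

# The subject pronoun + "go" + adverb(s) + "home" pattern from above.
cqlArr <- list(
  list(type = "pos",   item = "pro:sub", freq = "once"),
  list(type = "lemma", item = "go",      freq = "once"),
  list(type = "pos",   item = "adv",     freq = "onePlus"),
  list(type = "word",  item = "home",    freq = "once")
)
cat(describePattern(cqlArr), "\n")
# prints: [pos="pro:sub"] [lemma="go"] [pos="adv"]+ [word="home"]
```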

There are many "item" values for part of speech (type="pos"). See the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes.

Usage

getCQL(
  cqlArr = NULL,
  corpusName = NULL,
  corpora = NULL,
  lang = NULL,
  media = NULL,
  age = NULL,
  gender = NULL,
  designType = NULL,
  activityType = NULL,
  groupType = NULL,
  auth = FALSE
)

Arguments

cqlArr

Query by grammatical pattern. For example, to search for all utterances where a speaker says "go" once, followed by an adverb occurring one or more times: cqlArr=list(list(type="word", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus")). Legal values for type are: "word" to match an exact word, "lemma" to match all forms of a word, "pos" to match parts of speech. Legal values for item are any word, word lemma, or part-of-speech code (see the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes). Legal values for freq are "once", "onePlus", and "zeroPlus".

corpusName

Name of corpus to query. For example, to search within the childes corpus, corpusName="childes".

corpora

Path of the corpus or corpora to query. This is a path starting with the corpus name, followed by subfolder names, leading to a folder; all transcripts beneath that folder will be queried. For example, to query all transcripts in the MacWhinney childes corpus: corpora = c('childes', 'Eng-NA', 'MacWhinney').

lang

Query by language. For example, to get transcripts that contain both English and Spanish: lang=c("eng", "spa"). Legal values: 3-letter language codes based on the ISO 639-3 standard.

media

Query by media type. For example, to get transcripts with an associated video recording: media=c("video"). Legal values: "audio" or "video".

age

Query by participant age range in months. For example, to get transcripts with target participants who are 14-18 months old: age=c(from="14", to="18").

gender

Query by participant gender. For example, to get transcripts with female target participants: gender=c("female"). Legal values: "female" or "male".

designType

Query by design type. For example, to get transcripts from a longitudinal study: designType=c("long"). Legal values are "long" for longitudinal studies and "cross" for cross-sectional studies.

activityType

Query by activity type. For example, to get transcripts where the target participant is engaged in toy play: activityType=c("toyplay"). See the CHAT manual for legal values.

groupType

Query by group type. For example, to get transcripts where the target participant is hearing limited: groupType=c("HL"). See the CHAT manual for legal values.

auth

Determines whether the user should be prompted to authenticate in order to access protected collections. Defaults to FALSE.

Examples

getCQL(cqlArr=list(list(type="lemma", item="my", freq="once"),
                   list(type="lemma", item="ball", freq="once")),
       corpusName = 'childes',
       corpora = c('childes', 'Eng-NA', 'MacWhinney'))
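A pattern can presumably be combined with the metadata filters described under Arguments. The following is a sketch under that assumption: the filter values are taken from the argument descriptions above, the variable name `queryArgs` is illustrative, and the shape of the returned object is not documented here, so it is only inspected with str(). The call is guarded so that TalkBankDB is contacted only in an interactive session with TBDBr installed.

```r
# Collect the arguments first so they can be inspected before querying.
queryArgs <- list(
  cqlArr = list(
    list(type = "lemma", item = "go",  freq = "once"),
    list(type = "pos",   item = "adv", freq = "onePlus")
  ),
  corpusName = "childes",
  corpora    = c("childes", "Eng-NA", "MacWhinney"),
  media      = c("video"),    # transcripts with an associated video recording
  gender     = c("female")    # female target participants
)

# Assumption: the filters can be supplied together in a single call.
# The query needs network access, so it runs only interactively.
if (interactive() && requireNamespace("TBDBr", quietly = TRUE)) {
  result <- do.call(TBDBr::getCQL, queryArgs)
  str(result)
}
```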

TalkBank/TBDBr documentation built on Feb. 4, 2024, 2:25 p.m.