getCQL | R Documentation |
Querying by "CQL" (Corpus Query Language) lets us search for patterns in the selected transcripts. We construct a CQL query by specifying a search pattern of words, lemmas, and parts of speech.
The "cqlArr" parameter specifies a pattern to search for in text. The pattern is built up by appending components that are one of three types:
Exact word match (type="word").
Match any form of a word (type="lemma").
Match a part of speech (type="pos").
Along with type, each component has an "item" value, giving the word, lemma, or part-of-speech code to match, and a "freq" value, specifying how many times the item should appear at that location:
Appear once at that location (freq="once").
Appear zero or more times at that location (freq="zeroPlus").
Appear one or more times at that location (freq="onePlus").
We append these three-part (type/item/freq) components together, in order, to search for patterns across corpora.
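The component structure described above can be sketched with a small constructor. This is an illustrative helper only: `cql_component` is a hypothetical name, not a function of the package, and it assumes nothing beyond the legal "type" and "freq" values listed in this page.

```r
# Hypothetical helper (not part of the package): builds one cqlArr
# component and rejects type/freq values this page does not list.
cql_component <- function(type = c("word", "lemma", "pos"),
                          item,
                          freq = c("once", "zeroPlus", "onePlus")) {
  type <- match.arg(type)
  freq <- match.arg(freq)
  stopifnot(is.character(item), length(item) == 1, nzchar(item))
  list(type = type, item = item, freq = freq)
}

# Any form of "go", then zero or more adverbs, then the exact word "home".
cqlArr <- list(
  cql_component("lemma", "go"),
  cql_component("pos", "adv", "zeroPlus"),
  cql_component("word", "home")
)
str(cqlArr)
```

The resulting list has the same shape as the hand-written cqlArr values in the examples, so it could be passed to getCQL unchanged.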
Some examples:
To find all instances of exactly "go home": cqlArr=list( list(type="word", item="go", freq="once"), list(type="word", item="home", freq="once"))
This matches all utterances containing: "go home"
To find all instances of any form of "go" followed by "home", we use type="lemma" for "go": cqlArr=list( list(type="lemma", item="go", freq="once"), list(type="word", item="home", freq="once"))
This matches all utterances containing: "go home", "goes home", "went home", "going home".
To find all instances of a subject pronoun, followed by any form of "go", followed by one or more adverbs, followed by "home": cqlArr=list( list(type="pos", item="pro:sub", freq="once"), list(type="lemma", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus"), list(type="word", item="home", freq="once"))
This matches all utterances containing: "they went back home", "they go back home", "he went back home", "we went back home", and others.
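To make the difference between freq="once", "zeroPlus", and "onePlus" concrete, here is a toy local matcher. It is a sketch for illustration only: the actual matching is done by the TalkBankDB service, whose implementation is not described here. `make_tokens`, `match_from`, and `matches_utterance` are hypothetical names, and the lemmas and part-of-speech tags on the toy utterances are assigned by hand.

```r
# Toy tokens: each is a list with a word, its lemma, and a hand-assigned tag.
make_tokens <- function(words, lemmas, pos) {
  Map(function(w, l, p) list(word = w, lemma = l, pos = p), words, lemmas, pos)
}

# Can the remaining components match the tokens starting at position i?
match_from <- function(tokens, comps, i) {
  if (length(comps) == 0) return(TRUE)
  comp <- comps[[1]]
  rest <- comps[-1]
  # A token satisfies a component if its word/lemma/pos equals the item.
  hit <- i <= length(tokens) && identical(tokens[[i]][[comp$type]], comp$item)
  switch(comp$freq,
    once     = hit && match_from(tokens, rest, i + 1),
    zeroPlus = match_from(tokens, rest, i) ||
               (hit && match_from(tokens, comps, i + 1)),
    onePlus  = hit && (match_from(tokens, rest, i + 1) ||
                       match_from(tokens, comps, i + 1)),
    stop("unknown freq: ", comp$freq)
  )
}

# An utterance matches if the pattern matches starting at any token.
matches_utterance <- function(tokens, comps) {
  any(vapply(seq_along(tokens),
             function(i) match_from(tokens, comps, i), logical(1)))
}

utt_back  <- make_tokens(c("they", "went", "back", "home"),
                         c("they", "go", "back", "home"),
                         c("pro:sub", "v", "adv", "n"))
utt_plain <- make_tokens(c("they", "went", "home"),
                         c("they", "go", "home"),
                         c("pro:sub", "v", "n"))

one_plus <- list(list(type = "pos",   item = "pro:sub", freq = "once"),
                 list(type = "lemma", item = "go",      freq = "once"),
                 list(type = "pos",   item = "adv",     freq = "onePlus"),
                 list(type = "word",  item = "home",    freq = "once"))
zero_plus <- one_plus
zero_plus[[3]]$freq <- "zeroPlus"

matches_utterance(utt_back,  one_plus)   # TRUE: one adverb is present
matches_utterance(utt_plain, one_plus)   # FALSE: onePlus needs an adverb
matches_utterance(utt_plain, zero_plus)  # TRUE: zeroPlus allows none
```

Under this reading, switching the adverb component from "onePlus" to "zeroPlus" widens the pattern so that "they went home" is matched as well as "they went back home".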
There are many "item" values for part of speech (type="pos"). See the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes.
getCQL(
cqlArr = NULL,
corpusName = NULL,
corpora = NULL,
lang = NULL,
media = NULL,
age = NULL,
gender = NULL,
designType = NULL,
activityType = NULL,
groupType = NULL,
auth = FALSE
)
cqlArr |
Query by grammatical pattern. For example, to search for all utterances where a speaker says "go" once, followed by an adverb occurring one or more times: cqlArr=list(list(type="word", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus")). Legal values for type are: "word" to match an exact word, "lemma" to match all forms of a word, "pos" to match parts of speech. Legal values for item are any word, word lemma, or part-of-speech code (see the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes). Legal values for freq are "once", "onePlus", and "zeroPlus". |
corpusName |
Name of corpus to query. For example, to search within the childes corpus, corpusName="childes". |
corpora |
Path to the corpus or sub-corpus to query. The path starts with the corpus name, followed by subfolder names, and leads to a folder; all transcripts beneath that folder are queried. For example, to query all transcripts in the MacWhinney childes corpus: corpora = c('childes', 'Eng-NA', 'MacWhinney'). |
lang |
Query by language. For example, to get transcripts that contain both English and Spanish: lang=c("eng", "spa"). Legal values: 3-letter language codes based on the ISO 639-3 standard. |
media |
Query by media type. For example, to get transcripts with an associated video recording: media=c("video"). Legal values: "audio" or "video". |
age |
Query by participant age range, in months. For example, to get transcripts with target participants who are 14-18 months old: age=c(from="14", to="18"). |
gender |
Query by participant gender. For example, to get transcripts with female target participants: gender=c("female"). Legal values: "female" or "male". |
designType |
Query by design type. For example, to get transcripts from a longitudinal study: designType=c("long"). Legal values are "long" for longitudinal studies and "cross" for cross-sectional studies. |
activityType |
Query by activity type. For example, to get transcripts where the target participant is engaged in toy play: activityType=c("toyplay"). See the CHAT manual for legal values. |
groupType |
Query by group type. For example, to get transcripts where the target participant is hearing limited: groupType=c("HL"). See the CHAT manual for legal values. |
auth |
Determines whether the user should be prompted to authenticate in order to access protected collections. Defaults to FALSE. |
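The arguments above can be collected into one named list before querying. The sketch below only builds and inspects that list. It assumes that the metadata filters can be combined in a single call and that each one narrows the result, which this page does not state explicitly. The getCQL call itself is left commented out because it needs the package loaded and network access.

```r
# Argument names and example values are taken from the argument
# descriptions above; combining them in one call is an assumption.
query_args <- list(
  cqlArr = list(
    list(type = "lemma", item = "go",   freq = "once"),
    list(type = "word",  item = "home", freq = "once")
  ),
  corpusName = "childes",
  corpora    = c("childes", "Eng-NA", "MacWhinney"),
  lang       = c("eng"),
  age        = c(from = "14", to = "18"),  # age range in months
  gender     = c("female"),
  designType = c("long"),
  auth       = FALSE
)
str(query_args)

# With the package that provides getCQL attached:
# result <- do.call(getCQL, query_args)
```

Keeping the arguments in a list makes it easy to reuse the same filters with a different cqlArr.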
getCQL(cqlArr=list(list(type="lemma", item="my", freq="once"),
list(type="lemma", item="ball", freq="once")),
corpusName = 'childes',
corpora = c('childes', 'Eng-NA', 'MacWhinney'))