Description Usage Arguments Details Value Author(s) References See Also
Create a frequency table either with a corpus or with a subcorpus. With a corpus, a frequency table is a based on two attributes (structural or positional). With a subcorpus object, a frequency table is based on two anchors (match, matchend, target, keyword) and a positional attribute for each anchor.
1 2 3 4 5 6 7 8 9 10 | cqp_ftable(x, ...)
## S3 method for class 'cqp_corpus'
cqp_ftable(x, attribute1, attribute2, attribute1.use.id = FALSE,
attribute2.use.id = FALSE, structural.attribute.unique.id = FALSE,
subcorpus = NULL, ...)
## S3 method for class 'cqp_subcorpus'
cqp_ftable(x, anchor1, attribute1,
anchor2, attribute2, cutoff = 0, ...)
|
x |
An rcqp object, created with |
attribute1 |
The attribute for the modalities of the first variable of the cross-tabulation. If |
attribute2 |
The attribute for the modalities of the second variable of the cross-tabulation. If |
attribute1.use.id |
If |
attribute2.use.id |
If attribute2 is a structural attribute and has values (see |
structural.attribute.unique.id |
Count tokens or ids. See details for more info. |
subcorpus |
Not implemented yet. |
anchor1 |
The anchor for individuals of the first variable, if |
anchor2 |
The anchor for individuals of the second variable, if |
cutoff |
Filter the frequency table. |
... |
Ignored. |
Some explanations for the structural.attribute.unique.id
option (see the vignette RcqpIntroduction).
Positional attributes (and structural attributes having values) are represented
with their string values rather than with ids. For positional
attributes, it is only a matter of presentation, since each id has its own
string; but for structural attributes having values, it may entail a different
counting, since these values are not unique: occurrences of phenomena belonging
to different structs are then counted together if two structs have the same
value. You can force the use of ids rather than string values with the
attribute1.use.id
and attribute2.use.id
options.
Counts are made on token basis, i.e. each token of the corpus is an
individual on which the two modalities (attributes) are considered. If you
use two structural attributes as arguments in cqp_ftable
,
and one of them does not have values, then the third column counts the number of
tokens. In the following example, each line gives
the length (in number of tokens, third column) of each sentence (second column)
in each novel represented by its title:
1 2 3 | c <- corpus("DICKENS");
f <- cqp_ftable(c, "novel_title", "s")
f[1:10,]
|
If both structural attributes have values, you may want to count the number of
times the modalities are cooccurring, rather than the total number of
tokens included in these cooccurrences. For that purpose, you can use the
structural.attribute.unique.id=TRUE
option. In the following
example, we count the number of time each head appears in each novel :
1 2 | f <- cqp_ftable(c, "novel_title", "pp_h", structural.attribute.unique.id=TRUE)
f[1:10,]
|
Here on the contrary, we count the total number of tokens in each prepositional phrase having a given head :
1 2 | f <- cqp_ftable(c, "novel_title", "pp_h")
f[1:10,]
|
A frequency table stored as a flat (3-column) dataframe : for each observed combination of modalities, the first column gives the modality in the first variable, the second column the modality in the second variable, and the third column the observed frequency of the cooccurrence.
Bernard Desgraupes - bernard.desgraupes@u-paris10.fr - University Paris-10.
Sylvain Loiseau - sylvain.loiseau@univ-paris13.fr - University Paris-13.
http://cwb.sourceforge.net/documentation.php
cqp_flist
,
cqp_kwic
,
subcorpus
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.