cqi_fdist: Frequency distributions


Description

Calculate a frequency list or a cross-tabulated frequency table.

Usage

	cqi_fdist1(subcorpus, field1, key1, cutoff=0, offset=0)
	cqi_fdist2(subcorpus, field1, key1, field2, key2, cutoff=0)

Arguments

subcorpus

(string) qualified name of a subcorpus.

field1

(string) the name of the anchor. One of 'match', 'matchend', 'target', 'keyword'.

key1

(string) the name of a positional attribute.

field2

(string) the name of a second anchor. One of 'match', 'matchend', 'target', 'keyword'.

key2

(string) the name of a positional attribute for the second anchor.

cutoff

(integer) a minimum frequency: items with a frequency below this value are not returned. Default value is 0.

offset

(integer) an offset, in tokens, relative to the anchor given in field1 (cqi_fdist1 only). Default value is 0.

Details

cqi_fdist1 builds a frequency list of the values of a positional attribute (the modalities) observed at a set of occurrences (the individuals).

The occurrences are defined by providing one of the anchors of a query ('match', 'matchend', 'target', 'keyword').

The results are sorted in decreasing order of frequency. The cutoff argument specifies a value under which results are not returned. For instance, if cutoff is set to 10, only items with a frequency greater than or equal to 10 are returned. The default value is 0, which means that all frequencies are returned.
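The sorting and cutoff rules described above can be imitated on made-up data with base R alone. This is only a sketch of the semantics, not the way rcqp computes the list: toy_fdist1 is a hypothetical helper, and it returns strings rather than the attribute IDs that cqi_fdist1 returns.

```r
# Toy emulation in base R; NOT rcqp's implementation.
# toy_fdist1 is a hypothetical helper and the tags are made-up data.
toy_fdist1 <- function(values, cutoff = 0) {
  counts <- table(values)
  counts <- counts[counts >= cutoff]          # keep frequencies >= cutoff
  counts <- sort(counts, decreasing = TRUE)   # most frequent first
  data.frame(value = names(counts),
             freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

tags <- c("NN", "VB", "NN", "JJ", "NN", "VB", "DT")
toy_fdist1(tags)              # all four tags are kept
toy_fdist1(tags, cutoff = 2)  # only NN (3) and VB (2) remain
```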

The offset argument lets you specify a position relative to the anchor given by the field1 argument. For instance, if field1 is set to 'match' and offset is equal to -1, the frequency list is computed on the tokens located one position before the match anchor. The default value of this argument is 0.
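The effect of an offset can be pictured on a toy token vector. The sketch below uses made-up data and base R only. The positions in match are hypothetical and 1-based to suit R indexing, whereas CWB corpus positions start at 0.

```r
# Toy emulation in base R with made-up data; NOT rcqp's implementation.
tokens <- c("the", "old", "man", "saw", "the", "old", "dog", "and", "a", "cat")
match  <- c(3, 7, 10)   # hypothetical 1-based positions of the 'match' anchor

offset <- -1
positions <- match + offset   # one token before each anchor
# Drop positions that fall outside the toy corpus
positions <- positions[positions >= 1 & positions <= length(tokens)]

preceding <- sort(table(tokens[positions]), decreasing = TRUE)
preceding   # "old" occurs twice before an anchor, "a" once
```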

cqi_fdist2 builds a frequency table of the values found at one anchor (such as 'match', 'matchend', 'target', 'keyword') cross-tabulated with the values found at another anchor. In other words, it gives the frequency of each distinct co-occurrence found at the two given anchors.
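The cross-tabulation can be imitated with base R on made-up data. In this sketch each row of pairs stands for one match. It assumes that the cutoff keeps frequencies greater than or equal to its value, as described for the frequency list, and that only observed pairs are reported. It does not reproduce rcqp's implementation, which returns attribute IDs rather than strings.

```r
# Toy emulation in base R with made-up data; NOT rcqp's implementation.
# Each row is one match: the value at the first anchor and the value
# at the second anchor.
pairs <- data.frame(
  first  = c("old", "old", "big", "old", "big"),
  second = c("man", "man", "dog", "dog", "dog"),
  stringsAsFactors = FALSE
)

cutoff <- 2
counts <- as.data.frame(table(first = pairs$first, second = pairs$second),
                        responseName = "freq", stringsAsFactors = FALSE)
# Keep observed pairs only (freq >= 1), then apply the cutoff
counts <- counts[counts$freq >= max(cutoff, 1), ]
counts   # (big, dog) and (old, man), each with frequency 2
```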

Value

cqi_fdist1 returns a matrix with two columns. The first column contains the IDs of the attribute values and the second column the corresponding frequencies (numbers of occurrences).

cqi_fdist2 returns a matrix with three columns. The first column contains the IDs of the attribute values found at the first anchor, the second column contains the IDs of the attribute values found at the second anchor, and the third column gives the frequency of each co-occurrence.
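Because both functions return IDs, the result usually has to be decoded before it can be read. The sketch below shows one possible way to turn a three-column matrix into a labelled data frame. decode_fdist2 and mock_id2str are hypothetical helpers, the matrix is made-up data, and the mock lexicon assumes that IDs start at 0.

```r
# Hypothetical helper: decode a three-column matrix of (id1, id2, freq).
# id2str is any function(attribute, ids) that returns strings.
decode_fdist2 <- function(m, attr1, attr2, id2str) {
  data.frame(first  = id2str(attr1, m[, 1]),
             second = id2str(attr2, m[, 2]),
             freq   = m[, 3],
             stringsAsFactors = FALSE)
}

# Stand-in decoder so that the sketch runs without a corpus.
mock_id2str <- function(attribute, ids) {
  lexicon <- c("go", "and", "see")   # made-up lexicon, IDs assumed to start at 0
  lexicon[ids + 1]
}

m <- matrix(c(0, 2, 5,
              1, 0, 3), ncol = 3, byrow = TRUE)
decoded <- decode_fdist2(m, "TOY.lemma", "TOY.lemma", mock_id2str)
decoded
```

With rcqp loaded and a corpus available, cqi_id2str could be passed as id2str, together with qualified attribute names such as "DICKENS.lemma".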

Author(s)

Bernard Desgraupes - bernard.desgraupes@u-paris10.fr - University Paris-10.
Sylvain Loiseau - sylvain.loiseau@univ-paris13.fr - University Paris-13.

Source

The IMS Open Corpus Workbench (CWB) at http://cwb.sourceforge.net/

References

http://cwb.sourceforge.net/documentation.php

See Also

cqi_list_subcorpora, cqi_dump_subcorpus.

Examples

## Not run: 
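# "go and <token>": the matchend anchor falls on the token after "and"
cqi_query("DICKENS","Go","[lemma=\"go\"] \"and\" [];")
# Frequency list of the POS tags found at matchend
m <- cqi_fdist1("DICKENS:Go","matchend","pos")
# Convert the attribute IDs in the first column to strings
cqi_id2str("DICKENS.pos", m[,1])

# Noun phrases; @ marks the optional adjective as the target anchor
cqi_query("DICKENS","NP","[pos=\"DT\"] @[pos=\"JJ\"]? [pos=\"NNS?\"];")
# Target lemmas with a frequency of at least 300
cqi_fdist1("DICKENS:NP","target","lemma",300)
# Lemmas of the token just before the match anchor, frequency >= 2000
cqi_fdist1("DICKENS:NP","match","lemma", cutoff=2000, offset=-1)

# Cross-tabulate POS and lemma, both taken at matchend
cqi_fdist2("DICKENS:Go","matchend", "pos", "matchend","lemma")
# Cross-tabulate target lemma with matchend word form, frequency >= 300
cqi_fdist2("DICKENS:NP","target", "lemma", "matchend","word", cutoff=300)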

## End(Not run)

PolMine/rcqp.mac documentation built on May 28, 2019, 2:24 p.m.