specificities: Calculate Lexical Specificity Score

Description

Calculate the specificity score (also called association or surprise score) of a word occurring f or more times in a sub-corpus of t words, given that it occurs F times in total in a whole corpus of T words.
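In the model of Lafon (1980), cited under References, the quantity described here is a hypergeometric tail probability. The base R sketch below computes that probability directly. The helper name tail_prob and its argument names are hypothetical and not part of this package, and the value returned by specificities may be a transformation of this probability (for example on a log scale), so the numbers are not expected to match its output.

```r
# Hypothetical helper (not part of the package): probability of seeing the
# word sub_freq (f) or more times in a sub-corpus of sub_size (t) words,
# given total_freq (F) occurrences in a corpus of corpus_size (T) words.
tail_prob <- function(sub_freq, sub_size, total_freq, corpus_size) {
  # phyper() gives P(X <= q); with lower.tail = FALSE it gives P(X > q),
  # so q = sub_freq - 1 yields P(X >= sub_freq).
  phyper(sub_freq - 1,
         m = total_freq,                # occurrences of the word in the corpus
         n = corpus_size - total_freq,  # all other tokens in the corpus
         k = sub_size,                  # size of the sub-corpus
         lower.tail = FALSE)
}

# Toy figures: f = 3, t = 5, F = 3, T = 10
p <- tail_prob(3, 5, 3, 10)
```

A small probability means the observed frequency would be surprising if the word were spread evenly over the corpus.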

Usage

specificities(lexicaltable, types=NULL, parts=NULL)

Arguments

lexicaltable

a complete lexical table, i.e. a numeric matrix where each row represents a word and each column a part of the corpus. Each cell gives the frequency of the given word in the corresponding part of the corpus.

types

the rows (words) for which the specificity score is calculated. If NULL, the score is calculated for every row. If types is a character vector, it gives the names of the rows to use (an error is thrown if lexicaltable has no row names). If types is an integer vector, it gives the indices of the rows to use.

parts

the columns (parts) for which the specificity score is calculated. If NULL, the score is calculated for every part. If parts is a character vector, it gives the names of the columns to use (an error is thrown if lexicaltable has no column names). If parts is an integer vector, it gives the indices of the columns to use.
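As an illustration of the arguments above, the following base R sketch builds a small lexical table from a toy corpus and shows that a selection by name and a selection by index designate the same cell. The corpus, the part names P1 and P2, and all variable names are made up for the example.

```r
# Toy corpus (hypothetical): each element is one part, given as a token vector.
parts <- list(
  P1 = c("le", "peuple", "le", "roi"),
  P2 = c("le", "peuple", "peuple")
)

# One entry per token, plus the name of the part it belongs to.
tokens  <- unlist(parts, use.names = FALSE)
part_id <- rep(names(parts), lengths(parts))

# Cross-tabulate words by parts, then convert to a plain integer matrix
# with row names (words) and column names (parts).
tab <- table(tokens, part_id)
lexicaltable <- matrix(as.integer(tab), nrow = nrow(tab),
                       dimnames = list(rownames(tab), colnames(tab)))

# Selecting by name or by index picks out the same row and column.
by_name  <- lexicaltable["peuple", "P2", drop = FALSE]
by_index <- lexicaltable[2L, 2L, drop = FALSE]
```

With the package loaded, a matrix of this shape could be passed as lexicaltable, with types = "peuple" and parts = "P2" (or types = 2L and parts = 2L) restricting the calculation to that single cell.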

Value

Returns a matrix with nrow(lexicaltable) rows and ncol(lexicaltable) columns (fewer if types or parts restrict the selection), each cell giving the specificity score of the corresponding word in the corresponding part.

Author(s)

Matthieu Decorde, Serge Heiden, Sylvain Loiseau, Lise Vaudor

References

Lafon P. (1980) Sur la variabilité de la fréquence des formes dans un corpus, Mots, 1, pp. 127–165. http://www.persee.fr/web/revues/home/prescript/article/mots_0243-6450_1980_num_1_1_1008

See Also

specificities.probabilities, specificities.lexicon

Examples

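data(robespierre)
# Specificity scores for every word (row) in every part (column).
spe <- specificities(robespierre)
# Message template; the placeholders are filled in by sprintf() below.
string <- paste("The word %s appears f=%d times in a sub-corpus of t=%d words,",
                " given a total frequency of F=%d in the robespierre corpus made",
                " of T=%d words. The corresponding specificity score is %f", sep="")
print(sprintf(string,
              'peuple',
              robespierre['peuple','D4'],      # f: frequency of the word in part D4
              colSums(robespierre)['D4'],      # t: size of part D4
              rowSums(robespierre)['peuple'],  # F: total frequency of the word
              sum(robespierre),                # T: size of the whole corpus
              spe['peuple', 'D4']))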
