wordscluster: To cluster the words

Description

wordscluster clusters words using the Levenshtein distance concept. It groups words that occur together in combination with 'prefixes', 'suffixes' or other compound words. The first word of each cluster, usually the shortest and in many cases a (sometimes drastically) 'stemmed' form, is taken as the representative of that cluster.
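
The grouping idea can be illustrated with a minimal sketch. This is not the package's implementation: the helper name cluster_sketch, its max_dist parameter and the sample words are assumptions made for illustration, and it relies only on base R's adist() with partial = TRUE (the distance from a short word to its best-matching substring of a longer word).

```r
# Hypothetical helper, NOT part of pubmed.mineR: a greedy illustration of
# clustering words around the shortest member using Levenshtein distance.
cluster_sketch <- function(words, lower = 5, upper = 30, max_dist = 0) {
  # Keep only words whose length lies within [lower, upper].
  len <- nchar(words)
  words <- unique(words[len >= lower & len <= upper])
  # Shortest words first, so the representative is the shortest member.
  words <- words[order(nchar(words), words)]

  clusters <- list()
  while (length(words) > 0) {
    rep_word <- words[1]
    # partial = TRUE: distance between rep_word and the closest substring
    # of each remaining word (0 means rep_word is contained in the word).
    d <- adist(rep_word, words, partial = TRUE)[1, ]
    clusters[[rep_word]] <- words[d <= max_dist]
    words <- words[d > max_dist]
  }
  clusters
}

# Sample words chosen for illustration only.
words <- c("cholesterol", "hypercholesterolemia", "cholesterolemia",
           "insulin", "insulinemia", "proinsulin", "gene")
res <- cluster_sketch(words)
print(res)
```

In this sketch "gene" is dropped by the lower limit, and the remaining words fall into two groups represented by "insulin" and "cholesterol". Raising max_dist would allow near matches as well as exact containment.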

Usage

wordscluster(lower, upper)

Arguments

lower

Lower limit for the number of characters in a word. Default = 5.

upper

Upper limit for the number of characters in a word. Default = 30.

Details

This function is useful for dampening the 'explosion' of words output by word_atomizations. This step makes the terms easier to examine.

Value

A list object of words clustered together, and a text file named "resulttable.txt" with the columns cluster number, cluster size and cluster representative.

Note

The function may run faster when the lower limit is reduced, but this risks producing 'decoy' situations, although these occur rarely. In a decoy situation, a 'word' that is partly identical to a smaller word runs away with that smaller word, which makes the generated 'clusters' of words difficult to interpret. This can be minimized by increasing the lower limit of word length, at the cost of lower computational speed. For example, the word 'hypercholesterolemia' runs away with the smaller word 'lester', which could be another name. In this instance increasing the lower limit is more useful. Words longer than 30 characters are usually names of chemical compounds in the IUPAC system of nomenclature.
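
The decoy example can be checked with a few lines of base R. This is only an illustration of the idea under the assumption that a decoy arises when a short word is contained in a longer, unrelated one; the variable names are hypothetical and the package's internal matching may differ.

```r
# Illustration only: the variable names below are hypothetical.
long_word  <- "hypercholesterolemia"
short_word <- "lester"

# The short word occurs inside the long one, which is what lets it act as a decoy.
is_decoy <- grepl(short_word, long_word, fixed = TRUE)

# Effect of the lower limit on which words are considered at all.
candidates   <- c(short_word, long_word)
keep_lower_5 <- candidates[nchar(candidates) >= 5]  # "lester" (6 characters) is kept
keep_lower_7 <- candidates[nchar(candidates) >= 7]  # "lester" is excluded

print(is_decoy)
print(keep_lower_5)
print(keep_lower_7)
```

With a lower limit of 7, 'lester' is no longer a candidate, so it cannot become the representative of a cluster containing 'hypercholesterolemia'.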

Author(s)

S.Ramachandran, Jyoti Rani

See Also

whichcluster, word_atomizations

Examples

## Not run: 
test <- wordscluster(5, 10)
## This clusters words with a minimum length of 5 characters
## and a maximum length of 10 characters.

## End(Not run)

pubmed.mineR documentation built on Nov. 26, 2021, 5:11 p.m.