calc_wrdfrq: Calculate word frequencies


Description

For all sequences in one or more clusters, calculate the frequency of individual words in either the sequence definitions or the reported feature names.
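The idea can be illustrated with a minimal base-R sketch. It is NOT phylotaR's implementation. It assumes the text is lower-cased and split on whitespace, that a word's frequency is the share of sequences whose text contains it (each word counted once per sequence), and that both thresholds are inclusive (>=). The function name wrdfrq_sketch and the three definition lines are made up for illustration.

```r
# Hypothetical base-R sketch; NOT phylotaR's implementation.
# Assumed: lower-casing, whitespace tokenising, frequency = share of
# sequences whose text contains the word, inclusive (>=) thresholds.
wrdfrq_sketch <- function(txt, min_frq = 0.1, min_nchar = 1,
                          ignr_pttrn = "[^a-z0-9]") {
  # Lower-case each text and split it into words on whitespace
  wrds <- strsplit(tolower(txt), "\\s+")
  # Remove ignored characters, count each word once per sequence,
  # and drop words shorter than min_nchar (including empty strings)
  wrds <- lapply(wrds, function(x) {
    x <- unique(gsub(ignr_pttrn, "", x))
    x[nchar(x) >= min_nchar]
  })
  # Share of sequences containing each word, most frequent first
  frq <- table(unlist(wrds)) / length(txt)
  frq <- sort(frq, decreasing = TRUE)
  frq[frq >= min_frq]
}

# Usage with three made-up definition lines
dflns <- c(
  "Aeshna cyanea cytochrome oxidase subunit I (COI) gene, partial cds",
  "Anax imperator cytochrome oxidase subunit I (COI) gene, partial cds",
  "Libellula depressa 16S ribosomal RNA gene, partial sequence"
)
res <- wrdfrq_sketch(dflns, min_frq = 0.5, min_nchar = 3)
res
```

Here "gene" and "partial" appear in every line (frequency 1), while "cytochrome", "oxidase", "subunit", "coi" and "cds" appear in two of three, which is the kind of signal used to guess the gene region a cluster represents. The single-letter "i" is removed by min_nchar = 3.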

Usage

calc_wrdfrq(phylota, cid, min_frq = 0.1, min_nchar = 1, type = c("dfln",
  "nm"), ignr_pttrn = "[^a-z0-9]")

Arguments

phylota

Phylota object

cid

Cluster ID(s)

min_frq

Minimum frequency for a word to be reported

min_nchar

Minimum number of characters for a word

type

Text to analyse: sequence definitions ("dfln") or feature names ("nm")

ignr_pttrn

Regular expression matching text to ignore.
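The type argument is declared as a choice vector, c("dfln", "nm"). In R this is conventionally resolved with match.arg(), which returns the first choice when the argument is left unset and raises an error for a value that is not one of the choices. Assuming, without confirmation from this page, that calc_wrdfrq() follows the same convention, the default type is "dfln". The function pick_type below is a hypothetical stand-in that shows only the idiom.

```r
# Hypothetical stand-in showing how a choice-vector default such as
# type = c("dfln", "nm") is conventionally resolved with match.arg().
# Assumes, but does not confirm, that calc_wrdfrq() does the same.
pick_type <- function(type = c("dfln", "nm")) {
  # Unset: returns the first choice; otherwise validates the value
  match.arg(type)
}

pick_type()       # "dfln"
pick_type("nm")   # "nm"
# pick_type("seq") raises an error because "seq" is not a valid choice
```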

Details

By default, anything that is not alphanumeric is ignored. The values 'dfln' and 'nm' match slot names in a SeqRec object; see list_seqrec_slots().
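To see what the default pattern "[^a-z0-9]" removes, it can be applied with base R's gsub() to the words of a made-up definition line. The sketch lower-cases the text first, which is an assumption about how the text is normalised; the hyphen-keeping pattern is a hypothetical alternative that could be passed as, for example, ignr_pttrn = "[^a-z0-9-]".

```r
# Made-up definition line, lower-cased first (an assumption about how
# the text is normalised before the pattern is applied)
dfln <- "NADH dehydrogenase subunit 1 (ND1) gene, partial cds; tRNA-Leu gene"
wrds <- strsplit(tolower(dfln), " ")[[1]]

# Default pattern: drop every character that is not a-z or 0-9
dflt <- gsub("[^a-z0-9]", "", wrds)
dflt
# "nadh" "dehydrogenase" "subunit" "1" "nd1" "gene" "partial" "cds" "trnaleu" "gene"

# Hypothetical custom pattern that also keeps hyphens
hyph <- gsub("[^a-z0-9-]", "", wrds)
hyph
# same as above, except "trna-leu" keeps its hyphen
```

Punctuation such as parentheses, commas and semicolons is stripped, so "(nd1)" and "gene," are counted as "nd1" and "gene".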

Value

list

See Also

Other tools-public: calc_mad, drop_by_rank, drop_clstrs, drop_sqs, get_clstr_slot, get_nsqs, get_ntaxa, get_sq_slot, get_stage_times, get_tx_slot, get_txids, is_txid_in_clstr, is_txid_in_sq, list_clstrrec_slots, list_ncbi_ranks, list_seqrec_slots, list_taxrec_slots, plot_phylota_pa, plot_phylota_treemap, read_phylota, write_sqs

Examples

library(phylotaR)
data('dragonflies')
# use word frequencies to work out which gene region each cluster likely represents
random_cids <- sample(dragonflies@cids, 10)
# most frequent words in definition line
(calc_wrdfrq(phylota = dragonflies, cid = random_cids, type = 'dfln'))
# most frequent words in feature name
(calc_wrdfrq(phylota = dragonflies, cid = random_cids, type = 'nm'))

phylotaR documentation built on May 1, 2019, 9:26 p.m.