calc_wrdfrq: Calculate word frequencies

View source: R/user-calc.R

calc_wrdfrq    R Documentation

Calculate word frequencies

Description

For all sequences in one or more clusters, calculate the frequency of individual words in either the sequence definitions or the reported feature names.

Usage

calc_wrdfrq(
  phylota,
  cid,
  min_frq = 0.1,
  min_nchar = 1,
  type = c("dfln", "nm"),
  ignr_pttrn = "[^a-z0-9]"
)

Arguments

phylota

Phylota object

cid

Cluster ID(s)

min_frq

Minimum frequency for a word to be returned

min_nchar

Minimum number of characters for a word

type

Text to analyse: sequence definitions ('dfln') or feature names ('nm')

ignr_pttrn

Ignore pattern: a regular expression matching the text to ignore.

Details

By default, any character that is not alphanumeric is ignored. 'dfln' and 'nm' match the slot names in a SeqRec; see list_seqrec_slots().
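
To illustrate how the arguments might interact, here is a minimal base R sketch of a word-frequency calculation on plain character vectors. It is not the phylotaR implementation. The helper name sketch_wrdfrq, the made-up definition lines, the choice to count each word once per text, and the inclusive thresholds are all assumptions made for the example.

```r
# Hypothetical helper, NOT part of phylotaR: word frequencies across texts.
# Assumption: frequency = proportion of texts that contain the word.
sketch_wrdfrq <- function(txts, min_frq = 0.1, min_nchar = 1,
                          ignr_pttrn = "[^a-z0-9]") {
  # Lower-case each text and split it into words on whitespace
  wrds_per_txt <- lapply(strsplit(tolower(txts), split = "\\s+"),
                         function(wrds) {
    # Remove everything matching the ignore pattern from each word
    wrds <- gsub(pattern = ignr_pttrn, replacement = "", x = wrds)
    # Drop words that are too short, then count each word once per text
    unique(wrds[nchar(wrds) >= min_nchar])
  })
  wrds <- unlist(wrds_per_txt)
  # Proportion of texts containing each word, most frequent first
  frq <- sort(table(wrds) / length(txts), decreasing = TRUE)
  # Keep only words that reach the minimum frequency (inclusive here)
  frq[frq >= min_frq]
}

# Made-up definition lines, for illustration only
txts <- c(
  "Anax imperator cytochrome oxidase subunit I (COI) gene, partial cds",
  "Anax junius cytochrome oxidase subunit I (COI) gene",
  "Aeshna cyanea 16S ribosomal RNA gene"
)
frq <- sketch_wrdfrq(txts, min_frq = 0.5, min_nchar = 2)
print(frq)
```

With the default ignr_pttrn, punctuation such as the parentheses in "(COI)" and the comma in "gene," is stripped before counting, so "gene," and "gene" are treated as the same word.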

Value

list

See Also

Other tools-public: calc_mad(), drop_by_rank(), drop_clstrs(), drop_sqs(), get_clstr_slot(), get_nsqs(), get_ntaxa(), get_sq_slot(), get_stage_times(), get_tx_slot(), get_txids(), is_txid_in_clstr(), is_txid_in_sq(), list_clstrrec_slots(), list_ncbi_ranks(), list_seqrec_slots(), list_taxrec_slots(), plot_phylota_pa(), plot_phylota_treemap(), read_phylota(), write_sqs()

Examples

data('dragonflies')
# work out what gene region the cluster is likely representing with word freqs.
random_cids <- sample(dragonflies@cids, 10)
# most frequent words in definition line
(calc_wrdfrq(phylota = dragonflies, cid = random_cids, type = 'dfln'))
# most frequent words in feature name
(calc_wrdfrq(phylota = dragonflies, cid = random_cids, type = 'nm'))

ropensci/phylotaR documentation built on July 21, 2024, 1:01 a.m.