Identifies higher level taxa for each sequence in clusters for a given rank. Selects representative sequences for each unique taxon using the choose_by functions. By default, the function will choose the top ten sequences by first sorting by those with the fewest ambiguous nucleotides, then by youngest age, then by sequence length (longest first).
phylota | Phylota object
rnk | Taxonomic rank
keep_higher | Keep higher taxonomic ranks?
n | Number of sequences per taxon
choose_by | Vector of selection functions
greatest | Greatest or lowest for each choose_by function
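The way choose_by and greatest pair up can be sketched in base R on a toy table. This is only an illustration of the multi-key sort described above, not the package's implementation: the data values and the helper choose_top_n are hypothetical, and how drop_by_rank itself breaks ties is not covered here.

```r
# Toy sequence table; all values are made up for illustration.
sqs <- data.frame(
  sid     = c("s1", "s2", "s3", "s4", "s5"),
  genus   = c("Aeshna", "Aeshna", "Aeshna", "Anax", "Anax"),
  pambgs  = c(0.00, 0.00, 0.10, 0.05, 0.00),  # proportion of ambiguous nucleotides
  age     = c(100, 50, 10, 30, 400),          # days since added to GenBank
  nncltds = c(650, 650, 900, 500, 700),       # sequence length
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of phylotaR): keep the top n rows per taxon,
# sorting by each choose_by column in turn. greatest[i] = TRUE means column i
# is sorted from highest to lowest.
choose_top_n <- function(sqs, taxon, n, choose_by, greatest) {
  stopifnot(length(choose_by) == length(greatest))
  # Negate columns that should sort high-to-low so order() can sort ascending
  keys <- Map(function(col, grt) if (grt) -sqs[[col]] else sqs[[col]],
              choose_by, greatest)
  sorted <- sqs[do.call(order, unname(keys)), ]
  # Within each taxon, number the rows in sorted order and keep the first n
  rank_in_taxon <- ave(seq_len(nrow(sorted)), sorted[[taxon]], FUN = seq_along)
  sorted[rank_in_taxon <= n, ]
}

picked <- choose_top_n(sqs, taxon = "genus", n = 1,
                       choose_by = c("pambgs", "age", "nncltds"),
                       greatest = c(FALSE, FALSE, TRUE))
print(picked$sid)  # "s2" "s5"
```

With the default-style ordering, each genus keeps its cleanest, youngest sequence. Flipping greatest to c(TRUE, TRUE, FALSE) in the same call would instead favour the most ambiguous, oldest and shortest sequences.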
phylota
Other tools-public: calc_mad, calc_wrdfrq, drop_clstrs, drop_sqs, get_clstr_slot, get_nsqs, get_ntaxa, get_sq_slot, get_stage_times, get_tx_slot, get_txids, is_txid_in_clstr, is_txid_in_sq, list_clstrrec_slots, list_ncbi_ranks, list_seqrec_slots, list_taxrec_slots, plot_phylota_pa, plot_phylota_treemap, read_phylota, write_sqs
library(phylotaR)
data("dragonflies")
# For faster computations, let's only work with 6 clusters.
dragonflies <- drop_clstrs(phylota = dragonflies, cid = dragonflies@cids[10:15])
# We can use drop_by_rank() to reduce to 10 sequences per genus for each cluster
(reduced_1 <- drop_by_rank(phylota = dragonflies, rnk = 'genus', n = 10,
                           choose_by = c('pambgs', 'age', 'nncltds'),
                           greatest = c(FALSE, FALSE, TRUE)))
# We can specify what aspects of the sequences we would like to select per genus
# By default we select the sequences with fewest ambiguous nucleotides (e.g.
# we avoid Ns), the youngest age and then longest sequence.
# We can reverse the 'greatest' to get the opposite.
(reduced_2 <- drop_by_rank(phylota = dragonflies, rnk = 'genus', n = 10,
                           choose_by = c('pambgs', 'age', 'nncltds'),
                           greatest = c(TRUE, TRUE, FALSE)))
# Leading to shorter sequences ...
r1_sqlngth <- mean(get_sq_slot(phylota = reduced_1,
                               sid = reduced_1@sids, slt_nm = 'nncltds'))
r2_sqlngth <- mean(get_sq_slot(phylota = reduced_2,
                               sid = reduced_2@sids, slt_nm = 'nncltds'))
(r1_sqlngth > r2_sqlngth)
# ... with more ambiguous characters ...
r1_pambgs <- mean(get_sq_slot(phylota = reduced_1, sid = reduced_1@sids,
                              slt_nm = 'pambgs'))
r2_pambgs <- mean(get_sq_slot(phylota = reduced_2, sid = reduced_2@sids,
                              slt_nm = 'pambgs'))
(r1_pambgs < r2_pambgs)
# ... and older ages (measured in days since being added to GenBank).
r1_age <- mean(get_sq_slot(phylota = reduced_1, sid = reduced_1@sids,
                           slt_nm = 'age'))
r2_age <- mean(get_sq_slot(phylota = reduced_2, sid = reduced_2@sids,
                           slt_nm = 'age'))
(r1_age < r2_age)
# Or... we can simply reduce the clusters to just one sequence per genus
(dragonflies <- drop_by_rank(phylota = dragonflies, rnk = 'genus', n = 1))