compare_mir_terms: Compare count of terms associated with a miRNA name over...


View source: R/compare_mir_terms.R

Description

Compare count of top terms associated with a miRNA name over various topics.

Usage

compare_mir_terms(
  df,
  mir,
  top = 20,
  token = "words",
  ...,
  topic = NULL,
  shared = TRUE,
  normalize = TRUE,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  position = "dodge",
  col.mir = miRNA,
  col.abstract = Abstract,
  col.topic = Topic,
  col.pmid = PMID,
  title = NULL
)

Arguments

df

Data frame containing miRNA names, abstracts, topics, and PubMed-IDs.

mir

String. miRNA name of interest.

top

Integer. Number of top terms to plot.

token

String. Specifies how abstracts are split into tokens. Taken from unnest_tokens() in the tidytext package: "Unit for tokenizing, or a custom tokenizing function. Built-in options are 'words' (default), 'characters', 'character_shingles', 'ngrams', 'skip_ngrams', 'sentences', 'lines', 'paragraphs', 'regex', (...), and 'ptb' (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."
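As a rough illustration of what token = "words" produces, the base-R sketch below lowercases each abstract and splits it on non-alphanumeric characters. The helper tokenize_words() is hypothetical and much simpler than the tokenizer behind tidytext (which may, for example, treat hyphens and punctuation differently); it only shows the shape of the result: one character vector of tokens per abstract.

```r
# Hypothetical helper, NOT tidytext's tokenizer: lowercase the text and
# split on any run of characters that are not letters, digits, or hyphens
# (keeping hyphens so that names like "mir-21" stay in one piece).
tokenize_words <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z0-9-]+")
  # Drop empty strings produced by leading separators
  lapply(tokens, function(x) x[nzchar(x)])
}

abstracts <- c("MiR-21 promotes tumor growth.",
               "Tumor suppression by miR-21 inhibition.")
res <- tokenize_words(abstracts)
res
```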

...

Additional arguments for tokenization, if necessary.

topic

Character vector. Optional. Specifies topics to plot. If topic = NULL, all topics in df are plotted.

shared

Boolean. If shared = TRUE, only terms that occur in every plotted topic are plotted.
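Conceptually, the shared = TRUE filter amounts to intersecting the term sets of all topics. A minimal base-R sketch follows; terms_by_topic is made-up data, not package output, and the package may apply the filter at a different step.

```r
# Hypothetical per-topic term lists (illustrative data only)
terms_by_topic <- list(
  cancer   = c("tumor", "growth", "apoptosis", "expression"),
  fibrosis = c("collagen", "expression", "apoptosis"),
  diabetes = c("insulin", "expression", "apoptosis", "glucose")
)

# Keep only terms present in every topic; intersect() preserves the
# order in which terms appear in the first topic
shared_terms <- Reduce(intersect, terms_by_topic)
shared_terms
```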

normalize

Boolean. If normalize = TRUE, the number of abstracts mentioning a term is divided by the total number of abstracts containing the miRNA name in the same topic.
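The arithmetic behind normalize = TRUE can be sketched in base R as follows. This assumes, as the description suggests, that each abstract counts at most once per term and that the denominator is the number of distinct PubMed IDs mentioning the miRNA in each topic. The data frame hits and its column term are hypothetical, and the package's internal implementation may differ.

```r
# Illustrative data: one row per (abstract, term) occurrence after
# tokenization, already restricted to abstracts mentioning the miRNA.
hits <- data.frame(
  PMID  = c(1, 1, 2, 3, 4, 4, 5),
  Topic = c("cancer", "cancer", "cancer", "cancer",
            "fibrosis", "fibrosis", "fibrosis"),
  term  = c("tumor", "tumor", "growth", "tumor",
            "collagen", "tumor", "tumor")
)

# Raw count (normalize = FALSE): distinct abstracts per topic and term
pairs <- unique(hits[, c("PMID", "Topic", "term")])
raw <- aggregate(PMID ~ Topic + term, data = pairs, FUN = length)
names(raw)[3] <- "n"

# Denominator: distinct abstracts containing the miRNA in each topic
n_abstracts <- tapply(hits$PMID, hits$Topic, function(p) length(unique(p)))

# Relative count (normalize = TRUE): share of the topic's abstracts
raw$relative <- raw$n / as.vector(n_abstracts[as.character(raw$Topic)])
raw
```

Here "tumor" appears in 2 of 3 cancer abstracts (PMID 1 counts once despite two mentions) and in 2 of 2 fibrosis abstracts, so its relative counts are about 0.67 and 1.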

stopwords

Data frame containing stop words.

stopwords_ngram

Boolean. Specifies whether stop words are removed from abstracts when tokenizing into n-grams. Only applied when token = "ngrams".
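With n-grams there are two common ways to drop stop words: remove them from the text before building n-grams, or discard every n-gram that contains a stop word. The base-R sketch below shows the second strategy for bigrams. Which strategy miRetrieve uses is not stated here, so treat it as an assumption; stop_df and make_bigrams() are hypothetical, and the "word" column name is assumed rather than taken from stopwords_miretrieve.

```r
# Hypothetical stop-word table; the column name "word" is an assumption
stop_df <- data.frame(word = c("the", "of", "in", "by"))

# Build bigrams from a token vector by pairing each token with the next
make_bigrams <- function(tokens) {
  if (length(tokens) < 2) return(character(0))
  paste(head(tokens, -1), tail(tokens, -1))
}

tokens <- c("inhibition", "of", "mir-21", "reduces", "tumor", "growth")
bigrams <- make_bigrams(tokens)

# Keep only bigrams in which neither word is a stop word
parts <- strsplit(bigrams, " ", fixed = TRUE)
keep <- vapply(parts, function(p) !any(p %in% stop_df$word), logical(1))
bigrams[keep]
```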

position

String. Either "dodge" or "facet". Determines whether the bars for different topics are placed next to each other in one plot ("dodge") or in separate panels on top of each other ("facet").
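In ggplot2 terms, "dodge" places the bars of each topic side by side within one group per term. A rough base-R analogue uses barplot() with beside = TRUE; the matrix m holds made-up relative counts, and this is not the package's plotting code. A "facet"-style layout would instead draw one panel per topic.

```r
# Illustrative relative counts: rows = terms, columns = topics
m <- matrix(c(0.60, 0.35, 0.20,
              0.40, 0.50, 0.10),
            nrow = 3,
            dimnames = list(c("expression", "apoptosis", "proliferation"),
                            c("cancer", "fibrosis")))

pdf(NULL)  # draw to a null device so the sketch runs non-interactively
# Dodge-like layout: one group per term, one bar per topic within it
mids <- barplot(t(m), beside = TRUE, legend.text = colnames(m))
invisible(dev.off())
```

With beside = TRUE, barplot() returns the bar midpoints as a matrix with one row per topic and one column per term.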

col.mir

Symbol. Column containing miRNA names.

col.abstract

Symbol. Column containing abstracts.

col.topic

Symbol. Column containing topic names.

col.pmid

Symbol. Column containing PubMed-IDs.

title

String. Plot title.

Details

Compare count of top terms associated with a miRNA name over various topics. miRNA names and topics must be in a data frame df, while terms are taken from the abstracts contained in df. The number of top terms to plot is set by top. Terms can be evaluated either by their raw count (normalize = FALSE), i.e. the number of abstracts in which they are mentioned together with the miRNA name, or by their relative count (normalize = TRUE), i.e. the number of abstracts containing the miRNA in which they are mentioned, divided by the total number of abstracts containing the miRNA in that topic. compare_mir_terms() is based on the tools available in the tidytext package.
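Selecting the top terms per topic can be sketched in base R as below. The data frame counts is illustrative, top_terms() is a hypothetical helper, and ranking each topic separately is an assumption; compare_mir_terms() may rank terms differently, for example over all topics combined.

```r
# Illustrative counts of abstracts mentioning each term, per topic
counts <- data.frame(
  Topic = c("cancer", "cancer", "cancer", "fibrosis", "fibrosis", "fibrosis"),
  term  = c("tumor", "growth", "apoptosis", "collagen", "tumor", "apoptosis"),
  n     = c(12, 7, 9, 10, 3, 5)
)

# Hypothetical helper: keep the `top` highest-count terms within each topic
top_terms <- function(counts, top) {
  per_topic <- split(counts, counts$Topic)
  picked <- lapply(per_topic, function(d) head(d[order(-d$n), ], top))
  do.call(rbind, unname(picked))
}

top_terms(counts, top = 2)
```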

Value

Bar plot comparing the count of terms associated with a miRNA name across the selected topics.

See Also

compare_mir_terms_log2(), compare_mir_terms_scatter()

Other compare functions: compare_mir_count_log2(), compare_mir_count_unique(), compare_mir_count(), compare_mir_terms_log2(), compare_mir_terms_scatter(), compare_mir_terms_unique()


JulFriedrich/miRetrieve documentation built on Sept. 20, 2021, 11:37 p.m.