compare_mir_terms_log2: Compare log2-frequency count of terms associated with a miRNA...
In JulFriedrich/miRetrieve: miRNA Text Mining in Abstracts


Compare log2-frequency count of terms associated with a miRNA name over two topics.

compare_mir_terms_log2(
  df,
  mir,
  top = 20,
  token = "words",
  ...,
  topic = NULL,
  shared = TRUE,
  normalize = TRUE,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  col.mir = miRNA,
  col.abstract = Abstract,
  col.topic = Topic,
  col.pmid = PMID,
  title = NULL
)

`df`	Data frame containing miRNA names, abstracts, topics, and PubMed-IDs.
`mir`	String. miRNA name of interest.
`top`	Integer. Number of top terms to plot.
`token`	String. Specifies how abstracts shall be split up. Taken from `unnest_tokens()` in the tidytext package: "Unit for tokenizing, or a custom tokenizing function. Built-in options are "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", "regex", (...), and "ptb" (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."
`...`	Additional arguments for tokenization, if necessary.
`topic`	Character vector. Optional. Specifies which topics to plot. Must have length two. If `topic = NULL`, all topics in `df` are plotted.
`shared`	Boolean. If `shared = TRUE`, only terms that are shared between the two topics are plotted.
`normalize`	Boolean. If `normalize = TRUE`, the number of abstracts mentioning a term is normalized to the total number of abstracts in a topic.
`stopwords`	Data frame containing stop words.
`stopwords_ngram`	Boolean. Specifies if stop words shall be removed from abstracts when using ngrams. Only applied when `token = 'ngrams'`.
`col.mir`	Symbol. Column containing miRNA names.
`col.abstract`	Symbol. Column containing abstracts.
`col.topic`	Symbol. Column containing topic names.
`col.pmid`	Symbol. Column containing PubMed-IDs.
`title`	String. Plot title.
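
The `token` and `...` arguments follow the conventions of `unnest_tokens()` in tidytext. As a rough illustration of what the different `token` settings do to an abstract, the sketch below calls `unnest_tokens()` directly on a small, made-up data frame. The data, the output column name `term`, and the choice of `n = 2` are assumptions for illustration only; how `compare_mir_terms_log2()` forwards these arguments internally is not shown here.

```r
library(tidytext)

# Made-up abstracts, using the default column names of compare_mir_terms_log2()
abstracts <- data.frame(
  PMID = c(1, 2),
  Abstract = c("miR-21 promotes tumor growth",
               "miR-21 inhibits apoptosis in tumor cells"),
  stringsAsFactors = FALSE
)

# token = "words": one row per single word (lowercased by default)
words <- unnest_tokens(abstracts, output = term, input = Abstract,
                       token = "words")

# token = "ngrams": one row per n-gram; n is an additional tokenization
# argument of the kind that `...` is meant to carry
bigrams <- unnest_tokens(abstracts, output = term, input = Abstract,
                         token = "ngrams", n = 2)

head(words)
head(bigrams)
```

With `token = "ngrams"`, `stopwords_ngram` then decides whether stop words are removed from the abstracts.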

Compares the log2-frequency count of terms associated with a miRNA name over two topics by plotting, for each term, the log2-ratio of its count in one topic to its count in the other. miRNA names and topics must be in a data frame `df`, while terms are taken from the abstracts contained in `df`. The number of top terms to plot is set by `top`. Terms can be evaluated either by their raw count, i.e. the number of abstracts in which they are mentioned in conjunction with the miRNA name, or by their relative count, i.e. the number of abstracts containing the miRNA in which they are mentioned relative to all abstracts containing the miRNA (see `normalize`).

`compare_mir_terms_log2()` is based on the tools available in the tidytext package. The log2-plot is greatly inspired by “tidytext: Text Mining and Analysis Using Tidy Data Principles in R” by Silge and Robinson (2016).
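
The log2-ratio described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the term counts, the topic totals, the column names, and the direction of the ratio (topic A over topic B) are all assumptions made for the example.

```r
# Hypothetical counts: number of abstracts per topic that mention each term
# together with the miRNA of interest
counts <- data.frame(
  term    = c("apoptosis", "proliferation", "fibrosis"),
  topic_a = c(8, 4, 2),
  topic_b = c(2, 4, 8),
  stringsAsFactors = FALSE
)

# Hypothetical number of abstracts per topic, used as normalization denominators
n_a <- 16
n_b <- 32

normalize <- TRUE

if (normalize) {
  # Relative count: share of abstracts in each topic mentioning the term
  freq_a <- counts$topic_a / n_a
  freq_b <- counts$topic_b / n_b
} else {
  # Raw count
  freq_a <- counts$topic_a
  freq_b <- counts$topic_b
}

# Positive values: term is relatively more frequent in topic A;
# negative values: relatively more frequent in topic B
counts$log2_ratio <- log2(freq_a / freq_b)

# Keep the `top` terms with the largest absolute log2-ratio
top <- 2
top_terms <- head(counts[order(-abs(counts$log2_ratio)), ], top)
top_terms
```

A term that is absent from one topic would give a ratio of zero or infinity, which is one reason to restrict the comparison to terms present in both topics, as `shared = TRUE` does.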

A list containing a bar plot that compares the log2-frequency of terms associated with a miRNA over two topics, and the data frame underlying the plot.

Silge, Julia, and David Robinson. 2016. “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). The Open Journal. https://doi.org/10.21105/joss.00037.

compare_mir_terms(), compare_mir_terms_scatter()

Other compare functions: compare_mir_count_log2(), compare_mir_count_unique(), compare_mir_count(), compare_mir_terms_scatter(), compare_mir_terms_unique(), compare_mir_terms()