skipgram_identify | R Documentation
Identifies words which appear near each other in the free-text variable (x), referred to as "skipgrams": sequences of words in which up to max_interrupt_words other words may occur between consecutive words. Supported languages for stop words and stemming are danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, russian, spanish, and swedish.
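The skipgram idea (Guthrie et al., 2006) can be illustrated without any package dependencies. The base-R sketch below is not this package's implementation, which builds on quanteda's tokens_ngrams. skipgrams_sketch() is a hypothetical helper, and it assumes that each gap between consecutive words may hold 0 up to max_interrupt_words skipped words, with tokens lower-cased and split on non-alphanumeric characters and no stop-word removal or stemming.

```r
# Hypothetical sketch of skipgram generation, not the package's code.
# Assumption: every gap between consecutive chosen words may contain
# 0..max_interrupt_words skipped words.
skipgrams_sketch <- function(text, num_of_words = 2, max_interrupt_words = 2) {
  # Lower-case and split on runs of non-alphanumeric characters
  tokens <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  tokens <- tokens[tokens != ""]
  out <- character(0)
  # Recursively extend a partial skipgram, given as a vector of token positions
  extend <- function(positions) {
    if (length(positions) == num_of_words) {
      out <<- c(out, paste(tokens[positions], collapse = "_"))
      return(invisible(NULL))
    }
    last <- positions[length(positions)]
    for (gap in 0:max_interrupt_words) {
      nxt <- last + gap + 1
      if (nxt <= length(tokens)) extend(c(positions, nxt))
    }
  }
  for (start in seq_along(tokens)) extend(start)
  # Duplicate skipgrams within one text are collapsed
  unique(out)
}

skipgrams_sketch("chest pain and shortness of breath",
                 num_of_words = 2, max_interrupt_words = 1)
# returns "chest_pain" "chest_and" "pain_and" "pain_shortness" "and_shortness"
#         "and_of" "shortness_of" "shortness_breath" "of_breath"
```

With one interrupting word allowed, "chest" pairs with both "pain" and "and", but never with "shortness", which sits two words away.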
skipgram_identify(
x,
ids,
num_of_words = 2,
max_interrupt_words = 2,
words_to_rm = NULL,
lan = "english"
)
x | Free-text character vector to query.
ids | Character vector containing an ID for each element of x.
num_of_words | Number of words to consider for each returned skipgram. Default = 2.
max_interrupt_words | Maximum number of words which can interrupt proximal words. Default = 2.
words_to_rm | Character vector of words which should not be considered.
lan | Language of x, used for stop-word removal and stemming. Default = "english".
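How num_of_words and max_interrupt_words interact can be worked out by counting. Assuming max_interrupt_words = k bounds each gap between consecutive words separately (the convention of quanteda's skip argument; this page does not confirm it), a skipgram of n = num_of_words words has n - 1 gaps with k + 1 choices each, so for a text of T tokens:

```latex
\[
  \#\{\text{skipgrams starting at one token}\} \le (k+1)^{\,n-1},
  \qquad
  \#\{\text{skipgrams in a text of } T \text{ tokens}\} \le T\,(k+1)^{\,n-1}.
\]
```

With the defaults (n = 2, k = 2) each token starts at most 3 skipgrams; raising num_of_words to 3 allows up to 9. Fewer arise near the end of a text, where later words run out.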
Returns a tibble containing skipgrams as variables (columns) and patients, identified by ids, as rows.
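To make the shape of the return value concrete, the sketch below pivots two-word skipgrams into one row per ID and one column per skipgram. It is a hypothetical illustration, not the package's code: pair_skipgrams() and skipgram_table() are made-up names, a base data.frame stands in for the tibble, and storing occurrence counts in the cells is an assumption, since this page does not say what the cell values are.

```r
# Hypothetical sketch, not the package's implementation.
# Assumptions: only two-word skipgrams, cells hold occurrence counts,
# and a base data.frame stands in for the tibble the real function returns.
pair_skipgrams <- function(text, max_interrupt_words = 2) {
  # Lower-case and split on runs of non-alphanumeric characters
  tokens <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  tokens <- tokens[tokens != ""]
  out <- character(0)
  for (i in seq_along(tokens)) {
    # Partners may be separated by 0..max_interrupt_words words
    j <- i + seq_len(max_interrupt_words + 1)
    j <- j[j <= length(tokens)]
    # Guard: paste() with an empty vector would still emit "word_"
    if (length(j) > 0) out <- c(out, paste(tokens[i], tokens[j], sep = "_"))
  }
  out
}

skipgram_table <- function(x, ids, max_interrupt_words = 2) {
  grams <- lapply(x, pair_skipgrams, max_interrupt_words = max_interrupt_words)
  # Long format: one entry per (id, skipgram) occurrence
  long_id <- rep(ids, lengths(grams))
  long_gram <- unlist(grams)
  # Cross-tabulate; factor levels keep IDs whose text yields no skipgrams
  tab <- table(factor(long_id, levels = unique(ids)), long_gram)
  wide <- as.data.frame.matrix(tab)
  data.frame(id = rownames(wide), wide, row.names = NULL, check.names = FALSE)
}

skipgram_table(c("chest pain", "pain in chest"), ids = c("p1", "p2"),
               max_interrupt_words = 1)
```

Using the IDs as factor levels keeps a patient as an all-zero row even when their text yields no skipgrams, so the number of rows matches the number of distinct IDs.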
Guthrie D, Allison B, Liu W, Guthrie L, Wilks Y (2006). "A Closer Look at Skip-gram Modelling." In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06). European Language Resources Association (ELRA).
Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, Matsuo A (2018). "quanteda: An R package for the quantitative analysis of textual data." Journal of Open Source Software, 3(30), 774. doi:10.21105/joss.00774, https://doi.org/10.21105/joss.00774, https://quanteda.io.
Feinerer I, Hornik K (2020). tm: Text Mining Package. R package version 0.7-8, https://CRAN.R-project.org/package=tm.
Feinerer I, Hornik K, Meyer D (2008). "Text Mining Infrastructure in R." Journal of Statistical Software, 25(5), 1-54. https://www.jstatsoft.org/v25/i05/.
Principal underlying function: tokens_ngrams (quanteda).
Other free-text functions: extract_freetext(), skipgram_append(), skipgram_freq().
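# Load the example dataset shipped with the package
data(example_data)
# Identify two-word skipgrams, allowing up to five interrupting words
skipgram_identify(x = example_data$free_text,
                  ids = example_data$patient_id,
                  max_interrupt_words = 5)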