knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(malaytextr)
There is a data frame of Malay root words that can be used as a dictionary:
head(malayrootwords)
stem_malay() first looks up the root word in a dictionary (the malayrootwords data frame can be used for this), then removes the "extra suffix", the "prefix" and lastly the "suffix".
Stemming the word "banyaknya" returns a data frame with the word "banyaknya" and the stemmed word "banyak":
stem_malay(word = "banyaknya", dictionary = malayrootwords)
To stem words in a data frame:
x <- data.frame(text = c("banyaknya", "sangat", "terkedu", "pengetahuan"))
stem_malay(word = x, dictionary = malayrootwords, col_feature1 = "text")
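To make the order of operations described above more concrete, here is a minimal base R sketch of a dictionary-first stemmer. It is not the malaytextr implementation: the toy dictionary, the affix lists, the helper strip_first_match() and the function toy_stem() are all hypothetical and chosen only for illustration.

```r
# Toy dictionary and affix lists. These are illustrative assumptions,
# not the data or rules shipped with malaytextr.
toy_dictionary <- data.frame(
  word = c("pengetahuan", "sangat"),
  root = c("tahu", "sangat"),
  stringsAsFactors = FALSE
)
toy_extra_suffixes <- c("nya", "lah", "kah")
toy_prefixes <- c("ter", "ber", "me")
toy_suffixes <- c("kan", "an")

# Remove the first matching affix, but only if at least
# three characters of the word would remain.
strip_first_match <- function(word, affixes, from_start) {
  for (a in affixes) {
    matches <- if (from_start) startsWith(word, a) else endsWith(word, a)
    if (matches && nchar(word) - nchar(a) >= 3) {
      if (from_start) return(substring(word, nchar(a) + 1))
      return(substring(word, 1, nchar(word) - nchar(a)))
    }
  }
  word
}

toy_stem <- function(word) {
  # Step 1: a dictionary hit is returned as is.
  hit <- match(word, toy_dictionary$word)
  if (!is.na(hit)) return(toy_dictionary$root[hit])
  # Steps 2 to 4: extra suffix, then prefix, then suffix.
  word <- strip_first_match(word, toy_extra_suffixes, from_start = FALSE)
  word <- strip_first_match(word, toy_prefixes, from_start = TRUE)
  strip_first_match(word, toy_suffixes, from_start = FALSE)
}

words <- c("banyaknya", "sangat", "terkedu", "pengetahuan")
stemmed <- vapply(words, toy_stem, character(1), USE.NAMES = FALSE)
print(data.frame(word = words, stemmed = stemmed))
```

With these toy lists, "banyaknya" is reduced to "banyak" by the extra-suffix step, while "pengetahuan" is resolved directly by the dictionary. Results from stem_malay() itself depend on its own rules and on the dictionary supplied.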
remove_url() removes all URLs found in a string:
x <- c("test https://t.co/fkQC2dXwnc", "another one https://www.google.com/ to try")
remove_url(x)
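For comparison, a similar effect can be approximated in base R with a regular expression. This is only a sketch under the assumption that a URL starts with http:// or https:// and runs to the next whitespace. The pattern used by remove_url() may differ, and so may its handling of leftover whitespace.

```r
x <- c("test https://t.co/fkQC2dXwnc", "another one https://www.google.com/ to try")

# Drop anything starting with http:// or https:// up to the next whitespace.
no_url <- gsub("https?://\\S+", "", x, perl = TRUE)

# Collapse the doubled spaces left behind and trim both ends.
no_url <- trimws(gsub("\\s+", " ", no_url, perl = TRUE))
print(no_url)
```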
There is a data frame of Malay stop words:
head(malaystopwords)
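A typical use of a stop word list is to filter tokens before further analysis. The sketch below uses a small hand-written vector, toy_stopwords, as a stand-in so that it runs without assuming anything about the column names of malaystopwords. In practice the relevant column of that data frame would take its place.

```r
# Hypothetical sample of stop words, for illustration only.
toy_stopwords <- c("yang", "dan", "di", "itu")

sentence <- "buku yang menarik itu ada di meja"

# Split on whitespace, then keep tokens that are not stop words.
tokens <- strsplit(tolower(sentence), "\\s+")[[1]]
kept <- tokens[!tokens %in% toy_stopwords]
print(kept)
```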
The sentiment_general lexicon contains words that have been labelled as positive or negative. This is useful for tasks such as sentiment analysis, which determines the overall sentiment expressed in a piece of text. To use the lexicon, process the text (for example, split it into words) and check each word against the lexicon to determine its sentiment. Note that this sentiment lexicon was built from a general corpus sourced from news articles.
head(sentiment_general)
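The word-by-word lookup described above can be sketched in base R as follows. The lexicon here is a tiny hypothetical data frame, toy_lexicon, with assumed column names word and sentiment. It stands in for sentiment_general, whose actual column names should be checked with head() first. The scoring rule (positive count minus negative count) is one simple choice among many, and score_text() is a hypothetical helper.

```r
# Hypothetical miniature lexicon; column names are assumptions.
toy_lexicon <- data.frame(
  word = c("baik", "gembira", "buruk", "rugi"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

score_text <- function(text) {
  # Lowercase and split on anything that is not a letter a-z.
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Look each token up; words missing from the lexicon give NA.
  labels <- toy_lexicon$sentiment[match(tokens, toy_lexicon$word)]
  # Score = number of positive words minus number of negative words.
  sum(labels == "positive", na.rm = TRUE) - sum(labels == "negative", na.rm = TRUE)
}

texts <- c("Keputusan baik dan semua gembira", "Syarikat rugi, prestasi buruk")
scores <- vapply(texts, score_text, integer(1), USE.NAMES = FALSE)
print(scores)
```

A positive score suggests mostly positive wording and a negative score the opposite. Stemming the tokens before the lookup may improve coverage, since lexicon entries are often stored in root form.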