removePrefixes | R Documentation |
Removes some Arabic prefixes from a Unicode string. The prefixes are: "waw", "alif-lam", "waw-alif-lam", "ba-alif-lam", "kaf-alif-lam", "fa-alif-lam", and "lam-lam". A prefix is removed from a word (words are delimited by spaces) only if the word has at least the minimum number of letters set by the corresponding argument (x1 through x7), so that the remaining stem is not too short.
removePrefixes(texts, x1 = 4, x2 = 4, x3 = 5, x4 = 5, x5 = 5, x6 = 5, x7 = 4, dontstem = c('\u0627\u0644\u0644\u0647','\u0644\u0644\u0647'))
texts | An Arabic-language string in Unicode.
x1 | The number of letters that must be in a word for the function to remove the prefix "waw".
x2 | The number of letters that must be in a word for the function to remove the prefix "alif-lam".
x3 | The number of letters that must be in a word for the function to remove the prefix "waw-alif-lam".
x4 | The number of letters that must be in a word for the function to remove the prefix "ba-alif-lam".
x5 | The number of letters that must be in a word for the function to remove the prefix "kaf-alif-lam".
x6 | The number of letters that must be in a word for the function to remove the prefix "fa-alif-lam".
x7 | The number of letters that must be in a word for the function to remove the prefix "lam-lam".
dontstem | Words that should not be stemmed (entered in Unicode).
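The minimum-length thresholds and the dontstem exception can be illustrated with a small base-R sketch. This is not the package's implementation: the helper name strip_prefix_if_long and its keep argument are hypothetical, it handles a single prefix only, and it assumes the threshold is compared with >= against the length of the whole word.

```r
# Hypothetical helper, not part of the package: strip one prefix from each
# space-delimited word, but only when the word has at least `min_len` letters
# and is not listed in `keep` (analogous to `dontstem`).
strip_prefix_if_long <- function(text, prefix, min_len, keep = character(0)) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  has_prefix  <- substr(words, 1, nchar(prefix)) == prefix
  long_enough <- nchar(words) >= min_len  # assumption: ">=" on whole-word length
  strip <- has_prefix & long_enough & !(words %in% keep)
  words[strip] <- substr(words[strip], nchar(prefix) + 1, nchar(words[strip]))
  paste(words, collapse = " ")
}

alif_lam <- "\u0627\u0644"
allah    <- "\u0627\u0644\u0644\u0647"
# Three words: a 7-letter word, a 3-letter word, and a 4-letter word,
# all of which begin with alif-lam.
x <- "\u0627\u0644\u0639\u0631\u0628\u064a\u0629 \u0627\u0644\u0649 \u0627\u0644\u0644\u0647"

strip_prefix_if_long(x, alif_lam, min_len = 4, keep = allah)
```

In this sketch only the first word loses its prefix: the second word is shorter than min_len, and the third is protected by keep.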
Returns a string with Arabic prefixes removed.
Rich Nielsen
## Create string with Arabic characters
x <- '\u0627\u0644\u0644\u063a\u0629 \u0627\u0644\u0639\u0631\u0628\u064a\u0629 \u062c\u0645\u064a\u0644\u0629 \u062c\u062f\u0627'

## Remove prefixes
removePrefixes(x)
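A further call with non-default arguments might look like the sketch below. It assumes the function is provided by the arabicStemR package and that the package is installed; the threshold value and the extra protected word are arbitrary choices for illustration, and no particular output is claimed.

```r
## Same example string as above
x <- "\u0627\u0644\u0644\u063a\u0629 \u0627\u0644\u0639\u0631\u0628\u064a\u0629 \u062c\u0645\u064a\u0644\u0629 \u062c\u062f\u0627"

## Keep the two default protected words and add the first word of x,
## so that word is never stemmed
protected <- c("\u0627\u0644\u0644\u0647", "\u0644\u0644\u0647",
               "\u0627\u0644\u0644\u063a\u0629")

## Assumption: removePrefixes() comes from the arabicStemR package.
## The guard skips the call if that package is not installed.
if (requireNamespace("arabicStemR", quietly = TRUE)) {
  ## Require at least 6 letters before "alif-lam" (x2) is removed
  out <- arabicStemR::removePrefixes(x, x2 = 6, dontstem = protected)
  print(out)
}
```

Passing dontstem replaces the default vector, so the defaults are repeated in protected to keep them in effect.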