RefineChars: Removes all characters that are not Latin, Persian or...

Description

Removes all Unicode characters except Latin, Persian, and General Punctuation characters, and standardizes Persian characters.

Usage

RefineChars(texts)

Arguments

texts

A string from which all characters that are not Latin, Persian, or punctuation should be removed, and in which Persian characters should be standardized.

Value

RefineChars returns a string containing only Latin, standardized Persian, and general punctuation characters.

Author(s)

Safshekan, Nielsen

Examples

# Create a string with Latin, Persian, Japanese, non-standardized Persian,
# Arabic-Indic digit, and punctuation characters.
x <- '\u062F\u0627\u0646\u0634\u06AF\u0627\u0647\u064A \u060C 
\u0641\u06CC\u0632\u06CC\u0643 university 
\u65E5\u672C \u0664\u0665\u0666'

# Remove newline characters and fix half-spaces in the string.
x <- RemNewlineHalfspace(x)

# Remove all characters that are not Latin, Persian or punctuation, 
# and standardize Persian characters.
RefineChars(x)
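
To make the filtering idea concrete, here is a minimal sketch of how such a cleanup could be written in base R. It is not the PersianStemmer implementation: the function name refine_chars_sketch is hypothetical, and the choices below are assumptions made for illustration only. It treats "Latin" as printable ASCII (U+0020 to U+007E), "Persian" as the whole Unicode Arabic block (U+0600 to U+06FF), and "punctuation" as the General Punctuation block (U+2000 to U+206F), and it standardizes only two letters (Arabic yeh and kaf). The real RefineChars may keep or map a different set of characters.

```r
# Hypothetical sketch; NOT the actual RefineChars implementation.
refine_chars_sketch <- function(texts) {
  # Assumed standardization: Arabic yeh (U+064A) -> Persian yeh (U+06CC),
  # Arabic kaf (U+0643) -> Persian keheh (U+06A9).
  texts <- gsub("\u064A", "\u06CC", texts, fixed = TRUE)
  texts <- gsub("\u0643", "\u06A9", texts, fixed = TRUE)

  # Assumed ranges to keep: printable ASCII, the Arabic block,
  # and the General Punctuation block. Everything else is dropped.
  keep <- "[^\u0020-\u007E\u0600-\u06FF\u2000-\u206F]"
  gsub(keep, "", texts, perl = TRUE)
}

# The Japanese characters are dropped and the Arabic kaf is replaced.
y <- refine_chars_sketch("\u0641\u06CC\u0632\u06CC\u0643 university \u65E5\u672C")
print(y)
```

Under these assumptions newline characters are also removed, because they fall outside the kept ranges, and the space before the dropped Japanese characters is preserved.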

PersianStemmer documentation built on June 28, 2019, 5:03 p.m.