PerStem: Persian Stemmer for Text Analysis


Description

Stems Persian texts for text analysis.

Usage

PerStem(dat, NoEnglish = TRUE, NoNumbers = TRUE, 
	NoStopwords = TRUE, NoPunctuation = TRUE, 
	StemVerbs = TRUE, NoPreSuffix = TRUE, 
	Context = TRUE, StemBrokenPlurals = TRUE, 
	Transliteration = TRUE)

Arguments

dat

The original data, i.e. the Persian text to be stemmed.

NoEnglish

If TRUE, removes English characters.

NoNumbers

If TRUE, removes numbers.

NoStopwords

If TRUE, removes stopwords using the default stopword list.

NoPunctuation

If TRUE, the function removes punctuation. If FALSE, it fixes punctuation for text analysis.

StemVerbs

If TRUE, stems verbs and returns the past or present root of each verb.

NoPreSuffix

If TRUE, stems words by removing prefixes and suffixes.

Context

If TRUE, the function stems a word only if its stem also occurs in the text. If FALSE, the function stems each word without considering the other words in the text.

StemBrokenPlurals

If TRUE, stems Arabic broken plurals and returns their singular forms, using the default list of Arabic broken plurals.

Transliteration

If TRUE, transliterates Persian Unicode characters into Latin characters, using a transliteration system developed by Roozbeh Safshekan and Rich Nielsen.
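
The effect of the Context argument can be illustrated with a toy sketch in base R. This is not PerStem's implementation: the helper strip_if_stem_present, the single suffix "ha" (a plural ending written in Latin letters), and the sample tokens are all assumptions made purely for illustration.

```r
# Toy illustration of context-aware suffix stripping.
# strip_if_stem_present is a hypothetical helper, not part of PersianStemmer.
strip_if_stem_present <- function(tokens, suffix = "ha", context = TRUE) {
  pattern <- paste0(suffix, "$")
  # A token is a candidate only if it ends in the suffix and is longer than it.
  has_suffix <- grepl(pattern, tokens) & nchar(tokens) > nchar(suffix)
  candidates <- sub(pattern, "", tokens)
  if (context) {
    # Accept a candidate stem only if it already occurs as a token in the text.
    accept <- has_suffix & candidates %in% tokens
  } else {
    # Strip the suffix regardless of the other tokens.
    accept <- has_suffix
  }
  ifelse(accept, candidates, tokens)
}

# Hypothetical transliterated tokens.
tokens <- c("ktab", "ktabha", "drKtha")
with_context <- strip_if_stem_present(tokens, context = TRUE)
without_context <- strip_if_stem_present(tokens, context = FALSE)
```

With context = TRUE, "ktabha" is reduced to "ktab" because "ktab" is itself a token, while "drKtha" is left alone because "drKt" does not occur in the input. With context = FALSE, both are stripped.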

Details

PerStem prepares Persian texts for text analysis by stemming them.

Value

PerStem returns the stemmed Persian text.
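
Judging by the example output on this page, the result appears to be a character string of stems separated by spaces. Under that assumption, a term-frequency table can be built with base R alone. The short string below is an excerpt copied from the example output rather than the result of calling PerStem, and the variable names are illustrative.

```r
# Excerpt copied from the example output on this page (assumed format:
# one character string with stems separated by whitespace).
stemmed <- "mrkz danW mEarf mTalEh tHQiQ anvaE Elm bvd mrkz brJsth Elm"

# Split on runs of whitespace and drop any empty strings.
stems <- strsplit(stemmed, "\\s+")[[1]]
stems <- stems[nzchar(stems)]

# Count how often each stem occurs, most frequent first.
freq <- sort(table(stems), decreasing = TRUE)
head(freq)
```

The same steps should apply to the full return value, provided it has the format assumed above.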

Author(s)

Roozbeh Safshekan, Richard Nielsen

Examples

# Load the package and the example data
library(PersianStemmer)
data(UniversityofTehran)

# Stem and transliterate the text
PerStem(UniversityofTehran, NoEnglish = TRUE, NoNumbers = TRUE,
        NoStopwords = TRUE, NoPunctuation = TRUE,
        StemVerbs = TRUE, NoPreSuffix = TRUE, Context = TRUE,
        StemBrokenPlurals = TRUE, Transliteration = TRUE)

Example output

[1] "AWnaii danWGah thran tariKCh aftKarat Exim baWkvh tariK tmdn frhnG airan zmin bianGr vJvd mrkz danW mEarf mTalEh tHQiQ anvaE Elm bvd mrkz brJsth Elm hmCvn mdrsh nSibin danWGah Jndi Wapvr aialt Kvzstan sal panSd si miladi frman Ksrv anvWirvan tasis iaft zman Ebasian dvam daWt dlili rain mdEast hmCnin danWmndan dvrh aslami hmCvn abn sina zkriai razi abvriHan birvn artQai tfkr tEali mdarJ sir slvk JamEh bWri sTH Jhan WnaKth hstnd dvran Sfvih antQal aSvl Elm mEarf Jdid arvpa Agaz Wd avli mdrsh Jdid sal arvmih Agaz kar krd andiWh aiJad mrkz AmvzW Eali airan tEbir diGr danWGah nKst tasis daralfnvn sal h W rWth mhnds darvsazi Tb JraHi tvpKanh piadh nxam svarh nxam mEdn Wnasi hmt mirza tQi Kan amirkbir Emli Grdid daralfnvn GrCh tvsEh niaft tJrbh mgtnmi ksani Arzvi AWnaii airan danW Jdid piWrft arvpa SnEt aQtSad siast bvd Qrar dad ETf tJrbh sal h W dktr mHmvd Hsabi piWnhad rah andazi mrkz JamE hmh aglb danW vzir vQt frhnG dktr Eli aSgr Hkmt nhad bhmn mah sal Wmsi Jlsh hiat dvlt vQt tWkil zminh Abad thran ziba Wkvh abnih Emarat kaK ziba sKn Amd mrHvm frvgi rvz riast vzir brEhdh daWt ik sv diGr vzir svi diGr zban tHsin tmJid Whr GWvdnd brKi Jlb rXait Wah mQal Enan kf bdadnd mrHvm Eli aSgr Hkmt kfil vzart mEarf Ankh piWrft paitKt nadidh anGard lHni mHtaTanh Cnin Gft albth Abad Exmt paitKt Wki nist tnha nQS AWkar ainst anivrsth ndard Hif Whr nvin HiU diGr blad bzrG Ealm vaps mand sKn arzWmnd taUir Kvd brJai nhad drnG mQbvl hmh aftad rv tKSiS bvdJh avli mizan tvman vzart mEarf aJazh dadnd zmin mnasb tasis danWGah biabd saKtman asrE vQt pdid Avrd Eli aSgr Hkmt drnG dst kar Wd JstJv mkan iabi mnasb danWGah kmk mWavrh Andrh Gdar mEmar Cirh dst fransvi rvzGar Envan mhnds Kdmt vzart mEarf bvd Agaz krd JstJv bsiar abnih bag zmin fravan rvz aTraf thran bag Jlalih aHdaU danWGah brGzidnd hmin Hal brKlaf amrvz iaftn zmin mnasb Whr thran aiJad danWGah Exim tQriba nammkn rvz zmin fravan vJvd daWt SaHb nh tnha drfrvW amsaki ndaWtnd vaGZari Cnin mvssat mslma svd klani dnbal daWt sr dst 
Wkst hmin rv bvd Grvhi malkin araXi bhJt Abad svastfadh nxr vzir malih vQt Jlb krd zmin tasis danWGah Kridari nmaid Hal nxr mvsiv Gdar ErSh zmin tnG mvQEit sil Gir bvd tasis danWGah hiC mnasb nbvd hmh mrHvm davr rJHan Jlsh hiat dvlt sKti Krid araXi bhJt Abad pai fWrd nxr biWtr aEXa Jlb krdh sranJam dvlt bhJt Abad brGzidnd hmin Hal Eli aSgr Hkmt dl Wksth naamid naxr maJra bvd rXaWah vard Wd aTlaE mvXvE Qldri KaS Kvd avXaE brhm zd Gft bag Jlalih brGzinid bhJt Abad abda Waisth nist ErSh km araXi sil Gir dvlt brabr sKn QaTE zban kam kWidnd aHdi dm brniavrd bag Jlalih Wmal thran rvz Qrih amirAbad KndQ Wmal thran Qrar daWt bag ziba pvWidh drKtan khnsal mUmr gir mUmr bvd Hd Q vaps sal Hkvmt naSraldin Wah QaJar frman Wahzadh nam Jlal aldvlh bna iafth rvz malkit taJr trk nam HaJ rHim AQai atHadih tbrizi bvd Hal bag Jlalih Qrar mtri rial JmEa mblg Sd hzar tvman taJr Kridari Wd mvsiv Gdar srEt mamvr tEiin Hd nrdh GZari TraHi aJrai Emliat saKtman Wd hmin Hal panzdhm bhmn mah W lvH iadbvd tasis danWGah HXvr mQamat dvlt mHli aknvn plkan Jnvb danWkdh pzWki dl Kak amant GZaWth TraHi prdis danWGah hman mEmar fransvi Ehdh Grft nKst TrH Kiaban aTraf daKl danWGah araih krd taiid panzdhm bhmn Emliat aJrai kaWt nhal drKtan saih Gstr baWkvh Cnar knar Kiaban Agaz Wd tasis danWGah thran Agaz AWnaii Jdi airan mgrb zmin mQarn aftad danWGah bstr aSli artbaT tmdn mgrb zmin Elm Jdid tbdil krd Agaz fEalit AmvzW danWGah thran taknvn hmvarh frd Waisth WKS brJsth Chrh SaHb nam tdris tHSil prdaKth Srf nxr asami fEalan knvni ErSh siast aJtmaE Elm hnr nam tn drGZWtGan aWarh astad Jlal aldin hmaii EbdalExim Qrib bdiE alzman frvzanfr prvfsvr mHmvd Hsabi astad Eli akbr dhKda dktr mHmd mEin mhnds mhdi bazrGan Whid dktr mSTfi Cmran dktr idalh sHabi Whid dktr mHmd mftH astad Whid mrtXi mThri dktr EbdalHsin zrin kvb dktr krim saEi dktr aHmd Hami prdis danWGah thran Jnvb Kiaban anQlab Wmal Kiaban pvr sina WrQ grb trtib Kiaban Qds AZr mHdvdast sal h W msaHti vsEt hktar tasis Wd mJmvEh saKtman danWkdh hnr ziba 
adbiat Elm ansani Elm fni HQ Elm siasi pzWki dndanpzWki darvsazi saKtman ktabKanh mrkz mhm ktabKanh kWvr Wmar Aid msJd danWGah vaQE Wd sazman mrkz danWGah adarh amr danWJvii mrkz bhdaWt drman danWJvian danWkdh mHiT zist Jgrafia Kiaban aTraf danWGah Qrar darnd danWkdh Elm aJtmaE Elm trbiti kvi danWGah aQtSad alhiat mEarf aslami trtib amirAbad Wmal Kiaban mThri vaQE Wd hmCnankh Wmar diGr danWkdh mrkz tHQiQati pjvhWi danWGah thran birvn thran Whr Qm krJ pakdWt sari Cvka nWtarvd vaQE Wd sal h W danWkdh pzWki dndanpzWki darvsazi danWGah thran Jda Wd danWGah Elm pzWki thran tWkil dadnd amrvz danWGah thran mvssat sazman vabsth AmvzW Eali kWvr HiU nxr JaiGahi rfiE bhrh mnd vaQE mtgirhaii sabQh Qdmt tdris astad bnam blnd mrtbh tHSil danWJvian mmtaz kUrt danWJvian astad karknan arzW mdrk tHSil kWvr KarJ pivnd tEaml dstGah aJrai mvssat Wrkt SnEt adari aJrai vdaWtn ktabKanh AzmaiWGah gni mJhz tEdd rWth danWkdh mvssat pivsth vabsth vaQE Wd paitKt mrkz Whr mEiarhai tEiin aEtbar ahmit ik danWGah brWmarim Gman danWGah thran baid mEtbrtrin mhm danWGah kWvr danst Jht nist danWGah tEbir danWGah madrv nmad AmvzW Eali iadWdh"

PersianStemmer documentation built on June 28, 2019, 5:03 p.m.