EndoPOS: Part-of-speech tagging for reports


Description

This uses udpipe to tag the text with parts of speech (POS). It then compresses the token-level output so that you have continuous POS tagging for the whole text. The udpipe package must be installed before running this.
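As a rough illustration of the compression step, the sketch below collapses token-level rows into one row per report using base R only. It is not the EndoPOS implementation: `compress_pos` is a hypothetical helper, the `tokens` data are made up, and the only assumption carried over from udpipe is that its annotation data frame has one row per token with `doc_id`, `token` and `upos` columns.

```r
# Hypothetical token-level annotation, shaped like udpipe output
# (one row per token). The values are invented for illustration.
tokens <- data.frame(
  doc_id = c("1", "1", "1", "2", "2"),
  token  = c("Normal", "oesophagus", ".", "Gastritis", "seen"),
  upos   = c("ADJ", "NOUN", "PUNCT", "NOUN", "VERB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the token rows of each document into a
# single space-separated string, keeping the original token order.
compress_pos <- function(ann) {
  by_doc <- split(ann, ann$doc_id)
  data.frame(
    doc_id = names(by_doc),
    token  = vapply(by_doc, function(d) paste(d$token, collapse = " "),
                    character(1)),
    upos   = vapply(by_doc, function(d) paste(d$upos, collapse = " "),
                    character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

compressed <- compress_pos(tokens)
print(compressed)
```

Each report ends up as a single row, so the tags line up with the report text one-to-one and can be joined back to the source data by document identifier.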

Usage

EndoPOS(inputString)

Arguments

inputString

A character vector containing the report text to tag

Examples

library(udpipe)

#Some quick cleaning up; this will eventually be done in the actual data set
#Myendo$OGDReportWhole<-gsub("\\.\\s+\\,"," ",Myendo$OGDReportWhole)
#Myendo$OGDReportWhole<-gsub("^\\s+\\,"," ",Myendo$OGDReportWhole)
#Myendo$RowIndex<-as.numeric(rownames(Myendo))

#We will only use the first 100 reports
#Myendo2<-head(Myendo,100)

#Run the function
#MyPOSframe<-EndoPOS(Myendo2$OGDReportWhole) #returns a dataframe

#Then merge MyPOSframe with the original data by row number to get the
#whole merged dataset with all the POS tags, morphological features
#and dependencies (RowIndex was already added in the clean-up step above).
#MergedUp<-merge(Myendo2,MyPOSframe,by.x="RowIndex",by.y="doc_id")
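The merge step in the example above can be sketched on made-up data, without udpipe. The `reports` and `pos` data frames here are hypothetical stand-ins for `Myendo2` and `MyPOSframe`. The page does not state the type of the `doc_id` column that EndoPOS returns, so this sketch assumes it holds row numbers stored as character strings and converts them before merging.

```r
# Hypothetical stand-in for the report data, with a numeric row index.
reports <- data.frame(
  RowIndex = c(1, 2, 3),
  OGDReportWhole = c("Normal oesophagus.", "Gastritis seen.",
                     "Hiatus hernia noted."),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the tagged output: one row per document.
# Assumption: doc_id holds the row number as a character string.
pos <- data.frame(
  doc_id = c("1", "2"),
  upos   = c("ADJ NOUN PUNCT", "NOUN VERB PUNCT"),
  stringsAsFactors = FALSE
)

# Make the key types match before merging.
pos$doc_id <- as.numeric(pos$doc_id)

# Inner join: only reports that have a tagged counterpart are kept.
merged <- merge(reports, pos, by.x = "RowIndex", by.y = "doc_id")

# Left join: keep every report, with NA where no tags are available.
merged_all <- merge(reports, pos, by.x = "RowIndex", by.y = "doc_id",
                    all.x = TRUE)
print(merged_all)
```

By default `merge` drops unmatched rows, so `all.x = TRUE` is worth considering when some reports may not have been tagged.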

sebastiz/EndoMineR_devlop documentation built on May 29, 2019, 7:33 a.m.