Extractor2: Extractor2

Description

This is the alternative extractor for the Endoscopy and Histology report. It performs essentially the same task as the main Extractor but is useful when the semi-structured text is organised in a non-standard way, i.e. when the delimiting text is not always in the same order. As with the main Extractor, this function relies on the user creating a list of words or characters that act as the delimiters the text should be split against. The list is then fed to the Extractor in a loop so that consecutive entries act as the beginning and the end of the regex used to split the text. Whatever has been specified in the list is used as a column header. Column headers do not tolerate special characters such as :, ? and /, and cannot start with a number, so these have to be dealt with in the text before processing.
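
To make the idea of splitting between a start and an end delimiter concrete, here is a minimal base R sketch. It is not the EndoMineR implementation: the helper name extract_between, its arguments, and the toy data frame are all assumptions made for illustration, and it treats the delimiters as literal strings rather than regular expressions.

```r
# Hypothetical helper (an illustration only, not part of EndoMineR):
# copy the text found between two literal delimiters into a new column.
extract_between <- function(df, col, start, end, new_col) {
  text <- as.character(df[[col]])
  # Position of the first literal occurrence of each delimiter (-1 if absent)
  start_pos <- as.integer(regexpr(start, text, fixed = TRUE))
  end_pos <- as.integer(regexpr(end, text, fixed = TRUE))
  # Keep only rows where both delimiters exist and the end follows the start
  found <- !is.na(text) & start_pos > 0 & end_pos > start_pos
  from <- start_pos + nchar(start)
  out <- rep(NA_character_, length(text))
  out[found] <- trimws(substr(text[found], from[found], end_pos[found] - 1))
  # make.names() replaces characters such as ':' that are invalid in names
  df[[make.names(new_col)]] <- out
  df
}

# Toy data standing in for a real report column
reports <- data.frame(
  report = c("Endoscopist: Dr A Findings: Normal Endoscopic Diagnosis: None",
             NA),
  stringsAsFactors = FALSE
)
res <- extract_between(reports, "report", "Findings:",
                       "Endoscopic Diagnosis:", "Findings")
res$Findings
```

Rows where either delimiter is missing, or where the end delimiter appears before the start delimiter, are left as NA in this sketch; the package's own behaviour in those cases may differ.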

Usage

Extractor2(x, y, stra, strb, t)

Arguments

x

the dataframe

y

the column to extract from

stra

the start of the boundary to extract

strb

the end of the boundary to extract

t

the column name to create
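
Because the delimiter text is reused as the column name, it can help to preview what a valid header would look like before running the extraction. The following is only a sketch under assumptions: clean_header is a hypothetical helper, not an EndoMineR function, and the rule it applies (drop non-alphanumeric characters, prefix names that begin with a digit) is one possible convention rather than the package's documented behaviour.

```r
# Hypothetical helper (illustration only): turn delimiter text into a
# syntactically valid column name.
clean_header <- function(label) {
  # Drop everything except letters and digits (removes ':', '?', '/', spaces)
  cleaned <- gsub("[^[:alnum:]]", "", label)
  # Prefix names that would otherwise begin with a digit
  ifelse(grepl("^[0-9]", cleaned), paste0("X", cleaned), cleaned)
}

headers <- clean_header(c("Hospital Number:", "2nd Endoscopist:",
                          "Extent of Exam:"))
headers
```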

Examples

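# Work on a copy of the example endoscopy reports
Myendo<-TheOGDReportFinal
# Rewrite the delimiter so it no longer starts with a number and matches
# the 'Second Endoscopist:' entry in EndoscTree exactly (gsub is case-sensitive)
Myendo$OGDReportWhole<-gsub('2nd Endoscopist:','Second Endoscopist:',
Myendo$OGDReportWhole)

# Delimiters that mark the start of each section of the report
EndoscTree<-list('Hospital Number:','Patient Name:','General Practitioner:',
'Date of procedure:','Endoscopist:','Second Endoscopist:','Medications',
'Instrument','Extent of Exam:','Indications:','Procedure Performed:',
'Findings:','Endoscopic Diagnosis:')

# Each delimiter starts a section; the next delimiter in the list ends it
for(i in seq_len(length(EndoscTree)-1)) {
 Myendo<-Extractor2(Myendo,'OGDReportWhole',as.character(EndoscTree[i]),
 as.character(EndoscTree[i+1]),as.character(EndoscTree[i]))
}
res<-Myendo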

sebastiz/EndoMineR_devlop documentation built on May 29, 2019, 7:33 a.m.