textPrep: textPrep function


Description

This function prepares the data by cleaning punctuation, checking spelling against the lexicons, mapping terms according to the lexicons, removing negative expressions and lower-casing everything. It wraps several of the other functions in the package for ease of use. The user can choose whether to include POS tagging and negative removal, and which extractor to use. By default, the extractor called 'Extractor' (which assumes all headers are present in the same order in each text entry) is used, negative phrases are removed, and POS tagging is not performed.
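To give a feel for the kind of cleaning described above, here is a minimal base-R sketch. It is not the EndoMineR implementation: the function name `sketch_prep` and the tiny list of negation cues are assumptions made purely for illustration, and spell-checking, lexicon mapping and POS tagging are left out.

```r
# Hypothetical helper, NOT part of EndoMineR: a rough illustration of
# lower-casing, punctuation cleaning and dropping negated sentences.
sketch_prep <- function(x) {
  x <- tolower(x)                                  # lower-case everything
  sentences <- strsplit(x, ".", fixed = TRUE)      # split on full stops
  vapply(sentences, function(s) {
    s <- trimws(gsub("[[:punct:]]+", " ", s))      # strip punctuation
    s <- gsub("\\s+", " ", s)                      # collapse repeated spaces
    s <- s[nzchar(s)]                              # drop empty fragments
    # Assumed, deliberately tiny set of negation cues
    s <- s[!grepl("\\b(no|not|without)\\b", s, perl = TRUE)]
    paste(s, collapse = ". ")
  }, character(1))
}

sketch_prep("Gastric biopsy: mild inflammation. No dysplasia seen.")
# [1] "gastric biopsy mild inflammation"
```

The sentence containing "No" is dropped entirely, which mirrors the idea of negative removal only loosely; a real negation detector would be considerably more careful about scope.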

Usage

textPrep(inputText, delim, NegEx = c("TRUE", "FALSE"),
  Extractor = c("1", "2"), ExtractPOS = c("1", "2"))

Arguments

inputText

The relevant pathology text column

delim

The delimiters (the section headers in the text) so the extractor can be used

NegEx

Parameter stating whether the NegativeRemove function is used ("TRUE" or "FALSE").

Extractor

States which extractor to use: "1" is Extractor (for uniformly ordered headers), "2" is Extractor2 (for text where headers are sometimes missing).
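As a rough illustration of what a delimiter-based extractor does when every header is present and in order, the base-R sketch below cuts a report into the text that follows each header. The function name `sketch_extract` and the sample report are hypothetical; this is not the package's Extractor code, only the general idea under the stated assumption.

```r
# Hypothetical helper, NOT part of EndoMineR: split a single report into
# sections, assuming every header in `delim` occurs once and in order.
sketch_extract <- function(text, delim) {
  # Start position of each header (fixed-string match; -1 if absent)
  starts <- vapply(delim, function(d) regexpr(d, text, fixed = TRUE)[1],
                   integer(1))
  stopifnot(all(starts > 0), !is.unsorted(starts))
  # Each section runs from the end of its header to just before the next one
  ends <- c(starts[-1] - 1L, nchar(text))
  out <- trimws(substring(text, starts + nchar(delim), ends))
  names(out) <- delim
  out
}

report <- "Clinical Details: reflux. Histology: mild chronic inflammation. Diagnosis: gastritis"
sketch_extract(report, c("Clinical Details:", "Histology:", "Diagnosis:"))
```

Because the sketch stops with an error when a header is missing or out of order, it shows why a second, more tolerant extractor is useful for less uniform text.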

Value

This returns a string vector.

Examples

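# Section headers that delimit each part of the pathology report
mywords <- c("Hospital Number", "Patient Name:", "DOB:", "General Practitioner:",
             "Date received:", "Clinical Details:", "Macroscopic description:",
             "Histology:", "Diagnosis:")
# Prepare the report text with negative removal switched on and Extractor 1
CleanResults <- textPrep(PathDataFrameFinal$PathReportWhole, mywords,
                         NegEx = "TRUE", Extractor = "1", ExtractPOS = "2")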

sebastiz/EndoMineR_devlop documentation built on May 29, 2019, 7:33 a.m.