Man pages for eellpp/textutils
Utilities for text processing while building models

addNLTKStopwords: Add NLTK stopwords to the current list of stopwords
addWordsToDB: Add keywords to an SQLite database
centerScaleData: Center and scale a numeric dataset
cleanTitleString: Get a clean string which is based on heuristic for document...
createFreqOfKeywordsTerms: Get frequency of keywords in dataset
createTextVectorFromDataset: Create text vector from dataset, from title
getAcronymRatio: Get acronym ratio
getAUCandPlotROC: Generic method to quickly plot ROC curve and print the AUC...
getBOWfeatures_Binary: Get binary (y/n) bag-of-words features
getBOWfeatures_freq: Get bag-of-words features with frequency
getBOWfeaturesTestDataset: Get bag-of-words test dataset
getBOWfeatures_Tfidf: Get bag-of-words features with TF-IDF
getBOWKeywords: Get bag-of-words keywords from a dataset
getCamelCaseKeywords: Get camel case keywords from string
getCapLettersToCharactersRatio: Get capital letters to characters ratio
getCharacterVector: Get character vector
getCorpusFromTextVector: Get corpus from text vector
getCountByPattern: Get count by pattern
getDataframeFromWordVector: Creates a dataframe whose columns are words in wordvector...
getDataSetWithFeatures: Get dataset with features
getDigitCount: Get digit count
getDocProbabilityDistribution: Get the topic probability distribution for the docs
getFeaturesAboveThreshold: Get the features above a threshold value
getFeaturesForDataset: Given a dataset, returns the dataset updated with features
getFreqOfKeywordInDataset: Get frequency of keywords in dataset
getFreqWordFeaturesFromTdm: Get frequent words from term-document matrix (tdm)
getKeywordWeights: Get the sum of weights of all keywords in string
getLDATopicForDocs: Get LDA top topic terms for docs
getLDATopTopicTerms: Get top terms from LDA object
getModelPerformance: Show the model performance statistics like AUC, ROC,...
getPredictionsForModel: Get predicted results for the model built for text only...
getRatio: Apply a regex pattern on each word of a string and find the...
getStopWords: Get the stopwords to be used for feature building
getStopWordsCount: Get stop words count in string
getStopWordsRatio: Get the ratio of stop words
getSymbolCount: Get symbol count
getSymbolToWordsRatio: Get symbol to words ratio
getTestTrainDomainDataset: Get test and train dataset
getTestTrainGenericDataset: Get the generic test and train dataset
getTextVectorAll: Get clean text from dataframe
getWordFeaturesFromTdm: Get the feature vector from term-document matrix (tdm)
getWordFeatureVector: Apply a function to create feature vector from dataset
getWordFeatureVectorWithdbh: Apply a function to create feature vector from dataset with...
getWords: Get words
isDictWord: Checks if the word is a dictionary word
recreateAndRunModelWithSelectedFeatures: Given a model, will detect the significant features and rerun...
removeNonNumericFeatures: Remove non-numeric features from dataframe
runLDAGibbs: Run LDA based on Gibbs method
setAllBOWFeaturesInDataFrame: Set all the words in a dataframe
setBOWFeatureInDataFrame: Set bag of words in a dataframe
setBOWkeywords: Return the dataset with bag-of-words keywords set
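
Several entries in the list above (getDigitCount, getCapLettersToCharactersRatio, getStopWordsRatio) name simple per-string features. The sketch below illustrates what features of this kind might compute. It is written in Python for illustration only and is not the package's R code; the function names, the tokenization rule, the small stop-word set, and the exact ratio definitions are all assumptions, since this index gives only one-line summaries.

```python
import re

# Hypothetical stop-word set for illustration; the package's own list is not shown here.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def get_words(text):
    """Split a string into lowercase tokens of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def digit_count(text):
    """Count the digit characters in a string."""
    return sum(ch.isdigit() for ch in text)


def cap_letters_to_characters_ratio(text):
    """Assumed definition: uppercase letters divided by non-whitespace characters."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(ch.isupper() for ch in chars) / len(chars)


def stop_words_ratio(text, stopwords=STOPWORDS):
    """Assumed definition: stop-word tokens divided by all tokens."""
    words = get_words(text)
    if not words:
        return 0.0
    return sum(w in stopwords for w in words) / len(words)
```

Each function returns 0.0 for empty input rather than dividing by zero, which keeps the resulting feature columns numeric when a title is blank.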
eellpp/textutils documentation built on May 16, 2019, 12:12 a.m.