CleanText: Clean text and build term matrix for bag of words, TF-IDF and...


Description

Cleans the text and builds a term matrix using bag of words, TF-IDF, or bigrams.

Usage

CleanText(source_dataset, dtm_method, reductionrate)

Arguments

source_dataset

A data frame with two columns: the review as text and the label as a binary value.
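As a minimal sketch of the expected input shape, the data frame below has one text column and one binary label column. The column names `Review` and `Liked` and the sample rows are assumptions made for illustration; this page does not state which names CleanText expects.

```r
# Toy input with two columns: review text and a binary label.
# Column names and rows are hypothetical, chosen only for illustration.
toy_reviews <- data.frame(
  Review = c("Loved this place", "The food was cold", "Great service"),
  Liked  = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Inspect the structure: one character column, one integer column.
str(toy_reviews)
```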

dtm_method

1 for bag of words, 2 for TF-IDF, 3 for bigrams.

reductionrate

The proportion of the term matrix to keep, usually 0.999 and not less than 0.99.
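To make the three dtm_method options concrete, here is a rough base R sketch of what each representation looks like on a tiny corpus. It is not the package's implementation: the helper `count_matrix`, the toy documents, and the particular TF-IDF variant (term frequency times natural-log inverse document frequency) are assumptions made for illustration only.

```r
# Illustrative only: base R versions of the three representations.
docs <- c("good food good service", "bad food", "good place")

# Split each document into lowercase word tokens.
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Hypothetical helper: document-by-term count matrix from a list of tokens.
count_matrix <- function(token_list) {
  vocab <- sort(unique(unlist(token_list)))
  m <- t(vapply(token_list,
                function(tk) as.numeric(table(factor(tk, levels = vocab))),
                numeric(length(vocab))))
  colnames(m) <- vocab
  m
}

# 1: bag of words, raw term counts per document.
bow <- count_matrix(tokens)

# 2: TF-IDF, one common variant: relative term frequency times log(N / df).
tf    <- bow / rowSums(bow)
idf   <- log(nrow(bow) / colSums(bow > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# 3: bigrams, counts of adjacent word pairs instead of single words.
bigrams <- lapply(tokens, function(tk) {
  if (length(tk) < 2) return(character(0))
  paste(head(tk, -1), tail(tk, -1))
})
bigram_counts <- count_matrix(bigrams)

print(bow)
print(round(tfidf, 3))
print(bigram_counts)
```

In all three cases each row is a document and each column a term (or term pair); only the column definition and the cell weighting differ.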

Value

Data frame "dataset": the term matrix converted to a data frame, plus the target label.

A clean data frame, a term matrix.

Author(s)

Zahra Khoshmanesh

Examples

## Not run: 
library("SentiAnalyzer")
# Locate the sample reviews file shipped with the package
direction <- system.file("extdata", "Restaurant_Reviews.tsv", package = "SentiAnalyzer")
original_dataset <- read.delim(direction, quote = "", stringsAsFactors = FALSE)
# Build the term matrix with each method: 1 = bag of words, 2 = TF-IDF, 3 = bigrams
CleanText(original_dataset, dtm_method = 1, reductionrate = 0.99)
CleanText(original_dataset, dtm_method = 2, reductionrate = 0.99)
CleanText(original_dataset, dtm_method = 3, reductionrate = 0.999)
## End(Not run)

zahrakhoshmanesh/FinalProjectSTAT585 documentation built on June 4, 2019, 1:57 p.m.