CountVectorizer: Count Vectorizer

Description

Creates a CountVectorizer model. Given a list of texts, it builds a bag-of-words (BOW) model and returns a data frame of BOW features.
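To make the idea of BOW features concrete, here is a minimal base R sketch that counts terms per sentence. It does not use superml and is not the package's implementation; the toy sentences, the whitespace tokeniser and all variable names are assumptions chosen for illustration.

```r
# Toy corpus (hypothetical, for illustration only)
sentences <- c("the cat sat", "the dog sat down")

# Lowercase and split on whitespace; a real tokeniser does more than this
tokens <- strsplit(tolower(sentences), "\\s+")

# Vocabulary: sorted unique terms across all sentences
vocab <- sort(unique(unlist(tokens)))

# Count how often each vocabulary term occurs in each sentence
rows <- lapply(tokens, function(tok) {
  idx <- match(tok, vocab)
  tabulate(idx[!is.na(idx)], nbins = length(vocab))
})

# One row per sentence, one column per term
bow <- as.data.frame(do.call(rbind, rows))
names(bow) <- vocab
print(bow)
```

Each row describes one sentence and each column one vocabulary term, so word order is discarded and only the counts remain.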

Format

R6Class object.

Usage

For usage details see Methods, Arguments and Examples sections.

bst = CountVectorizer$new(min_df=1, max_df=1, max_features=1)
bst$fit(sentences)
bst$fit_transform(sentences)
bst$transform(sentences)

Methods

$new()

Initialises an instance of the vectorizer.

$fit()

Learns and stores the bag of words (the vocabulary) from the given text.

$transform()

Returns a bag-of-words matrix based on the encodings learned in the fit method.

$fit_transform()

Fits and transforms the words in one step and returns the bag-of-words matrix.
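The split between fitting and transforming can be sketched in base R as follows. This is an illustration of the general pattern, not superml's code: `tokenise`, `bow_fit` and `bow_transform` are hypothetical helpers, and the tokenising rule is an assumption.

```r
# Hypothetical helpers; none of these are part of superml

# Lowercase and split on anything that is not a letter, digit or underscore
tokenise <- function(x) strsplit(tolower(x), "[^a-z0-9_]+")

# "fit": remember the vocabulary seen in the training text
bow_fit <- function(sentences) {
  toks <- unlist(tokenise(sentences))
  sort(unique(toks[nzchar(toks)]))
}

# "transform": count only the terms that are already in the vocabulary
bow_transform <- function(sentences, vocab) {
  rows <- lapply(tokenise(sentences), function(tok) {
    idx <- match(tok, vocab)
    tabulate(idx[!is.na(idx)], nbins = length(vocab))
  })
  out <- do.call(rbind, rows)
  colnames(out) <- vocab
  out
}

train <- c("alone in the dark", "many mothers in the lot")
test  <- c("alone in the lot", "a new word")

vocab     <- bow_fit(train)
train_bow <- bow_transform(train, vocab)
test_bow  <- bow_transform(test, vocab)
print(test_bow)
```

Because the vocabulary is fixed at fit time, the train and test matrices share the same columns, and words that appear only in the test text are ignored.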

Examples

library(superml)

df <- data.frame(sents = c('i am alone in dark.','mother_mary a lot',
                           'alone in the dark?',
                           'many mothers in the lot....'))

# fits and transforms on the entire data in one go
bw <- CountVectorizer$new(min_df = 0.3)
tf_features <- bw$fit_transform(df$sents)

# fit on entire data and do transformation in train and test
bw <- CountVectorizer$new()
bw$fit(df$sents)
tf_features <- bw$transform(df$sents)
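This page does not say how a fractional `min_df` such as 0.3 is interpreted. Assuming it is a minimum proportion of documents that must contain a term (the convention used by scikit-learn's CountVectorizer), the filtering could be sketched in base R as below. The sentences are the ones from the example with punctuation removed, and every name here is an assumption rather than superml's API.

```r
# Example sentences with punctuation dropped to keep tokenising trivial
docs <- c("i am alone in dark", "mother_mary a lot",
          "alone in the dark", "many mothers in the lot")

# Unique terms per document, so each document counts a term at most once
doc_terms <- lapply(strsplit(tolower(docs), "\\s+"), unique)

# Proportion of documents that contain each term
doc_freq <- table(unlist(doc_terms)) / length(docs)

# Keep terms whose document frequency reaches the threshold (assumed semantics)
min_df <- 0.3
kept <- names(doc_freq)[as.vector(doc_freq) >= min_df]
print(kept)
```

Under this assumption, terms that occur in only one of the four sentences (a proportion of 0.25) fall below the threshold and are dropped from the vocabulary.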

ssi-ashraf/superml documentation built on Nov. 5, 2019, 9:18 a.m.