ClearText: Text Cleaning: Custom Method


View source: R/ClearText.R

Description

Cleans text and lets you supply custom stopwords to remove unwanted words from the given data.

Usage

ClearText(Text, CustomList = c(""))

Arguments

Text

A String or Character vector, user-defined.

CustomList

A Character vector (optional), user-defined, of additional stopwords ("english") to remove from Text.

Value

Returns the cleaned text as Character.
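To make the kind of cleaning described above concrete, here is a minimal base-R sketch. It is not the ClearText implementation: the function name `clear_text_sketch`, its argument names, and the specific rules (drop URLs, e-mail addresses, hashtags, mentions, digits, and punctuation, then remove custom words) are assumptions chosen for illustration only.

```r
# Hypothetical stand-in for illustration; NOT the MediaNews implementation.
clear_text_sketch <- function(text, custom_list = character(0)) {
  text <- tolower(as.character(text))
  # Drop URLs and e-mail addresses before touching punctuation.
  text <- gsub("https?://\\S+", " ", text, perl = TRUE)
  text <- gsub("\\S+@\\S+", " ", text, perl = TRUE)
  # Drop hashtags and @mentions.
  text <- gsub("[#@]\\w+", " ", text, perl = TRUE)
  # Remove every occurrence of each user-supplied word (literal match).
  for (w in custom_list[nzchar(custom_list)]) {
    text <- gsub(tolower(w), " ", text, fixed = TRUE)
  }
  # Delete digits so that "tuto23rial" becomes "tutorial".
  text <- gsub("[0-9]+", "", text, perl = TRUE)
  # Keep letters and spaces only, then collapse repeated whitespace.
  text <- gsub("[^a-z ]", " ", text, perl = TRUE)
  text <- gsub("\\s+", " ", text, perl = TRUE)
  trimws(text)
}

clear_text_sketch(
  "Visit https://example.com NOW, mail me@x.org #tag 12scooby-doo3!",
  custom_list = c("scooby-doo")
)
# "visit now mail"
```

Because `gsub` and `trimws` are vectorized, this sketch accepts either a single string or a character vector. Whether ClearText itself behaves the same way is not stated on this page.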

Author(s)

Vatsal Aima, vaima75@hotmail.com

See Also

TOI_News_Articles, TOI_News_Dataset

Examples

################### Methodology #####################
###### For DataFrame ######
#### Creates a dataset based on a keyword

NewsData = TOI_News_Articles("Goibibo")

## Identify any potential factor columns
vc = sapply(NewsData, is.factor)

## Convert factors to characters
NewsData[vc] = lapply(NewsData[vc], as.character)

## Clean text in a specific character column, one row at a time
## (seq_len() also handles a data frame with zero rows safely)
for (i in seq_len(nrow(NewsData))) NewsData$News[i] = ClearText(NewsData$News[i])

######## For Character Variable #### Ex2 ####

para = "Moreover, the text data we get is noisy. But, if we can learn some
methods useful to extract important features from the noisy data, wouldn't
scandal that be amazing ? In this tuto23rial, you'll saadc@ruby.com
learn #world all ab33out regu12lar expressions from scratch. At first, 32324
detective you might find these confusing, or complicated, but after
https://anaconda.com/anaconda-enters-new-chapter/ expressions tricky,
scooby-doo doing practical hands-on exercises (done below)
you should feel bcc: @MikeQuindazzi quite comfortable with it.
In addition, we'll also cartoon-network learn about string 121manipulation
functions in R. This formidable combination of #DL #4IR #Robots
#ArtificialIntelligence string manipulation functions and regular
expressions will prepare you for text mining."

clearpara = ClearText(para,
                      CustomList = c("scooby-doo",
                                     "cartoon-network",
                                     "detective",
                                     "scandal"))

########### For List #############

paraList = list(para, 1213, factor('aasd;kasdioasd'))
paraList = lapply(paraList, as.character)
for (x in seq_along(paraList)) paraList[[x]] = ClearText(paraList[[x]])

MediaNews documentation built on Nov. 26, 2020, 5:09 p.m.