noise: detect noise

Description

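Detect noise, i.e. terms that do not qualify as non-noise by the criteria described under Arguments: stopwords, numbers, special characters, and terms that are too short, too rare or have a low mean tf-idf.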

Usage

noise(.Object, ...)

## S4 method for signature 'DocumentTermMatrix'
noise(.Object, minTotal = 2,
  minTfIdfMean = 0.005, sparse = 0.995, stopwordsLanguage = "german",
  minNchar = 2, specialChars = getOption("polmineR.specialChars"),
  numbers = "^[0-9\\.,]+$", verbose = TRUE)

## S4 method for signature 'TermDocumentMatrix'
noise(.Object, ...)

## S4 method for signature 'character'
noise(.Object, stopwordsLanguage = "german",
  minNchar = 2, specialChars = getOption("polmineR.specialChars"),
  numbers = "^[0-9\\.,]+$", verbose = TRUE)

## S4 method for signature 'textstat'
noise(.Object, pAttribute, ...)

Arguments

.Object

an object of class "DocumentTermMatrix", "TermDocumentMatrix", "character" or "textstat"

...

further parameters

minTotal

minimum column sum (for a DocumentTermMatrix, the total count of a term across all documents) to qualify a term as non-noise

minTfIdfMean

minimum mean value for tf-idf to qualify a term as non-noise

sparse

passed to removeSparseTerms() from the tm package

stopwordsLanguage

language of the stopword list, e.g. "german"; the stopwords are those defined in the tm package

minNchar

minimum number of characters to qualify a term as non-noise

specialChars

special characters to drop

numbers

regular expression matching the numbers to drop

verbose

logical, whether to print messages

pAttribute

relevant only if applied to a textstat object
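
To make some of the criteria above concrete, the following base R sketch re-creates three of the checks by hand: term length (minNchar), the default numbers pattern, and the column-sum threshold (minTotal). It is an illustration only and not the package's implementation. The example tokens, the toy count matrix and all variable names are made up, and the exact comparison operators (strictly less than the threshold) are an assumption based on the argument descriptions.

```r
# Illustration only: hand-rolled versions of three noise checks.
# This does NOT call noise() and is not the package's own code.

tokens <- c("Haus", "a", "2019", "1.000,50", "Politik", "3a")  # made-up terms

min_nchar <- 2               # mirrors the minNchar default
numbers   <- "^[0-9\\.,]+$"  # mirrors the numbers default

# Assumption: a term with fewer than min_nchar characters counts as noise
too_short <- tokens[nchar(tokens) < min_nchar]

# Terms that consist solely of digits, dots and commas
numeric_like <- tokens[grepl(numbers, tokens)]

# Toy document-term matrix: documents in rows, terms in columns
counts <- matrix(
  c(3, 0, 1,
    2, 1, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("haus", "selten", "einmal"))
)

min_total <- 2               # mirrors the minTotal default

# Assumption: a term whose column sum is below min_total counts as noise
rare <- colnames(counts)[colSums(counts) < min_total]

print(too_short)
print(numeric_like)
print(rare)
```

With the defaults, a token such as "3a" is not caught by the numbers pattern, because the pattern must match the whole term from start to end.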

Value

a list


nrauscher/corpus documentation built on May 23, 2019, 9:34 p.m.