normalizzaTesti: Various text normalization functions
In livioivil/TextWiller: Collection of functions for text mining, especially devoted to the Italian language


Description

Various text normalization functions.

Usage

normalizzaTesti(
  testo,
  tolower = TRUE,
  normalizzahtml = TRUE,
  normalizzacaratteri = TRUE,
  normalizzaemote = TRUE,
  normalizzaEmoticons = TRUE,
  normalizzapunteggiatura = TRUE,
  normalizzaslang = TRUE,
  fixed = TRUE,
  perl = TRUE,
  preprocessingEncoding = TRUE,
  encoding = "UTF-8",
  sub = "",
  contaStringhe = c("\\?", "\\!", "@", "#", "(\200|euro)", "(\\$|dollar)",
    "SUPPRESSEDTEXT"),
  suppressInvalidTexts = TRUE,
  verbatim = TRUE,
  remove = TRUE,
  removeUnderscore = FALSE
)

Arguments

`testo`	character vector of texts to normalize.
`tolower`	logical: convert the texts to lower case? `TRUE` by default.
`normalizzahtml`	logical: apply HTML normalization? `TRUE` by default.
`normalizzacaratteri`	logical: apply character normalization? `TRUE` by default.
`normalizzaemote`	logical: apply emote normalization? `TRUE` by default.
`normalizzaEmoticons`	logical: apply emoticon normalization? `TRUE` by default.
`normalizzapunteggiatura`	logical: apply punctuation normalization? `TRUE` by default.
`normalizzaslang`	logical: apply slang normalization? `TRUE` by default.
`fixed`	see `base::gsub`. Preferably, do not use this option.
`perl`	see `base::gsub`. Preferably, do not use this option.
`preprocessingEncoding`	logical: preprocess the text encoding before normalizing (see `encoding` and `sub`)?
`encoding`	target encoding, `"UTF-8"` by default. If `FALSE`, no conversion is performed.
`sub`	character string. If not `NA`, it is used to replace any non-convertible bytes in the input. See also argument `sub` of `iconv`.
`contaStringhe`	patterns (regular expressions) to count in each document. Default: `c("\\?", "\\!", "@", "#", "(€|euro)", "(\\$|dollar)", "SUPPRESSEDTEXT")`. In the usage above, `\200` is the octal escape for byte 0x80, the euro sign in Windows-1252.
`suppressInvalidTexts`	if `TRUE`, replaces texts containing invalid multibyte strings with `"SUPPRESSEDTEXT"`, since they would likely cause errors in the subsequent normalization steps. Default `TRUE`.
`verbatim`	show statistics during processing. Default `TRUE`.
`remove`	`TRUE` by default. Alternatively, a vector of stopwords to be removed.
`removeUnderscore`	logical: remove underscores? `FALSE` by default.
`ifErrorReturnText`	what to return for texts with an invalid encoding.
`stopwords`	list of words to exclude from the analysis. `itastopwords` by default.
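The `suppressInvalidTexts` and `sub` arguments both deal with invalid byte sequences. As an illustration only, the base-R sketch below reproduces the documented idea with a hypothetical helper `suppress_invalid` (not part of TextWiller): whole texts that are not valid UTF-8 are replaced by `"SUPPRESSEDTEXT"`, whereas `iconv(..., sub = "")` works byte by byte and simply drops the offending bytes. The package's own checks may differ.

```r
# Hypothetical helper, NOT part of TextWiller: a base-R illustration of the
# idea behind suppressInvalidTexts = TRUE.
suppress_invalid <- function(testo) {
  ok <- validUTF8(testo)            # FALSE where the bytes are not valid UTF-8
  testo[!ok] <- "SUPPRESSEDTEXT"    # replace the whole text with the placeholder
  testo
}

# "caf\xe9" contains a lone latin1 byte, which is invalid UTF-8
testi <- c("ciao", "caf\xe9", "perch\u00e9")
suppress_invalid(testi)

# iconv()'s `sub` acts on bytes instead: invalid bytes are replaced by `sub`
# (here the empty string, so they are dropped) and the rest is kept.
iconv("caf\xe9", from = "UTF-8", to = "UTF-8", sub = "")
```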

Details

`itastopwords` is a list of Italian stopwords.
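Stopword removal (arguments `remove` and `stopwords`) can be pictured as a token filter. The sketch below is a simplified base-R illustration, not the package's code: `stop_sample` is a hypothetical five-word stand-in for `itastopwords`, `remove_stopwords` is a hypothetical helper, and tokens are split on whitespace only.

```r
# Hypothetical stand-in for itastopwords: a tiny hand-picked sample.
stop_sample <- c("il", "la", "di", "e", "che")

# Hypothetical helper: split each text on whitespace, drop the tokens that
# appear in `stopwords`, and glue the remaining tokens back together.
remove_stopwords <- function(testo, stopwords) {
  vapply(strsplit(testo, "\\s+"), function(tokens) {
    paste(tokens[!tokens %in% stopwords], collapse = " ")
  }, character(1))
}

remove_stopwords(c("il gatto e la volpe", "niente di che"), stop_sample)
# returns c("gatto volpe", "niente")
```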

Value

For `normalizzaTesti`, the output is the vector of normalized texts. The table of counts for the patterns specified in `contaStringhe` is attached to the vector as its `counts` attribute.

For all the other functions, the output is a vector of the same length as `testo`, containing the normalized texts.
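The `counts` attribute of `normalizzaTesti` can be reproduced in spirit with base R. The sketch below uses a hypothetical helper `count_patterns` (not a TextWiller function): it counts the matches of each regular expression in each text with `gregexpr` and stores the resulting documents-by-patterns matrix as the `counts` attribute. The exact layout of the table returned by `normalizzaTesti`, and whether counting happens before or after normalization, are assumptions not confirmed by this page.

```r
# Hypothetical helper: count regex matches per text and attach them as an
# attribute, mirroring the documented "counts" attribute.
count_patterns <- function(testo, patterns) {
  counts <- vapply(patterns, function(p) {
    # gregexpr() returns -1 when there is no match, so count positive positions
    vapply(gregexpr(p, testo, perl = TRUE), function(m) sum(m > 0), integer(1))
  }, integer(length(testo)))
  # force a documents x patterns matrix, even for a single text
  counts <- matrix(counts, nrow = length(testo), dimnames = list(NULL, patterns))
  out <- testo
  attr(out, "counts") <- counts
  out
}

testi <- count_patterns(c("ciao bella!", "mai possibile?!?!"), c("\\?", "\\!", "#"))
attr(testi, "counts")
```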

Examples

# Normalize a few sample texts
testoNorm <- normalizzaTesti(c('ciao bella!','www.associazionerospo.org','noooo, che grandeeeeee!!!!!','mitticooo', 'mai possibile?!?!'))
testoNorm
# Table of counts for the patterns in contaStringhe
attr(testoNorm, "counts")
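Several inputs in the example stretch words by repeating letters ('noooo', 'grandeeeeee', 'mitticooo') or punctuation ('!!!!!'). One common way to normalize such runs is a back-reference regular expression; the sketch below shows that technique in base R with a hypothetical helper `collapse_runs`. It is an assumption for illustration: the exact rule applied by `normalizzaTesti` (for instance, how many repetitions are kept) is not documented here.

```r
# Hypothetical helper, illustrative only: collapse any character repeated
# 3 or more times in a row to a single occurrence.
# Double letters such as "ll" or "tt" are left untouched.
collapse_runs <- function(testo) {
  gsub("(.)\\1{2,}", "\\1", testo, perl = TRUE)
}

collapse_runs(c("ciao bella!", "noooo, che grandeeeeee!!!!!", "mitticooo"))
# returns c("ciao bella!", "no, che grande!", "mittico")
```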

livioivil/TextWiller documentation built on Nov. 30, 2020, 3:17 a.m.
