Collapses multiline documents to a single line and strips extra whitespace and punctuation.
Digits are replaced with 'X', and other non-alphabetic characters are converted to spaces.
A tm Corpus object.
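library( tm )
# Two sample documents with punctuation, digits, and an embedded newline
txt <- c( "thhis s! and bonkus 4:33pm and Jan 3, 2015. ",
" big space\n dawg-ness?")
# Wrap the character vector in a tm Corpus and clean it
a <- clean.text( Corpus( VectorSource( txt ) ) )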
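
The cleaning steps described above can be approximated on a plain character vector in base R. This is only a rough sketch of the described behavior, not the actual implementation of clean.text: the helper name `sketch_clean` is hypothetical, it works on strings rather than a tm Corpus, and the order of the steps is an assumption.

```r
# Hypothetical helper illustrating the described cleaning steps.
# Assumption: digits are replaced before other non-alphabetic characters.
sketch_clean <- function(x) {
  x <- gsub("[0-9]", "X", x)       # each digit becomes "X"
  x <- gsub("[^A-Za-z]", " ", x)   # punctuation, newlines, etc. become spaces
  x <- gsub(" +", " ", x)          # collapse runs of spaces
  trimws(x)                        # drop leading and trailing spaces
}

txt <- c( "thhis s! and bonkus 4:33pm and Jan 3, 2015. ",
          " big space\n dawg-ness?" )
cleaned <- sketch_clean( txt )
print( cleaned )
```

Under these assumptions, the time "4:33pm" comes out as "X XXpm" and the embedded newline in the second document is replaced by a single space, so each document ends up on one line.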