JSTOR_dtmofnouns: Make a Document Term Matrix containing only nouns
In benmarwick/JSTORr: Simple Text Mining and Document Clustering of JSTOR Journal Articles


This function does part-of-speech tagging and removes every word that is not a non-name noun (that is, it keeps only nouns that are not proper names). It also removes punctuation, numbers, words with fewer than three characters, stopwords, and unusual characters (characters not in ISO-8859-1, i.e. non-latin1-ASCII). For use with JSTOR's Data for Research datasets (http://dfr.jstor.org/).

This function uses the stoplist in the tm package. The location of tm's English stopwords list can be found by entering this at the R prompt: `paste0(.libPaths()[1], "/tm/stopwords/english.dat")`

Note that the part-of-speech tagging can result in the removal of words of interest. Currently I'm not sure how to keep those words.
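The non-tagging filters described above can be illustrated with a minimal base R sketch. This is not the package's implementation: `clean_tokens` and its short `stoplist` are hypothetical names made up for the illustration (the real function uses tm's stoplist), and part-of-speech tagging is left out entirely.

```r
# Hypothetical helper, not part of JSTORr: applies the described filters
# (except part-of-speech tagging) to a character vector of tokens.
clean_tokens <- function(tokens, stoplist = c("the", "and", "for", "with")) {
  # Keep only tokens whose characters all fall within ISO-8859-1 (code points 0-255)
  is_latin1 <- vapply(tokens, function(tok) all(utf8ToInt(tok) <= 255L), logical(1))
  tokens <- tolower(tokens[is_latin1])
  # Strip punctuation and digits
  tokens <- gsub("[[:punct:][:digit:]]", "", tokens)
  # Drop words with fewer than three characters
  tokens <- tokens[nchar(tokens) >= 3]
  # Drop stopwords (the default stoplist here is a tiny assumed stand-in)
  unname(tokens[!tokens %in% stoplist])
}

# The fifth token is a Greek word written with Unicode escapes
example_tokens <- c("The", "pottery,", "of", "1999",
                    "\u03a3\u03c0\u03ac\u03c1\u03c4\u03b7", "sites", "and")
cleaned <- clean_tokens(example_tokens)
print(cleaned)
```

The order of the steps matters in this sketch: punctuation and digits are stripped before the length check, so a token such as "1999" becomes empty and is then dropped as too short.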

JSTOR_dtmofnouns(unpack1grams, word = NULL, sparse = 1, POStag = TRUE)

`unpack1grams`	Object returned by the function JSTOR_unpack1grams.
`word`	Optional word or vector of words to subset the documents by, i.e. make a document term matrix containing only documents in which this word (or words) appears at least once.
`sparse`	A numeric for the maximal allowed sparsity. The default is 1 (i.e. no sparsing applied). Removes sparse terms from a term-document matrix; see help(removeSparseTerms) for more details. Values close to 1 result in a sparse matrix, values close to zero result in a dense matrix. It may be useful to reduce sparseness if the matrix is too big to manipulate in memory or if processing times are long.
`POStag`	Logical. Do part-of-speech tagging to identify and remove non-nouns. The default is TRUE, but the option is here to speed things up when working interactively with large numbers of documents.
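To make the effect of `sparse` concrete, here is a small base R sketch on a toy matrix. It only approximates the rule described in help(removeSparseTerms), namely keeping terms whose share of empty documents is below the threshold. The helper `drop_sparse_terms` and the toy data are assumptions for illustration, not part of JSTORr or tm.

```r
# Toy document-term matrix: rows are documents, columns are terms (made-up counts)
dtm <- matrix(
  c(1, 0, 2,
    1, 0, 0,
    3, 1, 0,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("site", "lithic", "pottery"))
)

# Hypothetical helper: keep terms whose proportion of zero entries is below `sparse`
drop_sparse_terms <- function(m, sparse) {
  term_sparsity <- colMeans(m == 0)  # share of documents in which each term is absent
  m[, term_sparsity < sparse, drop = FALSE]
}

dense <- drop_sparse_terms(dtm, 0.5)  # low threshold: only widely used terms survive
loose <- drop_sparse_terms(dtm, 0.8)  # high threshold: rarer terms are kept too
print(colnames(dense))
print(colnames(loose))
```

In this toy case "lithic" and "pottery" are each absent from three of four documents (sparsity 0.75), so they are dropped at a threshold of 0.5 and kept at 0.8.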

Returns a Document Term Matrix containing only nouns, ready for more advanced text mining and topic modelling.

## nouns <- JSTOR_dtmofnouns(unpack1grams)

benmarwick/JSTORr documentation built on May 12, 2019, 12:59 p.m.
