generate_blocked_document_term_vectors: A function to generate and save blocks of document term...
In matthewjdenny/SpeedReader: High Performance Text Analysis

View source: R/generate_blocked_document_term_vectors.R

A function to generate and save blocks of document term vectors to coherently named files from a variety of inputs.

generate_blocked_document_term_vectors(input, output_stem, data_directory,
  output_directory = NULL, block_size = 100, data_type = c("string",
  "term vector", "raw text", "csv", "ngrams"), ngram_type = NULL,
  tokenization_method = c("RegEx"), csv_separator = ",",
  csv_word_column = NULL, csv_count_column = NULL, csv_header = FALSE,
  keep_sequence = FALSE)

`input`	A list of strings, term vectors, raw documents, or csv files you wish to turn into document term vectors.
`output_stem`	The stem of the file name to give each block of output document term vector list objects generated by this function.
`data_directory`	Argument specifying where the data is stored.
`output_directory`	Optional directory to store blocked document term vector output.
`block_size`	The number of documents to group together in a single block of text to save. Defaults to 100.
`data_type`	The type of data provided to the function.
`ngram_type`	The type of ngram to use when generating document term vectors. Can be one of "jk_filtered", "verb_filtered", "phrases", or any "x_grams", where x is a number specifying the n-gram length. Can only be used with input generated by the ngrams() function.
`tokenization_method`	Currently not available.
`csv_separator`	Defaults to "," but can be set to "\t" for tab-separated values.
`csv_word_column`	If you are providing one csv file per document, then you must specify the index of the column that contains the words. Defaults to NULL.
`csv_count_column`	For memory efficiency, you may want to store only the counts of unique words in csv files. If your data include counts, then you must specify the index of the column that contains the counts. Defaults to NULL.
`csv_header`	Logical indicating whether the csv files provided have a header. Defaults to FALSE.
`keep_sequence`	Logical indicating whether document term vectors should be condensed into counts (FALSE) or whether the full sequence should be maintained for storage (TRUE). Defaults to FALSE, as this can be a much more memory-efficient representation.
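
The trade-off behind `keep_sequence` can be illustrated with base R alone. The sketch below is not the package's implementation; the `tokens` vector and the `condensed` data frame layout are assumptions chosen for illustration. It only shows why storing counts of unique terms is usually smaller than storing the full token sequence.

```r
# Hypothetical tokenized document (not taken from the package).
tokens <- c("the", "cat", "saw", "the", "dog", "and", "the", "bird")

# Analogue of keep_sequence = TRUE: keep every token in its original order.
full_sequence <- tokens

# Analogue of keep_sequence = FALSE: keep each unique term once, with its count.
counts <- table(tokens)
condensed <- data.frame(
  term = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)

# The condensed form has one row per unique term, so it is never longer
# than the full sequence, but the original word order cannot be recovered.
length(full_sequence)
nrow(condensed)
```

The condensed form discards word order, so the full sequence is the one to keep if a later step depends on token position.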

Saves blocks of text to file.
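
As a rough picture of what saving documents in blocks involves, the following base R sketch groups a list of documents into blocks of `block_size` and writes each block to its own file. The helper `save_blocks()`, the `.rds` format, and the `<stem>_<i>.rds` naming are all assumptions made for this example; the file format and naming used by the package itself are not specified in this page.

```r
# Hypothetical helper (not part of SpeedReader): split documents into
# blocks of at most `block_size` and save each block as its own file.
save_blocks <- function(docs, output_stem, output_directory, block_size = 100) {
  # Block index for each document: 1, 1, ..., 2, 2, ...
  block_id <- ceiling(seq_along(docs) / block_size)
  blocks <- split(docs, block_id)

  paths <- character(length(blocks))
  for (i in seq_along(blocks)) {
    # Assumed naming scheme, for illustration only.
    paths[i] <- file.path(output_directory, paste0(output_stem, "_", i, ".rds"))
    saveRDS(blocks[[i]], paths[i])
  }
  paths
}

# Five toy documents, saved two per block into a temporary directory.
docs <- as.list(paste("document", 1:5))
out_dir <- file.path(tempdir(), "blocked_dtv_example")
dir.create(out_dir, showWarnings = FALSE)
paths <- save_blocks(docs, "example", out_dir, block_size = 2)

# Each file can be read back independently of the others.
first_block <- readRDS(paths[1])
```

Because each block is a separate file, a later step can load and process one block at a time instead of holding the whole corpus in memory.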

matthewjdenny/SpeedReader documentation built on March 25, 2020, 5:32 p.m.
