ifiles: Creates iterator over text files from the disk
In text2vec: Modern Text Mining Framework for R

ifiles

R Documentation

Description

The result of this function is usually passed to an itoken function.

Usage

ifiles(file_paths, reader = readLines)

idir(path, reader = readLines)

ifiles_parallel(file_paths, reader = readLines, ...)

Arguments

`file_paths`	`character` paths of input files
`reader`	`function` that reads a text file from disk. It should take a file path as its first argument and return a named character vector: the elements of the vector are documents, and the names of the elements are document ids, which will be used in DTM construction. If the reader returns an unnamed character vector, document ids are generated as file_name + line_number (assuming that each line is a document).
`path`	`character` path of directory. All files in the directory will be read.
`...`	other arguments (not used at the moment)
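
As a rough illustration of the `reader` contract described above, the sketch below defines a custom reader that treats each file as a single document and uses the file name as the document id. The function name `read_whole_file` and the temporary file are hypothetical and exist only for this example; the call to `ifiles()` is skipped if text2vec is not installed.

```r
# Hypothetical reader: the whole file becomes ONE document,
# and the file name (without extension) becomes the document id.
read_whole_file = function(path) {
  txt = paste(readLines(path, warn = FALSE), collapse = "\n")
  # names() of the returned character vector are used as document ids
  setNames(txt, tools::file_path_sans_ext(basename(path)))
}

# Throwaway input file so the sketch is self-contained
tmp_dir = tempfile("ifiles_demo_")
dir.create(tmp_dir)
tmp_file = file.path(tmp_dir, "doc_a.txt")
writeLines(c("first line", "second line"), tmp_file)

# Calling the reader directly shows the shape it returns
docs = read_whole_file(tmp_file)

# Pass the custom reader to ifiles() (only if text2vec is available)
if (requireNamespace("text2vec", quietly = TRUE)) {
  it_files = text2vec::ifiles(tmp_file, reader = read_whole_file)
}
```

With the default `reader = readLines`, the same file would instead yield two documents, one per line, with generated ids.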

Examples

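## Not run: 
# Collect paths of all files in the current directory
current_dir_files = list.files(path = ".", full.names = TRUE)
# Serial iterator over the files
files_iterator = ifiles(current_dir_files)
# Parallel iterator over the same files
parallel_files_iterator = ifiles_parallel(current_dir_files, n_chunks = 4)
it = itoken_parallel(parallel_files_iterator)
# Build a document-term matrix using feature hashing with 2^16 buckets
dtm = create_dtm(it, hash_vectorizer(2 ^ 16), type = 'TsparseMatrix')

## End(Not run)
# Iterator over all files in a directory
dir_files_iterator = idir(path = ".")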
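
The Description notes that the result is usually passed to an itoken function. A minimal serial sketch of that workflow might look as follows. The temporary directory, file names, and file contents are made up for illustration, and the choice of `word_tokenizer` with a vocabulary-based vectorizer is an assumption rather than the only option. The block does nothing if text2vec is not installed.

```r
if (requireNamespace("text2vec", quietly = TRUE)) {
  library(text2vec)

  # Hypothetical input: two small text files, one document per line
  demo_dir = tempfile("ifiles_workflow_")
  dir.create(demo_dir)
  writeLines(c("the cat sat", "the dog ran"), file.path(demo_dir, "a.txt"))
  writeLines("a bird flew", file.path(demo_dir, "b.txt"))

  files = list.files(demo_dir, full.names = TRUE)

  # ifiles() -> itoken(): iterate over files, then tokenize each document
  it = itoken(ifiles(files), tokenizer = word_tokenizer, progressbar = FALSE)

  # Build a vocabulary and a document-term matrix from the token iterator
  vocab = create_vocabulary(it)
  dtm = create_dtm(it, vocab_vectorizer(vocab))

  # One row per document; with the default reader each line is a document
  print(dim(dtm))
  print(rownames(dtm))
}
```

Because the default `reader` is `readLines` and returns unnamed vectors, the row names printed here are the generated document ids.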

text2vec documentation built on Nov. 9, 2023, 9:07 a.m.