chunk_table_get_nrow: Get number of rows of a table on disk.


View source: R/utils.R

Description

Auxiliary function to get the number of rows of a file on disk as quickly as possible. The implementation of chunk_table_get_nrow closely follows the fastest pure R solution suggested in a Stack Overflow discussion.
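As an illustration of what a fast pure R line count can look like, the sketch below reads a file in binary chunks and counts newline bytes. It is not taken from the bignlp source: the function name count_newlines and the chunk_size parameter are hypothetical, and the approach assumes that every row ends with a line feed.

```r
# Hypothetical helper, not part of bignlp: count line feeds in a file
# by reading it as raw bytes in fixed-size chunks.
count_newlines <- function(filename, chunk_size = 65536L) {
  con <- file(filename, open = "rb")
  on.exit(close(con), add = TRUE)
  newline <- as.raw(10L)  # byte value of "\n"
  n <- 0                  # numeric, so very large files do not overflow integer
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break  # end of file reached
    n <- n + sum(chunk == newline)
  }
  n
}

# Usage: a header line plus two data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("doc_id,text", "1,foo", "2,bar"), tmp)
count_newlines(tmp)
```

Reading raw bytes avoids parsing the file into character vectors, which is why approaches of this kind tend to be quick. A final line without a trailing line feed would not be counted by this sketch.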

Usage

chunk_table_get_nrow(filename)

Arguments

filename

Name of a file (full path).

Examples

library(data.table)

# First, generate a chunk_table file
reuters_chunk_table <- file.path(tempdir(), "reuters_chunk_table.tsv")
reuters_txt <- readLines(system.file(package = "bignlp", "extdata", "txt", "reuters.txt"))
reuters_dt <- data.table(doc_id = seq_along(reuters_txt), text = reuters_txt)
data.table::fwrite(x = reuters_dt, file = reuters_chunk_table)

# Get the number of rows of the file. Note that the file includes a header
# line with the column names, so the result is one more than nrow(reuters_dt)
n <- chunk_table_get_nrow(reuters_chunk_table)

PolMine/bignlp documentation built on Jan. 29, 2021, 1:14 a.m.