make_file_index: Index a file for faster access to parts of the file

View source: R/file_index.R

Description

Index a file for faster access to parts of the file

Usage

make_file_index(
  pathname,
  offset = 0,
  skip = 0L,
  n_max = Inf,
  newline = "\n",
  drop_eof = TRUE,
  bfr_size = 5e+07
)

save_file_index(index, file)

read_file_index(file)

Arguments

pathname

(character) The file to be indexed.

offset

(numeric) The number of bytes to skip before indexing starts.

skip

(numeric) The number of newline matches to skip before offsets start being recorded.

n_max

(numeric) The maximum number of bytes to scan.

newline

(character) The character to scan for.

drop_eof

(logical) If TRUE, the last identified byte offset is dropped if at the very end of the file, i.e. when there is nothing available to read from that position.

bfr_size

(numeric) The number of bytes to read in each iteration.

index

(numeric vector) A sorted index of file byte positions.

file

(character) The pathname of a ‘*.index’ file to create or read.
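
To illustrate how offset, newline, and bfr_size could interact, here is a minimal base-R sketch of a chunked scan. It is not the package's implementation: scan_newlines_chunked() and its variable names are hypothetical, and the tiny bfr_size is chosen only so that the sample file spans several chunks.

```r
## Hypothetical helper (not part of wyntonquery): scan a file in fixed-size
## chunks and collect the byte offsets that directly follow each newline
scan_newlines_chunked <- function(pathname, offset = 0, newline = "\n",
                                  bfr_size = 8L) {
  nl <- charToRaw(newline)
  con <- file(pathname, open = "rb")
  on.exit(close(con))
  seek(con, where = offset)
  pos <- offset            # file offset of the first byte in the current chunk
  found <- numeric(0L)
  repeat {
    bfr <- readBin(con, what = "raw", n = bfr_size)
    if (length(bfr) == 0L) break
    ## which() is 1-based, so pos + index is the offset right after a newline
    found <- c(found, pos + which(bfr == nl))
    pos <- pos + length(bfr)
  }
  found
}

## A three-line sample file, written as raw bytes to avoid newline translation
tf_demo <- tempfile()
writeBin(charToRaw("alpha\nbravo\ncharlie\n"), con = tf_demo)
found <- scan_newlines_chunked(tf_demo, bfr_size = 8L)
print(found)
invisible(file.remove(tf_demo))
```

In this sketch the last value equals the file size, so nothing can be read from that position; this is the kind of trailing offset that drop_eof = TRUE is described as removing.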

Value

A numeric vector of file byte offsets that correspond to the beginning of a line, i.e. a position in the file that was preceded by a newline character. The first line is at file byte offset 0, which is also always the first element in the returned vector.
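
As a rough illustration of what such offsets mean, the following base-R sketch builds a small line-start vector by hand and uses it to jump directly to a given line. The helper line_start_offsets() is hypothetical and only mimics the documented return value for a small file; it reads the whole file at once and is not how make_file_index() is implemented.

```r
## Hypothetical helper (not part of wyntonquery): byte offsets at which
## lines begin, with offset 0 always first
line_start_offsets <- function(pathname, newline = "\n", drop_eof = TRUE) {
  size <- file.size(pathname)
  bytes <- readBin(pathname, what = "raw", n = size)
  ## The 1-based position of a newline equals the 0-based offset of the
  ## byte that follows it
  offsets <- c(0, which(bytes == charToRaw(newline)))
  ## Optionally drop a trailing offset that points at the end of the file
  if (drop_eof && offsets[length(offsets)] >= size) {
    offsets <- offsets[-length(offsets)]
  }
  offsets
}

tf_demo <- tempfile()
writeBin(charToRaw("alpha\nbravo\ncharlie\n"), con = tf_demo)
offsets <- line_start_offsets(tf_demo)
print(offsets)

## Jump straight to the third line without reading the first two
con <- file(tf_demo, open = "rb")
seek(con, where = offsets[3])
line <- readLines(con, n = 1L)
close(con)
print(line)
invisible(file.remove(tf_demo))
```

Because R vectors are 1-based while byte offsets are 0-based, element k of the vector is the offset at which line k starts.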

Examples

## An SGE accounting file
pathname <- system.file("exdata", "accounting", package = "wyntonquery")

## The corresponding SGE accounting index file
pathname_index <- sprintf("%s.index", pathname)

## Scan SGE accounting file to identify job offset positions
index <- make_file_index(pathname)
cat(sprintf("Number of jobs: %d\n", length(index)))
str(index)

## Save index to file
tf <- tempfile(fileext = ".index")
save_file_index(index, file = tf)
cat(sprintf("Saved index file: %s (%d bytes)\n", tf, file.size(tf)))

## Read index from file
index <- read_file_index(tf)
cat(sprintf("Number of jobs: %d\n", length(index)))
str(index)

## Read jobs 301 to 350
jobs <- read_sge_accounting(pathname, offset = index[301], n_max = 50L)
print(jobs)

## Read all jobs *after* the 500th job
jobs <- read_sge_accounting(pathname, offset = index[501])
print(jobs)

## Cleanup
file.remove(tf)

ucsf-wynton/wyntonquery documentation built on May 15, 2024, 6:23 a.m.