make_file_index: Index a file for faster access to parts of the file
In ucsf-wynton/wyntonquery: Query the UCSF Wynton Environment

make_file_index

R Documentation

Index a file for faster access to parts of the file

Description

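Scans a file for newline characters and records the byte offset at which each line begins, so that parts of the file can later be read directly without scanning from the start.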

Usage

make_file_index(
  pathname,
  offset = NULL,
  skip = 0L,
  index = NULL,
  n_max = Inf,
  newline = "\n",
  drop_eof = TRUE,
  bfr_size = 5e+07
)

save_file_index(index, file)

read_file_index(file)

Arguments

`pathname`	(character) The file to be indexed.
`offset`	(numeric) The number of bytes to skip before indexing starts.
`skip`	(numeric) The number of `newline` matches to ignore before recording them.
`index`	(numeric vector) A sorted index of file byte positions.
`n_max`	(numeric) The maximum number of bytes to scan.
`newline`	(character) The character to scan for.
`drop_eof`	(logical) If TRUE, the last identified byte offset is dropped if at the very end of the file, i.e. when there is nothing available to read from that position.
`bfr_size`	(numeric) The number of bytes to read in each iteration.
`file`	(character) The pathname of a ‘*.index’ file to be created or read from.
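
The roles of `newline`, `drop_eof`, and `bfr_size` can be illustrated with a minimal base-R sketch. The function `scan_line_offsets()` below is a hypothetical helper written only for this illustration; it is not part of wyntonquery and is not claimed to match how `make_file_index()` is implemented. It assumes a single-byte `newline` and reads the file in buffers of `bfr_size` bytes.

```r
## Hypothetical helper (NOT part of wyntonquery): scan a file in fixed-size
## buffers and record the byte offset at which each line starts.
scan_line_offsets <- function(pathname, newline = "\n", drop_eof = TRUE,
                              bfr_size = 8L) {
  nl <- charToRaw(newline)
  stopifnot(length(nl) == 1L)  # assumption: single-byte newline
  size <- file.size(pathname)
  con <- file(pathname, open = "rb")
  on.exit(close(con))
  offsets <- 0  # the first line always starts at byte offset 0
  pos <- 0      # number of bytes consumed so far
  repeat {
    bfr <- readBin(con, what = "raw", n = bfr_size)
    if (length(bfr) == 0L) break
    ## A line starts at the byte right after each newline; which() is
    ## 1-based, so pos + which() is the 0-based offset after the newline
    offsets <- c(offsets, pos + which(bfr == nl))
    pos <- pos + length(bfr)
  }
  ## Optionally drop a trailing offset that points at the end of the file
  n <- length(offsets)
  if (drop_eof && n > 1L && offsets[n] >= size) offsets <- offsets[-n]
  offsets
}

## A 17-byte file with three newline-terminated lines
tf <- tempfile()
writeBin(charToRaw("alpha\nbeta\ngamma\n"), con = tf)
offsets_small <- scan_line_offsets(tf, bfr_size = 8L)     # 0 6 11
offsets_large <- scan_line_offsets(tf, bfr_size = 1024L)  # 0 6 11
offsets_eof   <- scan_line_offsets(tf, drop_eof = FALSE)  # 0 6 11 17
file.remove(tf)
```

In this sketch the buffer size changes only how many reads are needed, not the result, and `drop_eof = FALSE` keeps the final offset (17), which points at the end of the file where nothing is left to read.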

Value

A numeric vector of file byte offsets, each of which corresponds to the beginning of a line, i.e. a position in the file that was preceded by a newline character. The first line is at file byte offset 0, which is also always the first element in the returned vector.
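
To show why such offsets give faster access, here is a small base-R sketch that jumps straight to a given line with `seek()` instead of reading the file from the top. The helper `read_lines_at()` and the toy file are assumptions made for this illustration and are not part of wyntonquery; the offsets are computed by hand from the known line lengths rather than with `make_file_index()`.

```r
## Hypothetical helper (NOT part of wyntonquery): read 'n' lines starting
## at a given 0-based byte offset.
read_lines_at <- function(pathname, offset, n = 1L) {
  con <- file(pathname, open = "rb")
  on.exit(close(con))
  seek(con, where = offset, origin = "start")
  readLines(con, n = n)
}

## A toy file with three newline-terminated lines
lines <- c("alpha", "beta", "gamma")
tf <- tempfile()
writeBin(charToRaw(paste0(lines, "\n", collapse = "")), con = tf)

## Line-start offsets: 0, then the cumulative byte length of each
## preceding line plus its newline
offsets <- c(0, cumsum(nchar(lines, type = "bytes") + 1))[seq_along(lines)]

## Jump directly to the second line and read two lines from there
res <- read_lines_at(tf, offset = offsets[2], n = 2L)
print(res)
file.remove(tf)
```

Because R vectors are 1-based while byte offsets are 0-based, element `i` of the offsets vector is the starting offset of line `i`, which is the same convention used for `index[301]` in the examples.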

Examples

## An SGE accounting file
pathname <- system.file("exdata", "accounting", package = "wyntonquery")

## The corresponding SGE accounting index file
pathname_index <- sprintf("%s.index", pathname)

## Scan SGE accounting file to identify job offset positions
index <- make_file_index(pathname)
cat(sprintf("Number of jobs: %d\n", length(index)))
str(index)

## Save index to file
tf <- tempfile(fileext = ".index")
save_file_index(index, file = tf)
cat(sprintf("Saved index file: %s (%d bytes)\n", tf, file.size(tf)))

## Read index from file
index <- read_file_index(tf)
cat(sprintf("Number of jobs: %d\n", length(index)))
str(index)

## Read jobs 301 to 350
jobs <- read_sge_accounting(pathname, offset = index[301], n_max = 50L)
print(jobs)

## Read all jobs *after* the 500th job
jobs <- read_sge_accounting(pathname, offset = index[501])
print(jobs)

## Cleanup
file.remove(tf)

ucsf-wynton/wyntonquery documentation built on July 16, 2025, 7:09 p.m.