make_chunks: Create n chunks from a vector

make_chunks    R Documentation

Create n chunks from a vector

Description

Generate chunk start and end (from, to) points along a vector of positive length n.

Usage

make_chunks(n, chunk_size, start, limit = Inf, n.fx = 1)

Arguments

n

A positive (nonzero) integer representing the total length to break into chunks

chunk_size

A positive (nonzero) integer denoting the size of each chunk

start

Optional. A positive (nonzero) integer denoting where to start; defaults to 1L

limit

Optional. A positive (nonzero) integer denoting the maximum chunk size; no limit by default

n.fx

Optional. A positive (nonzero) numeric denoting the factor to increase n and chunk_size beyond the input limit. Defaults to 1.00 (identity). Should be very rarely needed.

Details

This function creates equally-spaced, or as equal as possible, start (from) and end (to) points. The core functionality can be recreated simply by using seq.int(from, to, by) along with seq.int(from, to, by)-(by-1). This function provides quite a bit more flexibility and error-checking.
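As a rough illustration of the seq.int() equivalence described above, the sketch below computes the end points first and derives the start points from them. It assumes n is an exact multiple of chunk_size; the variable names are illustrative and are not taken from the package source.

```r
n <- 1000L
chunk_size <- 200L

# End points: every chunk_size-th position up to n
to <- seq.int(chunk_size, n, by = chunk_size)
# Start points: step back (chunk_size - 1) from each end point
from <- to - (chunk_size - 1L)

data.frame(from = from, to = to)
```

When n is not a multiple of chunk_size, this bare version silently drops the remainder, which is one of the cases make_chunks is meant to handle.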

This function handles the common scenario where the upper threshold denoted by n is important, e.g. for batched API calls. As such, the terminal chunk may well differ in size from the preceding ones. Additionally, if n < chunk_size, the latter will automatically be truncated to n, or, if n.fx != 1.00 (default), towards n * n.fx.
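The uneven terminal chunk can be sketched in base R as follows. This is a minimal approximation, not the package's implementation: it assumes start points step forward from 1 and that the last end point is simply clamped to n, with n.fx left at its default of 1.

```r
n <- 100L
chunk_size <- 23L

# If n is smaller than chunk_size, shrink the chunk to n
chunk_size <- min(chunk_size, n)

from <- seq.int(1L, n, by = chunk_size)
# Clamp every end point to n so the last chunk cannot overshoot
to <- pmin(from + chunk_size - 1L, n)
size <- to - from + 1L

data.frame(from = from, to = to, size = size)
```

Here the last row covers positions 93 to 100, so its size is 8 rather than 23.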

The start argument optionally enables setting a start point other than the default (1). This is useful if you wish, e.g., to make an API call starting from a specific index.
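A non-default start can be pictured with the same base R building blocks. The sketch below assumes chunks are simply counted forward from start; that alignment is inferred from the "skip the first chunk" example in this page and is not confirmed against the package source.

```r
n <- 200000L
chunk_size <- 50000L
start <- 50001L

# Begin the sequence of start points at `start` instead of 1
from <- seq.int(start, n, by = chunk_size)
to <- pmin(from + chunk_size - 1L, n)

data.frame(from = from, to = to, size = to - from + 1L)
```

Under that assumption, three chunks of 50000 remain, covering 50001 through 200000.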

The limit argument optionally sets a threshold on chunk_size, so that chunk_size can never exceed limit. This should be set if using n.fx!
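The effect of limit can be approximated by capping the requested chunk size before chunking. This is only a sketch of the idea; effective_size is an illustrative name, and the exact order of checks inside make_chunks is an assumption.

```r
n <- 200000L
chunk_size <- 500000L  # deliberately larger than the limit
limit <- 50000L

# Cap the requested chunk size at the limit
effective_size <- min(chunk_size, limit)

from <- seq.int(1L, n, by = effective_size)
to <- pmin(from + effective_size - 1L, n)

data.frame(from = from, to = to, size = to - from + 1L)
```

With the cap applied, the oversized request yields four chunks of 50000 each.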

Value

A data.frame containing three variables: from, to, and size. Rows are in ascending order of to by default.

Note

Inputs other than n.fx should be of type integer. They need not be multiples of one another. Aside from n.fx, numeric inputs are coerced to integer via as.integer, which may produce unexpected results, although they are still correct from an integer-coercion perspective.
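The coercion behaviour comes from base R: as.integer() drops the fractional part instead of rounding. The short check below shows why non-integer inputs such as 1000.999 and 200.99 behave like 1000 and 200.

```r
# as.integer() truncates toward zero; it does not round
x <- as.integer(1000.999)
y <- as.integer(200.99)
z <- as.integer(2.1)

c(x, y, z)
```

If rounding is what you want, apply round() to the inputs yourself before passing them in.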

In typical use, n.fx will not be used, even if it is explicitly provided; the anticipated use case for this function is to take a large n and chunk it up into pieces of size chunk_size. If and only if n is less than chunk_size will n.fx be used; this scenario is outside the scope of the most common use case. It can, however, arise when dealing with a range of potential n, where some n values might be < chunk_size. In these situations, the function will automatically adjust both n AND chunk_size by n.fx. If you do not provide an explicit n.fx, there will be no additional adjustment. If you provide an n.fx value other than 1.00 (default), there will be (upward) adjustment as requested.

Note that upward adjustment is mainly relevant if you are not sure whether (some of) your n values are actual upper limits and you do not wish to make any safety adjustments in your input data. In that case, this function has facilities to meet this (admittedly fringe) requirement.

If you need upward adjustment, you should set the limit argument if an upper limit is important in your application, as the function will not know what the upper limit is in such cases.
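The interaction between n, chunk_size, limit, and n.fx described in these notes can be sketched as below. The helper sketch_effective_sizes is hypothetical and not part of wzMisc. The order of operations (cap by limit first, then scale by n.fx only when n < chunk_size) is inferred from the notes and examples on this page, and the truncation of n * n.fx via as.integer is an assumption.

```r
# Hypothetical helper (not part of wzMisc): work out the effective n and
# chunk_size before chunking, in the order the documentation suggests.
sketch_effective_sizes <- function(n, chunk_size, limit = Inf, n.fx = 1) {
  n <- as.integer(n)
  chunk_size <- min(as.integer(chunk_size), limit)  # apply the cap first

  if (n < chunk_size) {
    # Only in this branch does n.fx matter: scale n, then shrink chunk_size
    # to the scaled n, still respecting the cap.
    n <- as.integer(n * n.fx)  # assumption: result is truncated to integer
    chunk_size <- min(n, limit)
  }
  c(n = n, chunk_size = chunk_size)
}

# n.fx left at 1: chunk_size shrinks to n
sketch_effective_sizes(1e4, 5e4)
# n.fx > 1: n and chunk_size are both scaled up
sketch_effective_sizes(1e4, 5e4, n.fx = 1.5)
# limit is below n, so the n < chunk_size branch is never reached
sketch_effective_sizes(1e4, 5e4, n.fx = 1.5, limit = 1e3)
```

The last call shows why setting limit matters when n.fx is in play: the cap takes precedence and keeps the chunk size at 1000.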

Examples

make_chunks(1000L, 200L)
make_chunks(1000.999, 200) # same
make_chunks(1000.99, 200.99) # also same
make_chunks(100, 23) # note final chunk size

make_chunks(2E5, 5E4, limit = 5E4) # common Site Catalyst use case
make_chunks(2E5, 5E4) # same without limit, since limit is optional
make_chunks(2E5, 5E5, limit = 5E4) # same; limit auto-corrects
make_chunks(2E5, 5E4, limit = 5E4, start = 50001) # skip the first chunk

make_chunks(1E4, 5E4) # n < chunk_size; will auto-set chunk_size to n
make_chunks(1E4, 5E4, n.fx = 1.01) # using n.fx to raise input limit for safety
make_chunks(1E4, 5E4, n.fx = 1.01, limit = 1E3) # now n.fx not used since limit < n

## Not run: 
make_chunks(100, 1000, start = 101) # error; start (101) exceeds n (100)
make_chunks(100, 1000, start = 101, n.fx = 2) # still error; cannot circumvent with n.fx!
make_chunks(10000, 1000, start = 101, n.fx = 2) # this is OK, although n.fx is not used
make_chunks(10, 10) # works, but not much point
make_chunks(10, 10, limit = 2.1) # also works, but now there is a point

## End(Not run)

slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.