fmap: Map multiple file arrays and save results

View source: R/method-map.R

fmapR Documentation

Map multiple file arrays and save results

Description

Advanced mapping function for multiple file arrays. fmap runs the mapping functions and stores the results in file arrays. fmap2 stores results in memory. This feature is experimental. There are several constraints to the input. Failure to meet these constraints may result in undefined results, or even crashes. Please read Section 'Details' carefully before using this function.

Usage

fmap(
  x,
  fun,
  .y = NULL,
  .buffer_count = NA_integer_,
  .output_size = NA_integer_,
  ...
)

fmap2(x, fun, .buffer_count = NA, .simplify = TRUE, ...)

fmap_element_wise(x, fun, .y, ..., .input_size = NA)

Arguments

x

a list of file arrays to map; each element of x must share the same dimensions.

fun

function that takes one list

.y

a file array object, used to save results

.buffer_count

number of total buffers (chunks) to run

.output_size

fun output vector length

...

other arguments passing to fun

.simplify

whether to apply simplify2array to the result

.input_size

number of elements to read from each array of x

Details

Denote the first argument of fun as input, The length of input equals the length of x. The size of each element of input is defined by .input_size, except for the last loop. For example, given dimension of each input array as 10x10x10x10, if .input_size=100, then length(input[[1]])=100. The total number of runs equals to length(x[[1]])/100. If .input_size=300, then length(input[[1]]) will be 300 except for the last run. This is because 10000 cannot be divided by 300. The element length of the last run will be 100.

The returned variable length of fun will be checked by .output_size. If the output length exceed .output_size, an error will be raised.

Please make sure that length(.y)/length(x[[1]]) equals to .output_size/.input_size.

For fmap_element_wise, the input[[1]] and output length must be the consistent.

Value

File array instance .y

Examples



set.seed(1)
x1 <- filearray_create(tempfile(), dimension = c(100,20,3))
x1[] <- rnorm(6000)
x2 <- filearray_create(tempfile(), dimension = c(100,20,3))
x2[] <- rnorm(6000)

# Add two arrays
output <- filearray_create(tempfile(), dimension = c(100,20,3))
fmap(list(x1, x2), function(input){
    input[[1]] + input[[2]]
}, output)

# check
range(output[] - (x1[] + x2[]))

output$delete()

# Calculate the maximum of x1/x2 for every 100 elements
output <- filearray_create(tempfile(), dimension = c(20,3))
fmap(list(x1, x2), function(input){
    max(input[[1]] / input[[2]])
}, output, .input_size = 100, .output_size = 1)

# check
range(output[] - apply(x1[] / x2[], c(2,3), max))

output$delete()

# A large array example
if(interactive()){
    x <- filearray_create(tempfile(), dimension = c(287, 100, 301, 4))
    dimnames(x) <- list(
        Trial = 1:287,
        Marker = 1:100,
        Time = 1:301,
        Location = 1:4
    )
    
    for(i in 1:4){
        x[,,,i] <- runif(8638700)
    }
    # Step 1:
    # for each location, trial, and marker, calibrate (baseline)
    # according to first 50 time-points
    
    output <- filearray_create(tempfile(), dimension = dim(x))
    
    # baseline-percentage change
    fmap(
        list(x), 
        function(input){
            # get locational data
            location_data <- input[[1]]
            dim(location_data) <- c(287, 100, 301)
            
            # collapse over first 50 time points for 
            # each trial, and marker
            baseline <- apply(location_data[,,1:50], c(1,2), mean)
            
            # calibrate
            calibrated <- sweep(location_data, c(1,2), baseline, 
                                FUN = function(data, bl){
                                    (data / bl - 1) * 100
                                })
            return(calibrated)
        }, 
        
        .y = output,
        
        # input dimension is 287 x 100 x 301 for each location
        .input_size = 8638700,
        
        # output dimension is 287 x 100 x 301
        .output_size = 8638700
    )
    
    # cleanup
    x$delete()
    
}

# cleanup
x1$delete()
x2$delete()
output$delete()

filearray documentation built on July 9, 2023, 5:53 p.m.