knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
rray provides a new array class that changes some of the behavior with base R arrays that I think could be altered to result in code that is more predictable and easy to program around. In the same spirit as tibble, rrays do less than base R arrays to try and work as consistently and intuitively as possible.
Base R subsets matrices and arrays with a default of
drop = TRUE, similar to data frames. This is one of the most common sources of bugs, especially when programming around arrays, because a lot of mental work is required to ensure that functions don't accidentally subset matrices in a way that coerces them to vectors.
x_mat <- matrix(1:6, ncol = 2) x_mat x_mat[1:2,] x_mat[1L,] x_mat[,1:2] x_mat[,1L]
This example demonstrates that
[ is not type stable. This means it is impossible to predict the type of the output solely based on the input's type. Type is defined in the vctrs sense (read about it here), but for usage in rray think of the type as being the class of an object along with its attributes, and the shape of the object, which is the dimensionality of the array excluding the first dimension. The first dimension is handled separately as the size.
To give a concrete example,
x_mat has a type of
1L have types of
<integer>. So the full subset expression looks like:
[ was type stable, you would be able to predict the output's type based on these inputs. But you can't! In some cases, this returns
<integer[,2]>, and in others it returns
Looking past the type instability, the fix for the dropping of dimensions is usually to use
drop = FALSE.
x_mat[1, , drop = FALSE]
But this requires careful thinking about how to subset arrays programmatically, adding the right number of commas where required to ensure that dimensions aren't dropped. Consider how you would pull the first row of a 3D vs 4D array, and pay attention to the varying number of commas required to prevent a vector from being returned.
x_3D <- array(1:12, c(2, 3, 2)) x_4D <- array(1:12, c(2, 3, 2, 1)) x_3D[1, , , drop = FALSE] x_4D[1, , , , drop = FALSE]
rray takes a different approach, and never drops dimensions when subsetting. This results in a type stable
[ method, and actually frees up subsetting syntax that I find more intuitive, but results in an error with base R. To convert
x_3D to an rray, use
x_3D_rray <- as_rray(x_3D) # First row, but still 3D x_3D_rray # Trailing commas are ignored, so this is the same as above x_3D_rray[1,] # In base R this is an error x_3D[1,]
This type stability makes rrays more predictable to program around. Whether subsetting one element or multiple elements from a specific dimension, you can know that the output will have a consistent type.
x_3D_rray[, 1L] x_3D_rray[, 1:2]
If you are writing a package where you aren't sure if the user is going to pass in an rray or an array, you'll want to be sure that you can subset consistently, no matter the input type. For that,
rray_subset() has been exposed, which is what powers
[. This means you can have this type stable subsetting with base R objects.
x_3D[1, 1] rray_subset(x_3D, 1, 1) # Same as with the rray rray_subset(x_3D_rray, 1, 1)
Base R allows you to access "inner" elements of an array by using
[ without any commas, i.e.
x[i]. While technically this one operation on its own is type stable, when you compare it to
x[,i], it becomes clear that it is doing something completely different.
# Positions 1 and 2 x_3D[1:2] # Positions 2 and 3 # Correspond to elements at indices: # (2, 1, 1) and (1, 2, 1) x_3D[2:3] rray_dim(x_3D[1:2]) rray_dim(x_3D[1:2,,]) rray_dim(x_3D[,1:2,]) rray_dim(x_3D[,,1:2])
I think that this is a completely separate operation, called subsetting by position. The traditional behavior of
x[i, j, ...] is called subsetting by index.
As seen briefly in the subsetting section, rrays never drop dimensions, and ignore trailing commas when subsetting. This means that
x[i] selects rows from
x, not positions.
x_3D_rray # Same as this x_3D_rray[2,]
This behavior results in a nice symmetrical predictable behavior that is easy to see if you draw out the different operations. Just by adding commas, we go from selecting rows, to columns, to elements in the third dimension. The type is stable the entire time.
x[i] x[,i] x[,,i]
Because trailing commas are ignored, the following are also equivalent to the above explanation.
x[i,] x[,i,] x[,,i,]
But what about selecting by position? This is still a useful operation, so it would make sense to have some way to do it. For that, you can equivalently use either
rray_yank() always returns a 1D vector, and allows you to pass in an integer vector to subset by position.
# First 3 positions x_3D_rray[[1:3]] rray_yank(x_3D_rray, 1:3)
There are assignment variations of these as well, which are very useful for replacing
NA values if you combine it with the other type of object that
i can be, a logical with the same dimensions as
TRUE values are interpreted as the positions to subset.
dummy <- x_3D_rray # Set the first row to NA dummy <- NA # Set NA values to 0 dummy[[is.na(dummy)]] <- 0 dummy
The 3rd arm of subsetting is when you want to subset by index like
rray_subset(), but you want to actually drop the dimensions like
rray_yank(). This is accomplished with
rray_subset(x_3D_rray, 1) rray_extract(x_3D_rray, 1)
Base R has consistent dimension name handling rules, but they can be surprising because they often don't retain the maximum amount of information that it seems like they could.
x_row_nms <- matrix(1:2, dimnames = list(c("r1", "r2"))) x_col_nms <- matrix(1:2, dimnames = list(NULL, c("c1"))) x_row_nms x_col_nms # names from x_row_nms are used x_row_nms + x_col_nms # names from x_col_nms are used x_col_nms + x_row_nms
It is reasonable that dimension names from both inputs could be used here.
rray has a different set of dimension name reconciliation rules that attempts to pull names from all inputs.
x_row_nms_rray <- as_rray(x_row_nms) x_col_nms_rray <- as_rray(x_col_nms) x_row_nms_rray + x_col_nms_rray x_col_nms_rray + x_row_nms_rray
The order of the inputs still matters. If both inputs have row names, for example, the row names of the result come from the first input.
x_col_and_row_nms_rray <- rray_set_row_names(x_col_nms_rray, c("row1", "row2")) x_col_and_row_nms_rray x_row_nms_rray + x_col_and_row_nms_rray x_col_and_row_nms_rray + x_row_nms_rray
This approach emphasizes maintaining the maximum amount of information possible. The underlying engine for dimension name handling is
rray_dim_names_common(). Pass it multiple inputs and it will return the common dimension names using rray's rules.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.