Introduction to osum

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Motivation

Before R became available, I was a heavy user of the S-PLUS™ software, a commercial implementation of R's ancestor S. It had a very practical function called objects.summary(), which listed the objects in an environment in tabular form (essentially as a data.frame), along with useful attributes such as class, mode, dimensions, and size. I couldn't find an equivalent in R, so I wrote one 😊

Installation

Stable version

You can install the current stable version of osum from CRAN:

install.packages("osum")

Windows and macOS binary packages are also available from CRAN.

Development version

You can install the development version of osum including latest features from GitHub:

# requires the remotes package: install.packages("remotes")
remotes::install_github("zivankaraman/osum")
library(osum)

Basic Usage

First, we need to populate the session environment with a few objects.

a <- month.name
b <- sample(c("FALSE", "TRUE"), size = 5, replace = TRUE)
cars <- mtcars
.hidden <- -1L
.secret <- "Shhht!"
x1 <- rnorm(n = 10)
x2 <- runif(n = 20)
x3 <- rbinom(n = 30, size = 10, prob = 0.5)
lst <- list(first = x1, second = x2, third = x3)
fun <- function(x) {sqrt(x)}

By default, the environment of the call to objects.summary is used, here .GlobalEnv.

objects.summary()
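To make the shape of the result concrete, here is a rough base-R approximation of such a table. It is only a sketch: `rough_summary` and `demo_env` are hypothetical names introduced for illustration, and the real objects.summary function returns more attributes and may be implemented quite differently.

```r
# Hypothetical helper (not part of osum): one row per object, one column per attribute
rough_summary <- function(env) {
  nms <- ls(envir = env)
  objs <- mget(nms, envir = env)
  data.frame(
    data.class  = vapply(objs, function(x) class(x)[1L], character(1)),
    mode        = vapply(objs, mode, character(1)),
    object.size = vapply(objs, function(x) as.numeric(utils::object.size(x)), numeric(1)),
    row.names   = nms,
    stringsAsFactors = FALSE
  )
}

# A small throwaway environment, so the global environment is not summarized here
demo_env <- new.env()
assign("a", month.name, envir = demo_env)
assign("cars", mtcars, envir = demo_env)

rough <- rough_summary(demo_env)
rough
```

Each object becomes a row named after the object, which is the same layout the package uses for its own summary.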

Hidden objects (those whose names start with a dot) are not shown by default. To see them, set all.objects=TRUE, much like the all.names argument of the ls function.

objects.summary(all.objects = TRUE)
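For comparison, this is how the analogous all.names argument behaves in base ls. The environment `demo_hidden_env` is a hypothetical name used only for this illustration.

```r
# A throwaway environment with one ordinary and one dot-prefixed object
demo_hidden_env <- new.env()
assign("visible", 1, envir = demo_hidden_env)
assign(".hidden", 2, envir = demo_hidden_env)

default_names <- ls(demo_hidden_env)                   # dot-prefixed names are skipped
all_names     <- ls(demo_hidden_env, all.names = TRUE) # dot-prefixed names are included
default_names
all_names
```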

If objects.summary is called inside a function, that function's environment is used by default.

# shows an empty list because inside myfunc no variables are defined
myfunc <- function() {objects.summary()}
myfunc()
# define a local variable inside myfunc
myfunc <- function() {y <- 1; objects.summary()}
myfunc()
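One common base-R idiom that produces this kind of behavior is a default argument of parent.frame(). The sketch below illustrates the idiom only; `list_caller_objects`, `wrapper_empty`, and `wrapper_local` are hypothetical names, and osum may obtain the calling environment in a different way.

```r
# Hypothetical helper: lists the names defined in the environment it is called from.
# The default argument is evaluated inside the helper, so parent.frame() is the caller.
list_caller_objects <- function(env = parent.frame()) ls(envir = env)

wrapper_empty <- function() list_caller_objects()
wrapper_local <- function() { y <- 1; list_caller_objects() }

wrapper_empty()  # no local variables in the caller
wrapper_local()  # the caller's local variable y is found
```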

Restricting the Objects List

We can limit the output to objects with names matching the regular expression provided as the pattern argument. Alternatively, we can provide a character vector naming objects to summarize in the names argument.

objects.summary(pattern = "^x")
objects.summary(names = c("a", "b"))

Where to Look for Objects?

We can list the objects in any environment, not just the current one. The environment can be given as an integer indicating its position in the search list, or as a character string naming an element of the search list.

idx <- grep("package:graphics", search())
objects.summary(idx, pattern = "^plot")
objects.summary("package:graphics", pattern = "^plot")
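In base R, a position and a name on the search list resolve to the same environment, which is why the two calls above are interchangeable. A short check using only base functions (the variable names are illustrative, and the graphics package is assumed to be attached, as it is in a default session):

```r
# Position of the graphics package on the search list
graphics_pos <- match("package:graphics", search())

env_by_position <- as.environment(graphics_pos)
env_by_name     <- as.environment("package:graphics")

same_env   <- identical(env_by_position, env_by_name)
plot_names <- ls(env_by_name, pattern = "^plot")
same_env
plot_names
```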

We can also explicitly provide an environment.

e <- new.env()
e$a <- 1:10
e$b <- rnorm(25)
e$df <- iris
e$arr <- iris3
objects.summary(e)
rm(e, myfunc)

Unless an explicit environment is provided, the where argument should designate an element of the search list. However, if it is a character string of the form "package:pkg_name" and the package pkg_name is installed, the package is silently loaded, its objects are retrieved, and it is unloaded again when the function exits. Depending on how long the package takes to load, this can be slower than summarizing a package that is already attached.

# check if the package foreign is attached
length(grep("package:foreign", search())) > 0L
objects.summary("package:foreign", pattern = "^write")
# check again whether the package foreign is attached after the call
length(grep("package:foreign", search())) > 0L
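A related base-R technique, shown here only for comparison, is to inspect a package's namespace without attaching it. This is not what the text above describes (osum loads and then unloads the package) and it only sees exported names. The sketch uses the tools package because it ships with every R installation; `demo_pkg` and the other variable names are illustrative.

```r
demo_pkg <- "tools"
attached_before <- paste0("package:", demo_pkg) %in% search()

# getNamespaceExports() loads the namespace if necessary but does not attach it
export_names <- getNamespaceExports(demo_pkg)
write_like   <- sort(grep("^write", export_names, value = TRUE))

attached_after <- paste0("package:", demo_pkg) %in% search()
write_like
c(before = attached_before, after = attached_after)
```

The search list is unchanged by the lookup, so the session's attached packages stay as they were.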

Selecting Information to Display

We don't need to display all the attributes: the what argument controls which information is returned. Partial matching is used, so each string only needs enough initial letters to identify an attribute uniquely, for example "data[.class]", "stor[age.mode]", "ext[ent]", "obj[ect.size]".

objects.summary(what = c("data.class", "storage.mode", "extent", "object.size"))
objects.summary(what = c("data", "stor", "ext", "obj"))

In fact, the first letter alone is sufficient, since every possible value starts with a different letter. The columns of the summary appear in the order in which they are listed in the what argument.

objects.summary(what = c("m", "s", "t", "o", "d", "e"))
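The same kind of partial matching can be reproduced with base match.arg, which makes it easy to check what an abbreviation expands to. This is a sketch using the attribute names listed in this vignette; whether osum uses match.arg internally is an assumption, and `attribute_names` is an illustrative variable name.

```r
attribute_names <- c("data.class", "storage.mode", "mode", "typeof",
                     "extent", "object.size")

# Abbreviations expand to the full names, in the order they were given
expanded   <- match.arg(c("dat", "stor", "ext", "obj"), attribute_names,
                        several.ok = TRUE)
by_initial <- match.arg(c("m", "s", "t", "o", "d", "e"), attribute_names,
                        several.ok = TRUE)
expanded
by_initial
```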

It should be noted that attributes storage.mode, mode, and typeof are somewhat redundant, so you can select only those that are relevant to you. You can set your personal preferences using the osum.options function, as explained in [Options].

Filtering Objects

The objects in the environment where that are selected for the summary are specified either with an explicit vector of names in the names argument, or with some combination of the subsetting criteria pattern (as seen in [Restricting the Objects List]), data.class, storage.mode, mode, and typeof. If names is given, the other criteria are ignored. If more than one criterion is given, only objects that satisfy all of them are selected. If neither names nor any criterion is given, all objects in where are selected.

objects.summary("package:datasets", pattern = "^[sU]", what = c("dat", "typ", "ext", "obj"),
                data.class = c("data.frame", "matrix"))

Objects can have more than one class, but only the first element of the class vector is used by default. Specifying all.classes=TRUE makes the entire class vector of an object available, both for selection based on the data.class argument and in the returned summary.

objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), data.class = "array")
objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), 
                all.classes = TRUE, data.class = "array")
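To see why the first class element alone can hide information, consider the class vector of the CO2 dataset, inspected here with base R only (the variable names are illustrative):

```r
co2_classes <- class(datasets::CO2)
co2_classes

first_class   <- co2_classes[1L]               # the only element used by default
is_data_frame <- "data.frame" %in% co2_classes # visible only in the full class vector
first_class
is_data_frame
```

A selection on "data.frame" that looks only at the first element would miss this object, while one that considers the full vector would find it.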

Besides simple filtering on attribute values, we can also filter on a logical expression indicating the elements (rows) to keep. The expression is evaluated in the data frame of object attributes, so columns should be referred to as variables (by unquoted attribute name) in the expression, much like the subset argument of the base subset function. This is particularly helpful when we want to exclude some values without explicitly listing all the other possible values, as shown in the example below.

objects.summary("package:grDevices", filter = mode != "function")

The filter expression can involve more than one attribute.

objects.summary("package:datasets", filter = mode != storage.mode)[1:10, ]

It can also be quite complex, as long as it yields a logical value for every object (row).

objects.summary("package:datasets", all.classes = TRUE, 
                filter = sapply(data.class, length) > 2L)
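The evaluation rule for filter expressions mirrors what base subset does with its row condition. The following sketch applies the two simpler expressions above to a small hand-made table; `demo_attrs` and its contents are made up for illustration and are not osum output.

```r
# A made-up attribute table with one row per (hypothetical) object
demo_attrs <- data.frame(
  mode         = c("function", "numeric", "list"),
  storage.mode = c("function", "double", "list"),
  row.names    = c("f", "x", "l"),
  stringsAsFactors = FALSE
)

# Column names are used as plain variables inside the expression
non_functions <- subset(demo_attrs, mode != "function")
mode_differs  <- subset(demo_attrs, mode != storage.mode)
non_functions
mode_differs
```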

Sorting Objects

By default, the object entries (printed as rows) in the summary are sorted alphabetically by object name. The order argument sorts them on any other column(s), given as unquoted column names. For numeric columns, the name can be preceded by "-" to sort in descending order, with the expression enclosed in parentheses (see examples). To sort on more than one column, provide the expression as a vector c(., .) (again, see examples). This feature is inspired by the standard R order function.

# filter on 'mode' and sort on 'data.class'
objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), mode = "numeric", 
                order = data.class)[1:10, ]
# filter on 'mode' and sort (descending) on 'object.size'
objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), mode = "numeric", 
                order = (-object.size))[1:10, ]
# sort on 'data.class', then (descending) on 'object.size'
objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"),
                order = c(data.class, -object.size))[1:10, ]
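Since the order argument is inspired by base order, the multi-key case can be illustrated with order itself. The table `demo_sizes` below is made up for illustration; negating a numeric key sorts that key in descending order within ties of the first key.

```r
# A made-up table: two classes, four (hypothetical) objects
demo_sizes <- data.frame(
  data.class  = c("ts", "data.frame", "ts", "data.frame"),
  object.size = c(100, 300, 200, 50),
  row.names   = c("obj1", "obj2", "obj3", "obj4"),
  stringsAsFactors = FALSE
)

# Ascending on data.class, then descending on object.size
sorted_sizes <- demo_sizes[order(demo_sizes$data.class, -demo_sizes$object.size), ]
sorted_sizes
```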

Note that although the extent is printed by default (by the print method for objects of class objects.summary) as a product of dimensions (d1 x d2), it is stored internally as a list, which makes it possible to sort on the number of rows or columns, for example.

# get all two-dimensional objects from the datasets package with more than 7 columns,
# sorted by number of columns (ascending) and then by number of rows (descending)
objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), 
                filter = sapply(extent, length) == 2L & sapply(extent, "[", 2L) > 7L,
                order = c(sapply(extent, "[", 2L), -sapply(extent, "[", 1L)))
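The sapply(extent, ...) expressions are easier to read once the list structure is visible. The sketch below builds a comparable list by hand; `demo_extent` is illustrative, and representing a plain vector by its length is an assumption about how such objects are stored.

```r
# One element per object: dim() for arrays and data frames, length() for a vector
demo_extent <- list(mtcars  = dim(mtcars),
                    iris3   = dim(iris3),
                    letters = length(letters))

n_dims  <- sapply(demo_extent, length)  # number of dimensions per object
two_dim <- demo_extent[n_dims == 2L]    # keep only two-dimensional objects
n_cols  <- sapply(two_dim, "[", 2L)     # second element = number of columns
n_dims
n_cols
```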

The entries are sorted in ascending order by default. They can be sorted in descending order by specifying reverse=TRUE.

# get five biggest objects from package datasets
objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), 
                order = object.size, reverse = TRUE)[1:5, ]

Note that objects can be filtered and/or sorted on columns that are not part of the summary (i.e. not listed in the what argument).

objects.summary("package:datasets", what = c("dat", "typ", "ext"), pattern = "st", 
                filter = mode %in% c("list", "numeric"), order = object.size)

Printing and Summarizing

The objects.summary function creates an object of class objects.summary, which is an extension of the data.frame class. This class exists so that custom print and summary methods can be provided.

The number of rows printed can be limited by the max.rows argument, which allows more straightforward control than the max argument of print.data.frame.

When all.classes argument is set to TRUE, the entire class vector is returned, and the data.class column is a list of character vectors. When such data is printed, the output is limited to a fixed number of characters (12 by default), longer strings being shown as e.g. "matrix, ..." or "nfnGroup....". The data.class.width argument to the print method allows users to change this value (probably to increase it), in order to see (almost) all the classes.

os <- objects.summary("package:datasets", what = c("dat", "ext", "obj"), 
                      all.classes = TRUE, order = object.size, reverse = TRUE)
print(os, data.class.width = 25, max.rows = 12)

multi_class_objects <- row.names(objects.summary("package:datasets", all.classes = TRUE, 
                                                 filter =  sapply(data.class, length) > 1L))
os <- objects.summary("package:datasets", names = multi_class_objects, all.classes = TRUE, 
                      what = c("dat", "ext", "obj"))
print(os, data.class.width = 32, max.rows = 12)

As already mentioned in [Sorting Objects], the extent column is internally stored as a list, and we can explicitly control how it is printed by the format.extent argument.

multi_dim_objects <- row.names(objects.summary("package:datasets", all.classes = TRUE, 
                                               data.class = c("array", "table")))
os <- objects.summary("package:datasets", names = multi_dim_objects, 
                      what = c("dat", "ext", "obj"))
print(os[rev(order(sapply(os$extent, length))), ], 
      format.extent = TRUE, max.rows = 12) # default
print(os[rev(order(sapply(os$extent, length))), ], 
      format.extent = FALSE, max.rows = 12)

Other options can be passed down to the print.data.frame function (not necessarily very useful).

print(objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj")), 
      format.extent = TRUE, max.rows = 12, right = FALSE, quote = TRUE)

The summary method shares the same specific arguments as the print method, except for max.rows.

os <- objects.summary("package:datasets", all.classes = TRUE, what = c("dat", "ext", "obj"),
                      filter = sapply(data.class, length) > 1L)
summary(os, data.class.width = 32, format.extent = FALSE)

Again, other options can be passed down to the summary.data.frame function.

summary(os, data.class.width = 32, maxsum = 10, quantile.type = 5)

Options

There are a few custom options dedicated to the package. The function osum.options, modeled on the base options function, allows the user to set and examine them. The custom options mainly provide default values for the specific arguments of the print and summary methods (data.class.width, format.extent, and max.rows), as seen in [Printing and Summarizing].

# see all current options
osum.options()
# set some values
old_opt <- osum.options(osum.data.class.width = 12, osum.max.rows = 25)
# previous values of the changed 'osum' options
old_opt
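For readers unfamiliar with the base options function that osum.options is modeled on, this is the usual save-and-restore pattern in base R. It is shown with the base digits option only; whether osum.options accepts the saved list in the same way is not stated here and should not be assumed from this sketch.

```r
# Setting an option returns the previous value(s) as a list
old_digits     <- options(digits = 4)
changed_digits <- getOption("digits")

# Passing that list back restores the previous setting
options(old_digits)
restored_digits <- getOption("digits")
c(changed = changed_digits, restored = restored_digits)
```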

It is also possible to select which information the objects.summary function returns by default. The value must be a subset of c("data.class", "storage.mode", "mode", "typeof", "extent", "object.size"); partial matching is allowed.

# set which attributes are retrieved by default
osum.options(osum.information = c("dat", "mod", "ext", "obj"))
# get the current value of the option
osum.options("osum.information")
# if the argument 'what' is not specified, the new default values are used
objects.summary("package:base", filter = data.class != "function")





*Created on `r format(Sys.Date(), "%Y-%m-%d")`.*




osum documentation built on Sept. 11, 2024, 5:58 p.m.