lumos: Shed Light on Your Data


lumos R Documentation

Shed Light on Your Data

Description

Like a magic wand for exploring a data.frame. It's so powerful that it can be abbreviated to the single letter l.

Usage

lumos(
  data = NULL,
  ...,
  .drop = TRUE,
  .max = 20,
  .pct = TRUE,
  .order.by.freq = .pct,
  .blanks = TRUE,
  .recycle = TRUE,
  .missing = FALSE,
  .gen = FALSE,
  .kable = TRUE,
  .graphical = FALSE
)

l(
  data = NULL,
  ...,
  .drop = TRUE,
  .max = 20,
  .pct = TRUE,
  .order.by.freq = .pct,
  .blanks = TRUE,
  .recycle = TRUE,
  .missing = FALSE,
  .gen = FALSE,
  .kable = TRUE,
  .graphical = FALSE
)

lumos_plot(...)

ll(...)

Arguments

data

A data.frame (or an object that can be converted to one by calling as.data.frame).

...

Further arguments, evaluated within data (so they can refer directly to columns in data without quotes).

.drop

If TRUE, unused factor levels are dropped.

.max

For a single categorical variable, the maximum number of unique categories to show (can be Inf). For a single numeric variable, if there are no more than this many unique values, the variable will be treated as categorical.

.pct

If TRUE, show percents along with counts for single categorical variables.

.order.by.freq

If TRUE, for a single categorical variable, show the categories in decreasing order of frequency, from top to bottom (i.e. show the most frequent categories on top).

.blanks

If TRUE, insert blank spaces instead of repeating consecutive values that are identical.

.recycle

If TRUE, use vector recycling to make all arguments have the same length.

.missing

If TRUE, instead of the usual output, show the frequency and percent (by default) of missing values per column of data. If there are no missing values, no output is produced.

.gen

If TRUE, instead of the usual output, run a "code generation" procedure and print its output, then return NULL invisibly. In this case ... is ignored. See Details and Examples.

.kable

If TRUE, call kable on the final object and return its results. Can also be a character string passed as the format argument of kable. Use NULL or FALSE to just return a data.frame instead.

.graphical

If TRUE, produce graphical output instead of tabular output. Either one or two variables can be plotted.

Details

The main use cases of this function are quickly exploring data interactively in the console and creating simple tabular summaries in R Markdown documents. It is similar to summary, but aims to be as convenient as possible and to produce nicer-looking output.

This function does different things depending on its inputs. The first argument data is always a data.frame (or NULL). Next come zero or more vector arguments, typically columns in data (which do not need to be quoted) or functions thereof. Lastly, some optional arguments that begin with . (dot) can be used to control certain aspects of the output.

When called with only a data.frame argument data, the function outputs a table summarizing the variables in data, with the columns variable (name), label (only present if at least one variable has a label attribute), class, missing (count) and example (a single value from that variable, typically the first nonmissing value).
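To make the shape of that overview table concrete, here is a minimal base-R sketch that builds a comparable summary. The helper name overview_sketch is hypothetical and is not part of lumos; the label column is omitted, and the real function may choose its example value or format its output differently.

```r
# Hypothetical helper (NOT part of lumos): a rough approximation of the
# per-variable overview described above, using base R only.
overview_sketch <- function(data) {
  data <- as.data.frame(data)

  # First non-missing value of a column, as text (NA if every value is missing).
  first_nonmissing <- function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) NA_character_ else as.character(x[1])
  }

  data.frame(
    variable = names(data),
    class    = vapply(data, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    missing  = vapply(data, function(x) sum(is.na(x)), integer(1), USE.NAMES = FALSE),
    example  = vapply(data, first_nonmissing, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

overview <- overview_sketch(iris)
print(overview)
```

The result has one row per column of the input, which is the same orientation the text describes for lumos(data).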

When called with data and one other argument, the function outputs a frequency table if the argument is categorical, and a few descriptive statistics (mean, standard deviation, median, min and max) if it is continuous. The .max option is used to decide whether a numeric argument is continuous or categorical.
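The categorical-versus-continuous decision can be sketched in base R as follows. This is an illustration under assumptions, not the lumos implementation: the helper name summarize_one_sketch and its max_unique parameter are hypothetical stand-ins for the behavior of .max, and handling of missing values and sorting is simplified.

```r
# Hypothetical helper (NOT lumos's actual code): a numeric vector with no more
# than `max_unique` distinct values is treated as categorical, mirroring the
# role of .max described above.
summarize_one_sketch <- function(x, max_unique = 20) {
  treat_as_categorical <- !is.numeric(x) || length(unique(x)) <= max_unique

  if (treat_as_categorical) {
    # Frequency table with counts and percents (missing values are excluded).
    counts <- table(x)
    data.frame(
      value = names(counts),
      n     = as.vector(counts),
      pct   = round(100 * as.vector(counts) / sum(counts), 1)
    )
  } else {
    # A few descriptive statistics for a continuous variable.
    data.frame(
      mean   = mean(x, na.rm = TRUE),
      sd     = sd(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      min    = min(x, na.rm = TRUE),
      max    = max(x, na.rm = TRUE)
    )
  }
}

cyl_summary <- summarize_one_sketch(mtcars$cyl)  # few distinct values: frequency table
wt_summary  <- summarize_one_sketch(mtcars$wt)   # many distinct values: statistics
print(cyl_summary)
print(wt_summary)
```

Lowering the threshold (for example max_unique = 2) would push mtcars$cyl into the continuous branch, which is the kind of switch the .max argument controls.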

When called with more than one argument following data, those arguments should all be categorical (.max is ignored in this case). A frequency table is produced for the combinations of the categories, nested from left to right. Percentages are not shown, just counts, and no sorting is done (the categories appear in the order of factor levels).
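A nested table of counts like this can be approximated with base R. The sketch below is an assumption-laden illustration, not the lumos code: blank_repeats is a hypothetical helper that mimics what the .blanks argument is described as doing, and whether lumos lists combinations with a zero count (as table does here) is not stated in this page.

```r
# Counts for every combination of two categorical variables, nested left to right.
combos <- as.data.frame(
  table(cyl = mtcars$cyl, gear = mtcars$gear),
  responseName = "n"
)
combos <- combos[order(combos$cyl, combos$gear), ]

# Hypothetical helper (NOT part of lumos): replace a value with an empty string
# when it is identical to the value directly above it.
blank_repeats <- function(x) {
  x <- as.character(x)
  repeated <- c(FALSE, x[-1] == x[-length(x)])
  x[repeated] <- ""
  x
}

display <- data.frame(
  cyl  = blank_repeats(combos$cyl),
  gear = as.character(combos$gear),
  n    = combos$n
)
print(display, row.names = FALSE)
```

Only the outer column is blanked here to keep the example short; a fuller version would need to decide how inner columns behave when the outer value changes.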

By default, the function kable is used to format the output, so you get nice-looking tables in both the console and R Markdown documents.

If the .gen argument is TRUE, the function behaves differently. Instead of outputting a table, it prints code statements to the console: one call to lumos for each variable in data. The code can be copied from the console back into a script and used to explore the data.frame one variable at a time, which saves typing the code for each variable.
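The idea behind code generation can be sketched in a few lines of base R. The helper gen_calls_sketch is hypothetical, and the exact text that lumos prints with .gen = TRUE is an assumption here; the sketch only shows the general technique of emitting one call per column and returning NULL invisibly.

```r
# Hypothetical helper (NOT part of lumos): print one lumos() call per column.
# `data_name` is the name of the data.frame as it should appear in the
# generated code. Column names that are not syntactically valid R names would
# need backticks, which this sketch does not add.
gen_calls_sketch <- function(data_name, data) {
  calls <- sprintf("lumos(%s, %s)", data_name, names(data))
  cat(calls, sep = "\n")
  invisible(NULL)
}

gen_calls_sketch("iris", iris)
```

Each printed line can then be pasted into a script and run on its own.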

Value

The value returned depends on the arguments. If .kable is TRUE (the default), an object of class knitr_kable is returned; otherwise, a data.frame. If .gen is TRUE, NULL is returned invisibly. See Details and Examples.

Examples

lumos(iris)
lumos(iris, Species)
lumos(iris, .gen=TRUE)  # Generate code statements to call lumos() on each column of iris.

lumos(mtcars)
lumos(mtcars, wt)
lumos(mtcars, cyl)
lumos(mtcars, cyl, .pct=FALSE)
lumos(mtcars, cyl, gear)
lumos(mtcars, cyl, gear, am)
lumos(mtcars, cyl, gear, am, .blanks=FALSE, .kable=FALSE)

benjaminrich/lumos documentation built on Oct. 15, 2024, 3:52 a.m.