lumos: Shed Light on Your Data


lumos

R Documentation

Shed Light on Your Data

Description

Like a magic wand for exploring a data.frame. It's so powerful that it can be abbreviated to the single letter l.

Usage

lumos(
  data = NULL,
  ...,
  .drop = TRUE,
  .max = 20,
  .pct = TRUE,
  .order.by.freq = .pct,
  .blanks = TRUE,
  .recycle = TRUE,
  .missing = FALSE,
  .gen = FALSE,
  .kable = TRUE,
  .graphical = FALSE
)

l(
  data = NULL,
  ...,
  .drop = TRUE,
  .max = 20,
  .pct = TRUE,
  .order.by.freq = .pct,
  .blanks = TRUE,
  .recycle = TRUE,
  .missing = FALSE,
  .gen = FALSE,
  .kable = TRUE,
  .graphical = FALSE
)

lumos_plot(...)

ll(...)

Arguments

`data`	A `data.frame` (or an object that can be converted to one by calling `as.data.frame`).
`...`	Further arguments, evaluated within `data` (so they can refer directly to columns in `data` without quotes).
`.drop`	If `TRUE`, unused factor levels are dropped.
`.max`	For a single categorical variable, the maximum number of unique categories to show (can be `Inf`). For a single numeric variable, if there are no more than this many unique values, the variable will be treated as categorical.
`.pct`	If `TRUE`, show percents along with counts for single categorical variables.
`.order.by.freq`	If `TRUE`, for a single categorical variable, show the categories in decreasing order of frequency, from top to bottom (i.e. show the most frequent categories on top).
`.blanks`	If `TRUE`, insert blank spaces instead of repeating consecutive values that are identical.
`.recycle`	If `TRUE`, use vector recycling to make all arguments have the same length.
`.missing`	If `TRUE`, instead of the usual output, show the frequency and percent (by default) of missing values per column of `data`. If there are no missing values then no output is produced.
`.gen`	If `TRUE`, instead of the usual output, run a "code generation" procedure and print its output, then return `NULL` invisibly. In this case `...` is ignored. See Details and Examples.
`.kable`	If `TRUE`, call `kable` on the final object and return its results. Can also be a character string passed as the `format` argument of `kable`. Use `NULL` or `FALSE` to just return a `data.frame` instead.
`.graphical`	If `TRUE`, produce graphical output instead of tabular output. Either one or two variables can be plotted.
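
To make the `.missing` option more concrete, the per-column counts and percents of missing values can be approximated in base R. This is only a sketch of the idea, not the package's implementation; the helper name `missing_summary` and its column names are hypothetical, and unlike the documented behaviour it always returns a table, even when nothing is missing.

```r
# Hypothetical helper (not part of lumos): count and percent of NA values
# in each column of a data.frame.
missing_summary <- function(data) {
  data <- as.data.frame(data)
  n_missing <- vapply(data, function(x) sum(is.na(x)), numeric(1))
  data.frame(
    variable = names(data),
    missing  = n_missing,
    percent  = 100 * n_missing / nrow(data),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# airquality ships with R and has missing values in Ozone and Solar.R.
missing_summary(airquality)
```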

Details

The main use cases of this function are to quickly explore data interactively in the console, or to create simple tabular summaries in R markdown documents. It is similar to summary, but aims to be as convenient as possible and to produce nicer-looking output.

This function does different things depending on its inputs. The first argument data is always a data.frame (or NULL). Next come zero or more vector arguments, typically columns in data (which do not need to be quoted) or functions thereof. Lastly, some optional arguments that begin with . (dot) can be used to control certain aspects of the output.

When called with only a data.frame argument data, outputs a table summarizing the variables in data, with the columns: variable (name), label (only present if at least one variable has a label attribute), class, missing (count) and example (a single value from that variable, typically the first nonmissing value).
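
The shape of that overview table can be sketched in base R. The function below is an illustration under stated assumptions, not lumos itself: the name `overview` is hypothetical, the label column is left out, and the example value is always the first nonmissing value converted to character.

```r
# Hypothetical helper (not part of lumos): one row per variable with its
# name, first class, missing count and an example value.
overview <- function(data) {
  data <- as.data.frame(data)
  first_nonmissing <- function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) NA_character_ else as.character(x[1])
  }
  data.frame(
    variable = names(data),
    class    = vapply(data, function(x) class(x)[1], character(1)),
    missing  = vapply(data, function(x) sum(is.na(x)), numeric(1)),
    example  = vapply(data, first_nonmissing, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

overview(iris)
```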

When called with data and one other argument, outputs a frequency table if the argument is categorical, or a few descriptive statistics (mean, standard deviation, median, min and max) if it is continuous. The .max option is used to decide whether a numeric argument is continuous or categorical.
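
The decision rule described above can be sketched as follows. This is a base R approximation, not the package's code: `describe_one` and its `max_unique` parameter are hypothetical stand-ins for the single-variable behaviour and the .max option, and the sketch always sorts by frequency and always shows percents.

```r
# Hypothetical helper (not part of lumos): treat a variable as categorical
# when it is non-numeric or has no more than `max_unique` unique values.
describe_one <- function(x, max_unique = 20) {
  treat_as_categorical <- !is.numeric(x) || length(unique(x)) <= max_unique
  if (treat_as_categorical) {
    # Frequency table, most frequent category first.
    counts <- sort(table(x), decreasing = TRUE)
    data.frame(
      value   = names(counts),
      n       = as.vector(counts),
      percent = 100 * as.vector(counts) / sum(counts),
      stringsAsFactors = FALSE
    )
  } else {
    # A few descriptive statistics for a continuous variable.
    data.frame(
      mean   = mean(x, na.rm = TRUE),
      sd     = sd(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      min    = min(x, na.rm = TRUE),
      max    = max(x, na.rm = TRUE)
    )
  }
}

describe_one(mtcars$cyl)  # few unique values: frequency table
describe_one(mtcars$wt)   # more than 20 unique values: descriptive statistics
```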

When called with more than one argument following data, those arguments should all be categorical (.max is ignored in this case). A frequency table is produced for the combinations of the categories, nested from left to right. Percentages are not shown, just counts, and no sorting is done (the categories appear in the order of factor levels).
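
A nested frequency table of this kind, together with the effect of the .blanks option, can be imitated in base R. The sketch below is an assumption-laden illustration rather than lumos's implementation: `blank_repeats` is a hypothetical helper, and the sketch keeps combinations with a count of zero, which may differ from what the package shows.

```r
# Counts for every combination of cyl and gear, nested from left to right.
counts <- as.data.frame(
  table(cyl = mtcars$cyl, gear = mtcars$gear),
  responseName = "n"
)
counts <- counts[order(counts$cyl, counts$gear), ]
rownames(counts) <- NULL

# Hypothetical helper (not part of lumos): replace a value with "" when it
# repeats the value directly above it.
blank_repeats <- function(x) {
  x <- as.character(x)
  repeated <- c(FALSE, x[-1] == x[-length(x)])
  x[repeated] <- ""
  x
}

nested <- counts
nested$cyl <- blank_repeats(nested$cyl)
nested
```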

By default, the function kable is used to format the output, so you get nice-looking tables both in the console and in R markdown documents.

If the .gen argument is TRUE, then something different happens. Instead of outputting a table, the function prints code statements to the console: a call to lumos for each variable in data. The code can be copied from the console back into the script and used to explore the data.frame one variable at a time. This is useful because it saves the need to type the code for each variable.
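
The idea behind this code generation step can be illustrated with a few lines of base R. The exact text that lumos prints is not specified here, so the format below is an assumption; `gen_calls` is a hypothetical helper, and it does not handle column names that would need backticks.

```r
# Hypothetical helper (not part of lumos): build one lumos() call per column,
# as character strings that can be pasted back into a script.
gen_calls <- function(data_name, data) {
  sprintf("lumos(%s, %s)", data_name, names(data))
}

# Print the generated statements, one per line.
cat(gen_calls("iris", iris), sep = "\n")
```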

Value

The value returned depends on the parameters. If .kable is TRUE (the default), an object of class knitr_kable is returned; otherwise, a data.frame. See Details and Examples.

Examples

lumos(iris)
lumos(iris, Species)
lumos(iris, .gen=TRUE)  # Generate code statements to call lumos() on each column of iris.

lumos(mtcars)
lumos(mtcars, wt)
lumos(mtcars, cyl)
lumos(mtcars, cyl, .pct=FALSE)
lumos(mtcars, cyl, gear)
lumos(mtcars, cyl, gear, am)
lumos(mtcars, cyl, gear, am, .blanks=FALSE, .kable=FALSE)

benjaminrich/lumos documentation built on Oct. 15, 2024, 3:52 a.m.