whatis: Data frame summary


Description

Summarize the characteristics of variables (columns) in a data frame.

Usage

whatis(x, var.name.truncate = 20, type.truncate = 14)

Arguments

x

a data frame

var.name.truncate

maximum length (in characters) to which variable names are truncated. The default is 20. Anything less than 12 is narrower than the column label in the resulting data frame, so it discards information without saving any space.

type.truncate

maximum length (in characters) for truncation of variable type; 14 is the full width, but 4 works well if space is at a premium.
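
The effect of the two truncation arguments can be sketched in base R. This is an illustration only: truncate_label() is a hypothetical helper, not part of YaleToolkit, and it assumes truncation simply keeps the leading characters, which the documentation above does not confirm.

```r
# Hypothetical helper (not a YaleToolkit function): keep at most `n`
# leading characters of each label. Assumes plain left-to-right truncation.
truncate_label <- function(labels, n) {
  substr(labels, 1, n)
}

# Variable names cut to the default var.name.truncate of 20 characters.
var_names <- c("Sepal.Length", "a_very_long_variable_name_indeed")
truncated_names <- truncate_label(var_names, 20)
print(truncated_names)

# "ordered factor" is the longest type label listed under Value,
# which is why 14 is the full width for type.truncate.
print(nchar("ordered factor"))

# With a width of 4, the type labels stay distinguishable.
print(truncate_label(c("pure factor", "mixed factor", "ordered factor",
                       "character", "numeric"), 4))
```

Names shorter than the limit pass through unchanged, so a generous var.name.truncate costs nothing for short column names.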

Details

The function whatis() provides a basic examination of some characteristics of each variable (column) in a data frame.

Value

A list of characteristics describing the variables in the data frame, x. Each component of the list has length(x) values, one for each variable in the data frame x.

variable.name

from the names(x) attribute, possibly truncated to var.name.truncate characters in length.

type

the possibilities include "pure factor", "mixed factor", "ordered factor", "character", and "numeric". whatis() allows for a factor or a vector that contains character values, numeric values, or both. A factor whose levels include both character and numeric values is called a mixed factor. A factor whose levels are purely character or purely numeric (but not both) is a pure factor. Non-factors must then be either character or numeric.

missing

the number of NAs in the variable.

distinct.values

the number of distinct values in the variable, equal to length(table(variable)).

precision

the number of decimal places of precision.

min

the minimum value (if numeric) or first value (alphabetically) as appropriate.

max

the maximum value (if numeric) or the last value (alphabetically) as appropriate.
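
The type, missing, and distinct.values components described above can be approximated in base R. The sketch below is not the YaleToolkit implementation: looks_numeric(), classify_column(), and describe_column() are hypothetical helpers, and treating a level as "numeric" when as.numeric() can parse it is an assumption made for illustration.

```r
# Hypothetical helper: TRUE for values that as.numeric() can parse.
looks_numeric <- function(values) {
  !is.na(suppressWarnings(as.numeric(as.character(values))))
}

# Hypothetical classifier following the rules described under `type`.
classify_column <- function(v) {
  # Ordered factors are also factors, so test for them first.
  if (is.ordered(v)) return("ordered factor")
  if (is.factor(v)) {
    num <- looks_numeric(levels(v))
    # Levels of both kinds present: mixed; otherwise pure.
    if (any(num) && any(!num)) return("mixed factor")
    return("pure factor")
  }
  if (is.numeric(v)) return("numeric")
  # Assumption: every other non-factor is reported as character.
  "character"
}

# Hypothetical summary of a single column.
describe_column <- function(v) {
  list(type = classify_column(v),
       missing = sum(is.na(v)),             # number of NAs
       distinct.values = length(table(v)))  # table() drops NAs by default
}

df <- data.frame(
  b = factor(c("Cat", "Dog", "Cat")),
  c = factor(c("Apple", "8", "Orange")),
  d = c("Blue", NA, "Red"),
  stringsAsFactors = FALSE
)

types <- sapply(df, classify_column)
print(types)
print(describe_column(df$d))
```

Because distinct.values is defined as length(table(variable)), missing values are not counted as a distinct value; they are reported separately under missing.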

Author(s)

John W. Emerson, Walton Green

References

Special thanks to John Hartigan and the students of 'Statistical Case Studies' of 2004 for their help troubleshooting and developing the function whatis().

See Also

See also str.

Examples

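library(YaleToolkit)

# stringsAsFactors=TRUE keeps b and c as factors on R >= 4.0.0,
# matching the example output shown below.
mydf <- data.frame(a=rnorm(100),
                   b=sample(c("Cat", "Dog"), 100, replace=TRUE),
                   c=sample(c("Apple", "Orange", "8"), 100, replace=TRUE),
                   d=sample(c("Blue", "Red"), 100, replace=TRUE),
                   stringsAsFactors=TRUE)
mydf$d <- as.character(mydf$d)  # convert d from factor to character
whatis(mydf)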

data(iris)
whatis(iris)

Example output

Loading required package: grid
  variable.name         type missing distinct.values precision
1             a      numeric       0             100     1e-17
2             b  pure factor       0               2        NA
3             c mixed factor       0               3        NA
4             d    character       0               2        NA
                min              max
1 -2.44764194003561 1.86091987061526
2               Cat              Dog
3                 8           Orange
4              Blue              Red
  variable.name        type missing distinct.values precision    min       max
1  Sepal.Length     numeric       0              35       0.1    4.3       7.9
2   Sepal.Width     numeric       0              23       0.1      2       4.4
3  Petal.Length     numeric       0              43       0.1      1       6.9
4   Petal.Width     numeric       0              22       0.1    0.1       2.5
5       Species pure factor       0               3        NA setosa virginica

YaleToolkit documentation built on May 2, 2019, 11:05 a.m.