This makes it easy to quickly scan through all of the columns in a
data frame to spot unexpected patterns or data entry errors. Numeric variables are depicted as
histograms, while factor and character variables are summarized by
the R table function and then presented as barplots. This is most
useful with a large-screen graphics device (try running the function
provided with this package for that purpose,
or use any other method you prefer to create a large device).
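The dispatch just described (a histogram for each numeric column; a table followed by a barplot for each factor or character column) can be sketched in a few lines of base R. This is a hypothetical illustration, not the package's implementation: the name quick_peek(), the decision to skip other column classes such as Date, and the plot titles are all assumptions.

```r
## Hypothetical sketch of per-column dispatch; NOT the package's own code.
quick_peek <- function(dat, sort = TRUE) {
  dat <- as.data.frame(dat)              # accept anything coercible to a data frame
  vars <- names(dat)
  if (sort) vars <- sort(vars)           # alphabetical column order
  plotted <- character(0)
  for (v in vars) {
    x <- dat[[v]]
    if (is.numeric(x)) {
      hist(x, main = v, xlab = v)        # numeric column: histogram
    } else if (is.factor(x) || is.character(x)) {
      tab <- table(x)                    # factor/character column: tabulate first
      barplot(tab, main = v, horiz = TRUE, las = 1)
    } else {
      next                               # assumption: other classes are skipped
    }
    plotted <- c(plotted, v)
  }
  invisible(plotted)                     # names of the columns that were plotted
}

pdf(NULL)                                # off-screen device so the sketch runs anywhere
res <- quick_peek(data.frame(b = c("x", "y", "x"), a = c(1.5, 2, 3),
                             d = as.Date("2020-01-01") + 0:2))
invisible(dev.off())
print(res)                               # "a" "b" (the Date column is skipped)
```

Returning the plotted column names invisibly mirrors the documented return value while keeping the console quiet during interactive use.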
An R data frame or something that can be coerced to a
data frame by
Default TRUE. Should the columns be displayed in alphabetical order?
Should output go to a file rather than to the screen? Default is NULL, meaning output is shown on screen. If you supply a file name, PDF output will be written to it.
If TRUE, counts from histogram bins and tables will appear in the console.
As in the old style R
Additional arguments for the pdf, histogram, table, or barplot functions. Please see Details below.
A text stub that will appear in the x-axis label. By default, it advertises this package.
As in the freq argument of the histogram function. Should graphs show counts (freq = TRUE) or proportions (freq = FALSE)? For histograms, freq = FALSE draws densities, which are proportions divided by bin width.
A list of arguments to be passed to the
A list of arguments to be passed to the
A vector of column names that were plotted
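The freq entry relates proportions and densities. In base R's hist(), they coincide only when every bin is one unit wide: with freq = FALSE, hist() plots density = count / (n * bin width). The sketch below checks that relationship using hist(plot = FALSE), which computes the bins without drawing; the seed and sample size are arbitrary choices, and peek() itself is not called.

```r
set.seed(1)                          # arbitrary seed
x <- rnorm(200)
h <- hist(x, plot = FALSE)           # compute bins without drawing
bin_width <- diff(h$breaks)
props <- h$counts / sum(h$counts)    # share of cases in each bin
# hist() defines density = count / (n * bin width), so these agree:
print(all.equal(h$density * bin_width, props))
# Densities times widths sum to 1, whereas counts sum to n
print(sum(h$density * bin_width))
print(sum(h$counts))
```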
Every effort has been made to make this function
simple and easy to use. Please run the examples as they are
before becoming too concerned about customization. This
function is intended for getting a quick look at each
variable, one by one; it is not intended to create publication-quality
histograms. For the sake of fastidious users, many
settings can be adjusted. Users can control the parameters
passed to the histogram and barplot functions. The function also
can create frequency tables (which users can control by providing
additional named arguments).
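The frequency tables come from base R's table(). Among the table() arguments named on this page are useNA and dnn; the sketch below shows both on a small factor with missing values. Whether passing useNA = "ifany" through peek() adds an NA bar to the barplot is an expectation based on the argument routing described in these Details, not something verified against the package.

```r
x <- factor(c("greg", "cindy", NA, "greg", NA))
t_default <- table(x)                 # NA values are dropped by default
t_na <- table(x, useNA = "ifany")     # NA becomes its own category
t_dnn <- table(x, dnn = "name")       # dnn labels the table's dimension
print(t_default)
print(t_na)
print(t_dnn)
```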
The histograms are standard, upright histograms. The barplots are horizontal. I chose to make the bars horizontal because long value labels are more easily accommodated on the left axis. The code measures the length (in inches) of the label strings and increases the margin accordingly. The examples demonstrate that effect.
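A margin adjustment of this kind can be approximated with base graphics: strwidth(units = "inches") measures each label, and par("mai") holds the figure margins in inches (bottom, left, top, right). This is a hypothetical sketch, not the package's code; the helper name left_margin_for() and the 0.3-inch padding are assumptions.

```r
## Hypothetical sketch: widen the left margin to fit long barplot labels.
left_margin_for <- function(labels, pad = 0.3) {
  # Widest label in inches plus padding (pad = 0.3 is an arbitrary assumption)
  max(strwidth(labels, units = "inches")) + pad
}

pdf(NULL)                                 # off-screen device
plot.new()                                # strwidth() needs an active plot
counts <- table(c("greg", "cindy", "arthur philpot smythe", "greg"))
mai <- par("mai")                         # margins in inches: bottom, left, top, right
mai[2] <- left_margin_for(names(counts))  # enlarge only the left margin
op <- par(mai = mai)
barplot(counts, horiz = TRUE, las = 1)    # las = 1 keeps the labels horizontal
par(op)                                   # restore the previous margins
invisible(dev.off())
print(mai)
```

Measuring in inches rather than user coordinates keeps the margin independent of the data range on the count axis.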
The additional named arguments,
..., are inspected and sorted into groups intended to
control the use of the pdf, histogram, table, and barplot functions.
The parameters c("exclude", "dnn", "useNA", "deparse.level") will go to the
table function, which is used to make barplots for
factor and character variables. These named arguments are
extracted and sent to the pdf function: c("width", "height",
"onefile", "family", "title", "fonts", "version", "paper",
"encoding", "bg", "fg", "pointsize", "pagecentre",
"colormodel", "useDingbats", "useKerning", "fillOddEven",
"compress"). Any other arguments that are unique to
barplot are sorted out and sent only to that function.
All remaining arguments, including graphical parameters, will be sent to both the histogram and barplot functions, which is a convenient way to obtain a uniform appearance. Arguments that are common to both functions work this way, as do graphics parameters. However, if one wants to target some arguments to
hist but not barplot, then the list argument intended for hist should be used. Similarly, barargs can
be used to send arguments only to the barplot
function. Warning: the default list arguments, including
barargs, contain some settings that are needed for the
existing design. If new lists are supplied, the previously specified defaults
are lost. Hence, users should include the existing members of
those lists, possibly with revised values.
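Because a user-supplied barargs replaces the default list wholesale, one way to keep the defaults is to merge changes into a copy of them with utils::modifyList(). The default values below (horiz = TRUE, las = 1) are an assumption inferred from the examples, which repeat them whenever barargs is supplied; they were not read from the function's source.

```r
## Assumed barargs defaults, inferred from the examples (not from the source)
bar_defaults <- list(horiz = TRUE, las = 1)
# modifyList() keeps every default and adds or overrides the named members
my_barargs <- utils::modifyList(bar_defaults,
                                list(xlab = "I'm in barargs", xlim = c(0, 100)))
str(my_barargs)
## Then, with the package loaded and mydf from the examples:
## peek(mydf, barargs = my_barargs)
```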
All of this argument sorting is done to reduce the large number of warnings that were observed in previous editions of this function.
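The sorting of ... into groups can be approximated by matching argument names. The following is a hypothetical sketch, not the package's implementation: sort_dots() is an illustrative name, the precedence order (table, then pdf, then barplot-only, then both) is one reading of the text, and "unique to barplot" is taken to mean a formal argument of barplot's default method that hist's default method lacks. The table and pdf name lists are copied from the Details.

```r
## Hypothetical sketch of sorting ... into groups; NOT the package's own code.
sort_dots <- function(...) {
  dots <- list(...)
  nms <- names(dots)
  if (length(dots) > 0 && (is.null(nms) || any(nms == "")))
    stop("all arguments in ... must be named")
  table_names <- c("exclude", "dnn", "useNA", "deparse.level")
  pdf_names <- c("width", "height", "onefile", "family", "title", "fonts",
                 "version", "paper", "encoding", "bg", "fg", "pointsize",
                 "pagecentre", "colormodel", "useDingbats", "useKerning",
                 "fillOddEven", "compress")
  # Formal arguments of the default methods, looked up at run time
  bar_formals <- names(formals(utils::getS3method("barplot", "default")))
  hist_formals <- names(formals(utils::getS3method("hist", "default")))
  bar_only <- setdiff(bar_formals, c(hist_formals, "..."))
  # First matching rule wins: table, then pdf, then barplot-only, else both
  grp <- ifelse(nms %in% table_names, "table",
         ifelse(nms %in% pdf_names, "pdf",
         ifelse(nms %in% bar_only, "bar", "both")))
  groups <- c(table = "table", pdf = "pdf", bar = "bar", both = "both")
  lapply(groups, function(g) dots[grp == g])
}

s <- sort_dots(useNA = "ifany", width = 7, horiz = TRUE, breaks = 30,
               ylab = "Counts")
str(s)
# A caller could then use, e.g., do.call(hist, c(list(x), s$both))
```

Note that width and height are also barplot arguments; in this sketch the pdf list takes precedence, which is one plausible way to resolve that overlap.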
Paul Johnson <email@example.com>
set.seed(234234)
N <- 200
mydf <- data.frame(x5 = rnorm(N), x4 = rnorm(N), x3 = rnorm(N),
                   x2 = letters[sample(1:24, N, replace = TRUE)],
                   x1 = factor(sample(c("cindy", "bobby", "marsha", "greg", "chris"),
                                      N, replace = TRUE)),
                   stringsAsFactors = FALSE)
## Insert 16 missing values
mydf$x1[sample(1:150, 16)] <- NA
mydf$adate <- as.Date(c("1jan1960", "2jan1960", "31mar1960", "30jul1960"),
                      format = "%d%b%Y")
peek(mydf)
peek(mydf, sort = FALSE)
## Demonstrate the dot-dot-dot usage to pass in hist params
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE)
## Not run: file output
## peek(mydf, sort = FALSE, file = "three_histograms.pdf")
## Use some objects from the datasets package
library(datasets)
peek(cars, xlabstub = "R cars data: ")
peek(EuStockMarkets, xlabstub = "Euro Market Data: ")
peek(EuStockMarkets, xlabstub = "Euro Market Data: ", breaks = 50, freq = TRUE)
## Not run: file output
## peek(EuStockMarkets, breaks = 50, file = "myeuro.pdf",
##      height = 4, width = 3, family = "Times")
## peek(EuStockMarkets, breaks = 50, file = "myeuro-%03d.pdf",
##      onefile = FALSE, family = "Times", textout = TRUE)
## ylab goes into "..." and affects both histograms and barplots
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE)
## xlab is added in the barargs list.
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE,
     barargs = list(horiz = TRUE, las = 1, xlab = "I'm in barargs"))
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE,
     barargs = list(horiz = TRUE, las = 1, xlim = c(0, 100),
                    xlab = "I'm in barargs, not in histargs"))
levels(mydf$x1) <- c(levels(mydf$x1), "arthur philpot smythe")
mydf$x1 <- "arthur philpot smythe"
mydf$x2 <- "I forgot what letter"
peek(mydf, breaks = 30, barargs = list(horiz = TRUE, las = 1))
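One commented-out example combines file with onefile = FALSE. In base R's pdf(), onefile = FALSE starts a new file for each page, and the file name should contain a C integer format (such as %03d, a zero-padded three-digit page number) that is replaced by the page number. The sketch below writes three histograms to a temporary directory to show the resulting file names; it uses only base R, not peek(), and the directory and file names are arbitrary.

```r
out_dir <- tempfile("peek_pages")
dir.create(out_dir)
pattern <- file.path(out_dir, "page-%03d.pdf")   # %03d becomes 001, 002, ...
pdf(file = pattern, onefile = FALSE)             # one file per page
for (i in 1:3) hist(rnorm(50), main = paste("Page", i))
invisible(dev.off())
pages <- list.files(out_dir)
print(pages)
```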