| fdth-package | R Documentation |
The fdth package contains a set of functions that allow
users to create frequency distribution tables (‘fdt’) and their associated
histograms and frequency polygons (absolute, relative and cumulative).
The ‘fdt’ can be formatted in many ways suitable for
publication (papers, books, etc.).
The S3 plot method produces histograms with the
convenience and flexibility of a high-level function.
The frequency of a particular observation is the number of times it occurs in the data. The distribution of a variable is the pattern of frequencies of its observations.
Frequency distribution tables (‘fdt’) can be used for ordinal, continuous, and categorical variables.
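As a small illustration of these definitions, the sketch below counts absolute, relative and cumulative frequencies for a categorical vector using only base R. The vector obs is made-up data and is not part of the package.

```r
# Illustrative data (hypothetical, not from fdth)
obs <- c("a", "b", "a", "c", "b", "a")

f  <- table(obs)       # absolute frequency of each distinct observation
rf <- prop.table(f)    # relative frequency (proportions that sum to 1)
cf <- cumsum(f)        # cumulative frequency, in the order of the levels

f
rf
cf
```

The three vectors together describe the distribution of obs; an ‘fdt’ presents the same kinds of columns in a single table.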
The R environment provides a set of functions (generally low level)
that enable the user to build an ‘fdt’ and its associated graphical representation,
the histogram. An ‘fdt’ plays an important role in summarizing data and
is the basis for estimating the probability density function used in
parametric inference.
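The low-level route can be sketched with base R alone. This is only an assumption about how one might combine nclass.Sturges, cut, table and hist by hand; it is not the code fdth uses internally, and the data and the names k, breaks and tab are hypothetical.

```r
set.seed(1)                                  # reproducible illustrative data
x <- rnorm(100, mean = 5, sd = 1)

k      <- nclass.Sturges(x)                  # Sturges' rule: ceiling(log2(n) + 1)
breaks <- seq(min(x), max(x), length.out = k + 1)

# Left-closed classes [a, b); include.lowest = TRUE keeps max(x) in the last class
classes <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
f <- table(classes)

tab <- data.frame(class = names(f),
                  f     = as.vector(f),              # absolute frequency
                  rf    = as.vector(f) / length(x),  # relative frequency
                  cf    = cumsum(as.vector(f)))      # cumulative frequency
tab

# The matching histogram counts; drop plot = FALSE to draw the histogram
h <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)
h$counts
```

Each step (choosing the number of classes, building the breaks, cutting, tabulating, plotting) has to be kept consistent by the user, which is the bookkeeping a higher-level function is meant to hide.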
However, for novice or occasional users of R, it can be laborious to
find all the functions and graphical parameters needed to produce standardized
and clear ‘fdt’ tables and associated histograms ready for publication.
The aim of this package is to allow users to create
both ‘fdt’ tables and histograms easily and flexibly. The most common input for univariate data is
a vector. For multivariate data, the input can be a data.frame,
which also allows grouping all numerical variables according to one
categorical variable, or a matrix.
The simplest way to run ‘fdt’ and ‘fdt_cat’ is by supplying only the ‘x’
object, for example: d <- fdt(x). In this case the
default values of ‘breaks’ and ‘right’ ("Sturges" and FALSE,
respectively) are used. If the ‘x’ object is categorical, use
d <- fdt_cat(x) instead.
If the variable is continuous, you can also supply:
‘x’ and ‘k’ (number of class intervals);
‘x’, ‘start’ (left endpoint of the first class interval) and ‘end’ (right endpoint of the last class interval); or
‘x’, ‘start’, ‘end’ and ‘h’ (class interval width).
These options make building an ‘fdt’ easy and flexible.
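The three calling forms listed above might look as follows. This is a minimal sketch that assumes the fdth package is installed; the data and the particular values chosen for k, start, end and h are illustrative only.

```r
library(fdth)

set.seed(42)                        # illustrative data, not from the package
x <- rnorm(200, mean = 5, sd = 1)

# 1) 'x' and 'k': request a given number of class intervals
ft_k <- fdt(x, k = 8)

# 2) 'x', 'start' and 'end': fix the outer limits of the table
ft_se <- fdt(x, start = 0, end = 10)

# 3) 'x', 'start', 'end' and 'h': fix the limits and the class width
ft_seh <- fdt(x, start = 0, end = 10, h = 1)

ft_k
ft_se
ft_seh

# Number of classes in each table
nrow(ft_k[['table']])
nrow(ft_seh[['table']])
```

Fixing start, end and h is convenient when the class limits should be round numbers, as in a table prepared for publication.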
The ‘fdt’ and ‘fdt_cat’ objects store the information used by the methods summary,
print and plot. The result of plot is a histogram or
polygon (absolute, relative or cumulative).
The methods summary, print and plot provide a reasonable
set of parameters to format and plot the ‘fdt’ object in a pretty
(and publishable) way.
Faria, J. C.
Allaman, I. B.
Jelihovschi, E. G.
hist, provided by graphics;
table and cut, both provided by base.
library(fdth)
# Numerical
#=====================
# Vectors: univariate
#=====================
x <- rnorm(n=1e3,
           mean=5,
           sd=1)
(ft <- fdt(x))
# Histograms
plot(ft) # Absolute frequency histogram
plot(ft,
     main='My title')
plot(ft,
     x.round=3,
     col='darkgreen')
plot(ft,
     xlas=2)
plot(ft,
     x.round=3,
     xlas=2,
     xlab=NULL)
plot(ft,
     v=TRUE,
     cex=.8,
     x.round=3,
     xlas=2,
     xlab=NULL,
     col=rainbow(11))
plot(ft,
     type='fh')   # Absolute frequency histogram
plot(ft,
     type='rfh')  # Relative frequency histogram
plot(ft,
     type='rfph') # Relative frequency (%) histogram
plot(ft,
     type='cdh')  # Cumulative density histogram
plot(ft,
     type='cfh')  # Cumulative frequency histogram
plot(ft,
     type='cfph') # Cumulative frequency (%) histogram
# Polygons
plot(ft,
     type='fp')   # Absolute frequency polygon
plot(ft,
     type='rfp')  # Relative frequency polygon
plot(ft,
     type='rfpp') # Relative frequency (%) polygon
plot(ft,
     type='cdp')  # Cumulative density polygon
plot(ft,
     type='cfp')  # Cumulative frequency polygon
plot(ft,
     type='cfpp') # Cumulative frequency (%) polygon
# Density
plot(ft,
     type='d')    # Density
# Summary
ft
summary(ft) # same result
print(ft) # same result
show(ft) # same result
summary(ft,
        format=TRUE)      # This may not be what you want for publication.
summary(ft,
        format=TRUE,
        pattern='%.2f')   # Better, but can it be improved?
summary(ft,
        col=c(1:2, 4, 6),
        format=TRUE,
        pattern='%.2f')   # Yes, it can!
range(x)                  # To inspect the range of x
summary(fdt(x,
            start=1,
            end=9,
            h=1),
        col=c(1:2, 4, 6),
        format=TRUE,
        pattern='%d')     # Is it nice now?
# The fdt.object
ft[['table']] # Stores the frequency distribution table (fdt)
ft[['breaks']] # Stores the breaks of fdt
ft[['breaks']]['start'] # Stores the left value of the first class
ft[['breaks']]['end'] # Stores the right value of the last class
ft[['breaks']]['h'] # Stores the class interval
as.logical(ft[['breaks']]['right']) # Stores the right option
# Theoretical curve and fdt
y <- rnorm(1e5,
           mean=5,
           sd=1)
ft <- fdt(y,
          k=100)
plot(ft,
     type='d', # density
     col=heat.colors(100))
curve(dnorm(x,
            mean=5,
            sd=1),
      n=1e3,
      add=TRUE,
      lwd=3,
      col='dark blue')
#======================================================
# Data.frames: multivariate with categorical variables
#======================================================
mdf <- data.frame(X1=rep(LETTERS[1:4], 25),
                  X2=as.factor(rep(1:10, 10)),
                  Y1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  Y2=rnorm(100, 60, 4),
                  Y3=rnorm(100, 50, 4),
                  Y4=rnorm(100, 40, 4),
                  stringsAsFactors=TRUE)
#(ft <- fdt(mdf)) # Error message due to presence of NA values
(ft <- fdt(mdf,
           na.rm=TRUE))
# Histograms
plot(ft,
     v=TRUE)
plot(ft,
     col=rainbow(8))
plot(ft,
     type='fh')
plot(ft,
     type='rfh')
plot(ft,
     type='rfph')
plot(ft,
     type='cdh')
plot(ft,
     type='cfh')
plot(ft,
     type='cfph')
# Polygons
plot(ft,
     v=TRUE,
     type='fp')
plot(ft,
     type='rfp')
plot(ft,
     type='rfpp')
plot(ft,
     type='cdp')
plot(ft,
     type='cfp')
plot(ft,
     type='cfpp')
# Density
plot(ft,
     type='d')
# Summary
ft
summary(ft) # same result
print(ft) # same result
show(ft) # same result
summary(ft,
        format=TRUE)
summary(ft,
        format=TRUE,
        pattern='%05.2f') # sprintf-style format string
summary(ft,
        col=c(1:2, 4, 6),
        format=TRUE,
        pattern='%05.2f')
print(ft,
      col=c(1:2, 4, 6))
print(ft,
      col=c(1:2, 4, 6),
      format=TRUE,
      pattern='%05.2f')
# Using by
levels(mdf$X1)
plot(fdt(mdf,
         k=5,
         by='X1',
         na.rm=TRUE),
     col=rainbow(5))
levels(mdf$X2)
summary(fdt(iris,
            k=5),
        format=TRUE,
        pattern='%04.2f')
plot(fdt(iris,
         k=5),
     col=rainbow(5))
levels(iris$Species)
summary(fdt(iris,
            k=5,
            by='Species'),
        format=TRUE,
        pattern='%04.2f')
plot(fdt(iris,
         k=5,
         by='Species'),
     v=TRUE)
#========================
# Matrices: multivariate
#========================
summary(fdt(state.x77),
        col=c(1:2, 4, 6),
        format=TRUE)
plot(fdt(state.x77))
# Very big
summary(fdt(volcano,
            right=TRUE),
        col=c(1:2, 4, 6),
        round=3,
        format=TRUE,
        pattern='%05.1f')
plot(fdt(volcano,
         right=TRUE))
## Categorical
x <- sample(x=letters[1:5],
            size=5e2,
            rep=TRUE)
(fdt.c <- fdt_cat(x))
(fdt.c <- fdt_cat(x,
                  sort=FALSE))
#=========================================================
# Data.frame: multivariate with two categorical variables
#=========================================================
mdf <- data.frame(c1=sample(LETTERS[1:3], 1e2, rep=TRUE),
                  c2=as.factor(sample(1:10, 1e2, rep=TRUE)),
                  n1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  n2=rnorm(100, 60, 4),
                  n3=rnorm(100, 50, 4),
                  stringsAsFactors=TRUE)
str(mdf)
(fdt.c <- fdt_cat(mdf))
(fdt.c <- fdt_cat(mdf,
                  dec=FALSE))
(fdt.c <- fdt_cat(mdf,
                  sort=FALSE))
(fdt.c <- fdt_cat(mdf,
                  by='c1'))
#=========================
# Matrix: two categorical
#=========================
x <- matrix(sample(x=letters[1:10],
                   size=100,
                   rep=TRUE),
            nc=2,
            dimnames=list(NULL,
                          c('c1', 'c2')))
head(x)
(fdt.c <- fdt_cat(x))