fdth-package: Frequency distribution tables, histograms and polygons

fdth-package    R Documentation

Frequency distribution tables, histograms and polygons

Description

The fdth package contains a set of functions that allow users to create frequency distribution tables (‘fdt’) and their associated histograms and frequency polygons (absolute, relative and cumulative). The ‘fdt’ can be formatted in many ways suitable for publication (papers, books, etc.). The S3 plot method produces histograms with the convenience and flexibility of a high-level function.

Details

The frequency of a particular observation is the number of times that observation occurs in the data. The distribution of a variable is the pattern of frequencies of its observations.
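As a minimal base R sketch of this definition (the vector obs is made-up illustrative data, not part of the package), table counts how often each distinct value occurs:

```r
# Toy data: each element is one observation of a categorical variable.
obs <- c("a", "b", "a", "c", "a", "b")

# Absolute frequency: number of times each distinct value occurs.
freq <- table(obs)
print(freq)
```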

Frequency distribution tables (‘fdt’) can be used for ordinal, continuous, and categorical variables.

The R environment provides a set of functions (generally low level) that let the user build an ‘fdt’ and its associated graphical representation, the histogram. An ‘fdt’ plays an important role in summarizing data and is the basis for estimating the probability density function used in parametric inference.
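For comparison, a rough sketch of the low-level route in base R is shown here. It uses only cut, table, hist, pretty and nclass.Sturges; the seed and sample are arbitrary, and the break points are one reasonable choice rather than the ones fdth would compute:

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
x <- rnorm(100, mean = 5, sd = 1)

# Class limits: Sturges' rule suggests a number of classes and
# pretty() turns it into round break points that cover range(x).
brk <- pretty(range(x), n = nclass.Sturges(x))

# Assign each observation to a left-closed class [a, b) and count.
classes <- cut(x, breaks = brk, right = FALSE, include.lowest = TRUE)
f <- table(classes)
print(f)

# hist() performs the same binning; plot = FALSE returns the counts only.
h <- hist(x, breaks = brk, right = FALSE, plot = FALSE)
print(h$counts)
```

Even this small table needs several separate calls, which is the kind of bookkeeping the paragraph above refers to.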

However, for novice or occasional users of R, it can be laborious to find all the functions and graphical parameters needed to produce a standardized, clear ‘fdt’ and its associated histogram ready for publication.

That is the aim of this package: to let users create both ‘fdt’ tables and histograms easily and flexibly. The most common input for univariate data is a vector. For multivariate data, the input can be a data.frame (which also allows grouping all numerical variables by one categorical variable) or a matrix.

The simplest way to run ‘fdt’ and ‘fdt_cat’ is to supply only the ‘x’ object, for example: d <- fdt(x). In this case the default values of ‘breaks’ and ‘right’ ("Sturges" and FALSE, respectively) are used. If the ‘x’ object is categorical, use d <- fdt_cat(x) instead.
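The two defaults can be illustrated in base R. This sketch assumes that "Sturges" refers to the usual rule k = ceiling(log2(n) + 1), as implemented by nclass.Sturges in grDevices, and uses cut to show what a left-closed (right = FALSE) class looks like; fdth's internal computation may differ in detail:

```r
# Sturges' rule: number of classes for a sample of size n.
n <- 1000
k <- ceiling(log2(n) + 1)
print(k)
print(nclass.Sturges(seq_len(n)))   # base R helper applying the same rule

# Effect of 'right' on values that sit exactly on a break point.
v   <- c(2, 3)
brk <- c(1, 2, 3, 4)
left_closed  <- cut(v, breaks = brk, right = FALSE)  # classes of the form [a, b)
right_closed <- cut(v, breaks = brk, right = TRUE)   # classes of the form (a, b]
print(left_closed)
print(right_closed)
```

With right = FALSE a value equal to a break point is counted in the class that starts at that point; with right = TRUE it is counted in the class that ends there.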

If the variable is continuous, you can also supply:

  • ‘x’ and ‘k’ (number of class intervals);

  • ‘x’, ‘start’ (left endpoint of the first class interval) and ‘end’ (right endpoint of the last class interval); or

  • ‘x’, ‘start’, ‘end’ and ‘h’ (class interval width).

These options make building the ‘fdt’ easy and flexible.
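How these arguments translate into class limits can be sketched with seq in base R. The data and the values of k, x_start, x_end and h are illustrative, and the default start and end that fdth derives from the data are not reproduced here:

```r
x <- c(1.2, 2.7, 3.1, 4.8, 5.5, 6.0, 7.3, 8.9)   # made-up data

# x and k: k classes of equal width spanning the observed range.
k <- 4
brk_k <- seq(min(x), max(x), length.out = k + 1)
print(brk_k)

# x, start, end and h: classes of width h from start to end.
x_start <- 1
x_end   <- 9
h       <- 2
brk_h <- seq(x_start, x_end, by = h)
print(brk_h)

# Count the observations in each left-closed class [a, b).
f <- table(cut(x, breaks = brk_h, right = FALSE))
print(f)
```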

The ‘fdt’ and ‘fdt_cat’ objects store the information used by the summary, print and plot methods. The result of plot is a histogram or polygon (absolute, relative or cumulative). Together, these methods provide a reasonable set of parameters to format and plot the ‘fdt’ object in a pretty (and publishable) way.
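The absolute, relative and cumulative quantities mentioned above are related by simple arithmetic. A minimal sketch with made-up class counts follows; the column names are illustrative and are not claimed to match the names stored in the fdt object:

```r
f <- c(3, 7, 12, 6, 2)             # made-up absolute frequencies per class

tab <- data.frame(
  f      = f,                      # absolute frequency
  rf     = f / sum(f),             # relative frequency
  rf_pct = 100 * f / sum(f),       # relative frequency (%)
  cf     = cumsum(f),              # cumulative frequency
  cf_pct = 100 * cumsum(f) / sum(f)  # cumulative frequency (%)
)
print(tab)
```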

Author(s)

Faria, J. C.
Allaman, I. B.
Jelihovschi, E. G.

See Also

hist, provided by graphics; table and cut, both provided by base.

Examples

library(fdth)

# Numerical
#=====================
# Vectors: univariate
#=====================
x <- rnorm(n=1e3,
           mean=5,
           sd=1)

(ft <- fdt(x))

# Histograms
plot(ft)  # Absolute frequency histogram

plot(ft,
     main='My title')

plot(ft,
     x.round=3,
     col='darkgreen')

plot(ft,
     xlas=2)

plot(ft,
     x.round=3,
     xlas=2,
     xlab=NULL)

plot(ft,
     v=TRUE,
     cex=.8,
     x.round=3,
     xlas=2,
     xlab=NULL,
     col=rainbow(11))

plot(ft,
     type='fh')    # Absolute frequency histogram

plot(ft,
     type='rfh')   # Relative frequency histogram

plot(ft,
     type='rfph')  # Relative frequency (%) histogram

plot(ft,
     type='cdh')   # Cumulative density histogram

plot(ft,
     type='cfh')   # Cumulative frequency histogram

plot(ft,
     type='cfph')  # Cumulative frequency (%) histogram

# Polygons
plot(ft,
     type='fp')    # Absolute frequency polygon

plot(ft,
     type='rfp')   # Relative frequency polygon

plot(ft,
     type='rfpp')  # Relative frequency (%) polygon

plot(ft,
     type='cdp')   # Cumulative density polygon

plot(ft,
     type='cfp')   # Cumulative frequency polygon

plot(ft,
     type='cfpp')  # Cumulative frequency (%) polygon

# Density
plot(ft,
     type='d')     # Density

# Summary
ft

summary(ft)  # same result

print(ft)    # same result

show(ft)     # same result

summary(ft,
        format=TRUE)      # This may not be what you want for publication.

summary(ft,
        format=TRUE,
        pattern='%.2f')   # Better, but can it be improved?

summary(ft,
        col=c(1:2, 4, 6),
        format=TRUE,
        pattern='%.2f')   # Yes, it can!

range(x)                  # To inspect the range of x

summary(fdt(x,
            start=1, 
            end=9,
            h=1),
        col=c(1:2, 4, 6),
        format=TRUE,
        pattern='%d')     # Is it nice now?

# The fdt.object
ft[['table']]                        # Stores the frequency distribution table (fdt)
ft[['breaks']]                       # Stores the breaks of fdt
ft[['breaks']]['start']              # Stores the left value of the first class
ft[['breaks']]['end']                # Stores the right value of the last class
ft[['breaks']]['h']                  # Stores the class interval
as.logical(ft[['breaks']]['right'])  # Stores the right option

# Theoretical curve and fdt
y <- rnorm(1e5,
           mean=5, 
           sd=1)

ft <- fdt(y,
          k=100)

plot(ft,
     type='d',                       # density
     col=heat.colors(100))

curve(dnorm(x,
            mean=5, 
            sd=1),
      n=1e3,      
      add=TRUE, 
      lwd=3,
      col='dark blue')

#======================================================
# Data.frames: multivariate with categorical variables
#======================================================
mdf <- data.frame(X1=rep(LETTERS[1:4], 25),
                  X2=as.factor(rep(1:10, 10)),
                  Y1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  Y2=rnorm(100, 60, 4),
                  Y3=rnorm(100, 50, 4),
                  Y4=rnorm(100, 40, 4),
                  stringsAsFactors=TRUE)

#(ft <- fdt(mdf)) # Error message due to presence of NA values

(ft <- fdt(mdf,
           na.rm=TRUE))

# Histograms
plot(ft,
     v=TRUE)

plot(ft,
     col=rainbow(8))

plot(ft,
     type='fh')

plot(ft,
     type='rfh')

plot(ft,
     type='rfph')

plot(ft,
     type='cdh')

plot(ft,
     type='cfh')

plot(ft,
     type='cfph')

# Polygons
plot(ft,
     v=TRUE,
     type='fp')

plot(ft,
     type='rfp')

plot(ft,
     type='rfpp')

plot(ft,
     type='cdp')

plot(ft,
     type='cfp')

plot(ft,
     type='cfpp') 

# Density
plot(ft,
     type='d') 

# Summary
ft

summary(ft)  # same result

print(ft)    # same result

show(ft)     # same result

summary(ft,
        format=TRUE)

summary(ft,
        format=TRUE, 
        pattern='%05.2f')  # sprintf-style format string

summary(ft,
        col=c(1:2, 4, 6), 
        format=TRUE,
        pattern='%05.2f')

print(ft,
      col=c(1:2, 4, 6))

print(ft,
      col=c(1:2, 4, 6), 
      format=TRUE,
      pattern='%05.2f')

# Using by
levels(mdf$X1)

plot(fdt(mdf,
         k=5,
         by='X1',
         na.rm=TRUE),
     col=rainbow(5))

levels(mdf$X2)

summary(fdt(iris,
            k=5),
        format=TRUE,
        pattern='%04.2f')

plot(fdt(iris,
         k=5),
     col=rainbow(5))

levels(iris$Species)

summary(fdt(iris,
            k=5,
            by='Species'),
        format=TRUE,
        pattern='%04.2f')

plot(fdt(iris,
         k=5,
         by='Species'),
     v=TRUE)

#========================
# Matrices: multivariate
#========================
summary(fdt(state.x77),
        col=c(1:2, 4, 6),
        format=TRUE)

plot(fdt(state.x77))

# Very big
summary(fdt(volcano,
            right=TRUE),
        col=c(1:2, 4, 6),
        round=3,
        format=TRUE,
        pattern='%05.1f')

plot(fdt(volcano,
         right=TRUE))

## Categorical
x <- sample(x=letters[1:5],
            size=5e2,
            rep=TRUE)

(fdt.c <- fdt_cat(x))

(fdt.c <- fdt_cat(x,
                  sort=FALSE))

#=========================================================
# Data.frame: multivariate with two categorical variables
#=========================================================
mdf <- data.frame(c1=sample(LETTERS[1:3], 1e2, rep=TRUE),
                  c2=as.factor(sample(1:10, 1e2, rep=TRUE)),
                  n1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  n2=rnorm(100, 60, 4),
                  n3=rnorm(100, 50, 4),
                  stringsAsFactors=TRUE)

str(mdf)

(fdt.c <- fdt_cat(mdf))

(fdt.c <- fdt_cat(mdf,
                  dec=FALSE))

(fdt.c <- fdt_cat(mdf,
                  sort=FALSE))

(fdt.c <- fdt_cat(mdf,
                  by='c1'))

#=========================
# Matrix: two categorical
#=========================
x <- matrix(sample(x=letters[1:10],
                   size=100,
                   rep=TRUE),
            nc=2,
            dimnames=list(NULL,
                          c('c1', 'c2')))

head(x)

(fdt.c <- fdt_cat(x))


fdth documentation built on May 12, 2026, 1:08 a.m.