fdt: Frequency distribution table for numerical data
In fdth: Frequency Distribution Tables, Histograms and Polygons

fdt	R Documentation

Description

A set of S3 methods to create frequency distribution tables (‘fdt’) from vector, data.frame and matrix objects.

Usage

## S3 generic
fdt(x, ...)

## S3 methods
## Default S3 method:
fdt(x,
    k,
    start,
    end,
    h,
    breaks = c('Sturges', 'Scott', 'FD'),
    right = FALSE,
    na.rm = FALSE, ...)

## S3 method for class 'data.frame'
fdt(x,
    k,
    by,
    breaks = c('Sturges', 'Scott', 'FD'),
    right = FALSE,
    na.rm = FALSE, ...)

## S3 method for class 'matrix'
fdt(x,
    k,
    breaks = c('Sturges', 'Scott', 'FD'),
    right = FALSE,
    na.rm = FALSE, ...)

Arguments

`x`	a `vector`, `data.frame` or `matrix` object. If ‘⁠x⁠’ is `data.frame` or `matrix` it must contain at least one numeric column.
`k`	number of class intervals.
`start`	left endpoint of the first class interval.
`end`	right endpoint of the last class interval.
`h`	class interval width.
`by`	name of the categorical variable used to group each numeric variable; applies only to the `data.frame` method.
`breaks`	method used to determine the number of class intervals: one of “Sturges”, “Scott” or “FD”.
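`right`	logical. If `TRUE`, class intervals are closed on the right (and open on the left); with the default, `FALSE`, they are closed on the left and open on the right.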
`na.rm`	logical. Should missing values be removed? (default = `FALSE`).
`...`	potential further arguments (required by generic).
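
To give a feel for how the three ‘breaks’ rules differ, here is a minimal sketch that uses only base R. It assumes that the names “Sturges”, “Scott” and “FD” refer to the usual histogram rules that base R exposes as nclass.Sturges, nclass.scott and nclass.FD; whether ‘fdt’ computes the number of classes in exactly this way is an assumption, not something this page states.

```r
# Compare the number of class intervals suggested by the three classic rules.
# Assumption: these base R helpers (grDevices) mirror the rules named in 'breaks'.
set.seed(42)
x <- rnorm(1000, mean = 5, sd = 1)

k <- c(Sturges = nclass.Sturges(x),  # ceiling(log2(n) + 1), so 11 for n = 1000
       Scott   = nclass.scott(x),    # depends on the standard deviation of x
       FD      = nclass.FD(x))       # depends on the interquartile range of x

print(k)
```

Sturges depends only on the sample size, while Scott and FD adapt to the spread of the data, so they usually suggest more classes for large samples.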

Details

The simplest way to run ‘fdt’ is to supply only the ‘x’ object, for example: nm <- fdt(x). In this case the default values are used for ‘breaks’ (“Sturges”) and ‘right’ (FALSE).

‘fdt’ can also be called with:

‘⁠x⁠’ and ‘⁠k⁠’ (number of class intervals);
‘⁠x⁠’, ‘⁠start⁠’ (left endpoint of the first class interval) and ‘⁠end⁠’ (right endpoint of the last class interval); or
‘⁠x⁠’, ‘⁠start⁠’, ‘⁠end⁠’ and ‘⁠h⁠’ (class interval width).

These options make ‘fdt’ easy to use and flexible.
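
The relationship between ‘start’, ‘end’, ‘h’ and ‘right’ can be sketched in base R. The helper class_counts below is hypothetical and is not part of fdth; it only illustrates how a set of equal-width classes is laid out and how ‘right’ changes which endpoint each class includes.

```r
# Hypothetical helper (not part of fdth): count observations per class,
# given the first endpoint, the last endpoint and the class width.
class_counts <- function(x, start, end, h, right = FALSE) {
  brk <- seq(start, end, by = h)  # k + 1 break points, where k = (end - start) / h
  table(cut(x, breaks = brk, right = right))
}

x <- rep(1:3, 3)

# right = FALSE: classes [1,2), [2,3), [3,4); the value 3 falls in the last class.
left_closed <- class_counts(x, start = 1, end = 4, h = 1)

# right = TRUE: classes (0,1], (1,2], (2,3]; the value 3 is the closed upper endpoint.
right_closed <- class_counts(x, start = 0, end = 3, h = 1, right = TRUE)

print(left_closed)
print(right_closed)
```

In both layouts every class holds three observations; only the choice of ‘start’ and ‘end’ has to change so that the extreme values are not left outside the table.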

The ‘⁠fdt⁠’ object stores information used by methods summary, print, plot, mean, median and mfv. The result of plot is a histogram. The methods summary, print and plot provide a reasonable set of parameters to format and plot the ‘⁠fdt⁠’ object in a clear (and publishable) way.
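
As a rough illustration of the methods listed above, the sketch below builds one table and passes it to each of them. It assumes the fdth package is installed (the code is skipped otherwise) and makes no claim about how each statistic is computed internally.

```r
# Run only if fdth is available; every call below is named in the text above.
if (requireNamespace("fdth", quietly = TRUE)) {
  library(fdth)

  set.seed(1)
  x  <- rnorm(200, mean = 5, sd = 1)
  ft <- fdt(x)

  print(ft)        # the frequency distribution table
  summary(ft)      # summary method for the table

  m  <- mean(ft)   # mean estimated from the grouped data
  md <- median(ft) # median estimated from the grouped data
  mo <- mfv(ft)    # most frequent value(s) estimated from the grouped data

  print(c(mean = m, median = md))
  print(mo)
}
```

Because these statistics are computed from the table rather than from the raw vector, they approximate, but need not equal, mean(x) and median(x).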

Value

The method fdt.default returns a list of class fdt.default with the following slots:

`table`	A `data.frame` storing the ‘fdt’;
`breaks`	A `vector` of length 4 storing ‘start’, ‘end’, ‘h’ and ‘right’ of the ‘fdt’ generated by this method.

The methods fdt.data.frame and fdt.matrix return a list of class fdt.multiple. This list has one slot for each numeric variable of the ‘x’ provided, and each of these slots holds the same two slots (‘table’ and ‘breaks’) described above for fdt.default.
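
The returned structure can be inspected directly. The sketch below assumes the fdth package is installed (it is skipped otherwise) and relies only on the slot layout described in this section.

```r
# Run only if fdth is available.
if (requireNamespace("fdth", quietly = TRUE)) {
  library(fdth)

  set.seed(1)
  ft <- fdt(rnorm(100, mean = 5, sd = 1))

  print(class(ft))
  print(head(ft$table))  # data.frame holding the table itself
  print(ft$breaks)       # start, end, h and right

  # Multivariate input: one slot per numeric variable.
  fm <- fdt(iris[c(1:2, 5)], k = 5)

  print(names(fm))
  print(fm[[1]]$table)   # table for the first numeric variable
  print(fm[[1]]$breaks)
}
```

Accessing the slots this way is handy when the class limits or frequencies are needed for further computation rather than for display.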

Author(s)

Faria, J. C.
Allaman, I. B.
Jelihovschi, E. G.

Examples

library(fdth)

#========
# Vector
#========
x <- rnorm(n = 1e3,
           mean = 5,
           sd = 1)

# x
(ft <- fdt(x))

# x, alternative breaks
(ft <- fdt(x,
           breaks = 'Scott'))

# x, k
(ft <- fdt(x,
           k = 10))

# x, start, end
range(x)

(ft <- fdt(x,
           start = floor(min(x)),
           end = floor(max(x) + 1)))

# x, start, end, h
(ft <- fdt(x,
           start = floor(min(x)),
           end = floor(max(x) + 1),
           h = 1))

# Effect of right
sort(x <- rep(1:3, 3))

(ft <- fdt(x,
           start = 1,
           end = 4,
           h = 1))

(ft <- fdt(x,
           start = 0,
           end = 3,
           h = 1,
           right = TRUE))

#=========================================================
# Data.frame: multivariate with two categorical variables
#=========================================================
mdf <- data.frame(c1 = sample(LETTERS[1:3],
                              1e2,
                              TRUE),
                  c2 = as.factor(sample(1:10,
                                        1e2,
                                        TRUE)),
                  n1 = c(NA,
                         NA,
                         rnorm(96,
                               10,
                               1),
                         NA,
                         NA),
                  n2 = rnorm(100,
                             60,
                             4),
                  n3 = rnorm(100,
                             50,
                             4),
                  stringsAsFactors = TRUE)

str(mdf)

#(ft <- fdt(mdf))  # Error message due to presence of NA values

(ft <- fdt(mdf,
           na.rm = TRUE))

# By factor
(ft <- fdt(mdf,
           k = 5,
           by = 'c1',
           na.rm = TRUE))

# Choose the FD criterion
(ft <- fdt(mdf,
           breaks = 'FD',
           by = 'c1',
           na.rm = TRUE))

# k
(ft <- fdt(mdf,
           k = 5,
           by = 'c2',
           na.rm = TRUE))

(ft <- fdt(iris[c(1:2, 5)],
           k = 10))

(ft <- fdt(iris[c(1:2, 5)],
           k = 5,
           by = 'Species'))

#========================
# Matrices: multivariate
#========================
(ft <- fdt(state.x77))

summary(ft,
        format = TRUE)

summary(ft,
        format = TRUE,
        pattern = '%.2f')

fdth documentation built on May 26, 2026, 1:06 a.m.