fdt: Frequency distribution table for numerical data

Description

A set of S3 methods to easily create frequency distribution tables (‘fdt’) from vector, data.frame and matrix objects.

Usage

## S3 generic
fdt(x, ...)

## S3 methods
## Default S3 method:
fdt(x,
    k,
    start,
    end,
    h,
    breaks=c('Sturges', 'Scott', 'FD'),
    right=FALSE,
    na.rm=FALSE, ...)

## S3 method for class 'data.frame'
fdt(x,
    k,
    by,
    breaks=c('Sturges', 'Scott', 'FD'),
    right=FALSE,
    na.rm=FALSE, ...)

## S3 method for class 'matrix'
fdt(x,
    k,
    breaks=c('Sturges', 'Scott', 'FD'),
    right=FALSE,
    na.rm=FALSE, ...)

Arguments

x

a vector, data.frame or matrix object. If ‘x’ is a data.frame or matrix, it must contain at least one numeric column.

k

number of class intervals.

start

left endpoint of the first class interval.

end

right endpoint of the last class interval.

h

class interval width.

by

categorical variable used for grouping each numeric variable; applies only to the data.frame method.

breaks

method used to determine the number of class intervals: one of 'Sturges' (default), 'Scott' or 'FD' (Freedman-Diaconis).

right

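logical. If FALSE (default), class intervals are closed on the left and open on the right, [a, b); if TRUE, they are open on the left and closed on the right, (a, b].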

na.rm

logical. Should missing values be removed? (default = FALSE).

...

potential further arguments (required by generic).
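As an illustration of how the choice of open or closed endpoints changes class membership, the sketch below uses base R's cut, which has an argument of the same name. Whether ‘fdt’ delegates to cut internally is an assumption; the data are hypothetical and chosen so that every value falls exactly on a class boundary.

```r
# Hypothetical data: each of 1, 2, 3 repeated three times
x <- rep(1:3, 3)

# right = FALSE: classes [1,2), [2,3), [3,4); a boundary value
# belongs to the class that starts at it
left_closed <- table(cut(x, breaks = seq(1, 4, by = 1), right = FALSE))

# right = TRUE: classes (0,1], (1,2], (2,3]; a boundary value
# belongs to the class that ends at it
right_closed <- table(cut(x, breaks = seq(0, 3, by = 1), right = TRUE))

print(left_closed)
print(right_closed)
```

In both cases each class holds three observations; only the class labels and the side on which the boundary is included differ.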

Details

The simplest way to run ‘fdt’ is to supply only the ‘x’ object, for example: nm <- fdt(x). In this case the default values of ‘breaks’ ('Sturges') and ‘right’ (FALSE) are used.
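To give a feel for what the three ‘breaks’ rules produce, the following sketch calls the rule functions of the same names shipped with grDevices. That ‘fdt’ relies on exactly these functions is an assumption, and the seed and sample are hypothetical.

```r
set.seed(1)                          # hypothetical seed, for reproducibility only
x <- rnorm(1000, mean = 5, sd = 1)

# Number of classes suggested by each rule
k_sturges <- grDevices::nclass.Sturges(x)  # ceiling(log2(n) + 1)
k_scott   <- grDevices::nclass.scott(x)    # based on bin width 3.5 * sd(x) * n^(-1/3)
k_fd      <- grDevices::nclass.FD(x)       # based on bin width 2 * IQR(x) * n^(-1/3)

print(c(Sturges = k_sturges, Scott = k_scott, FD = k_fd))
```

Sturges depends only on the sample size, whereas Scott and FD also depend on the spread of the data, so they can suggest noticeably different numbers of classes for the same vector.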

‘fdt’ can also be called with:

  • ‘x’ and ‘k’ (number of class intervals);

  • ‘x’, ‘start’ (left endpoint of the first class interval) and ‘end’ (right endpoint of the last class interval); or

  • ‘x’, ‘start’, ‘end’ and ‘h’ (class interval width).

These options make ‘⁠fdt⁠’ very easy and flexible.
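The relation between ‘start’, ‘end’, ‘h’ and ‘k’ can be sketched in base R as follows. This is not the package's internal code: the data are hypothetical, and how ‘fdt’ chooses ‘start’ and ‘end’ when only ‘k’ is supplied is not covered here.

```r
x <- c(1.2, 2.7, 3.1, 3.8, 4.4, 5.0, 5.9, 6.3)   # hypothetical data

start <- 1
end   <- 7

# Width h given: class limits step from start to end by h
h <- 2
breaks_h <- seq(start, end, by = h)

# Number of classes k given: the width follows as (end - start) / k
k <- 3
breaks_k <- seq(start, end, length.out = k + 1)

# Absolute frequency per class, left-closed as with right = FALSE
f <- table(cut(x, breaks = breaks_h, right = FALSE))
print(f)
```

With these values both routes give the same class limits, since (end - start) / k equals h.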

The ‘⁠fdt⁠’ object stores information used by methods summary, print, plot, mean, median and mfv. The result of plot is a histogram. The methods summary, print and plot provide a reasonable set of parameters to format and plot the ‘⁠fdt⁠’ object in a clear (and publishable) way.

Value

For fdt the method fdt.default returns a list of class fdt.default with the slots:

‘table’

A data.frame storing the ‘fdt’;

‘breaks’

A vector of length 4 storing ‘start’, ‘end’, ‘h’ and ‘right’ of the ‘fdt’ generated by this method;

‘data’

A vector of the data ‘x’ provided.
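As a rough picture of what such a list holds, the sketch below builds a small frequency table and a length-4 breaks vector by hand in base R. The column names (class, f, rf, cf) and the data are hypothetical and are not claimed to match the ones ‘fdt’ produces.

```r
x <- c(1.2, 2.7, 3.1, 3.8, 4.4, 5.0, 5.9, 6.3)   # hypothetical data
start <- 1; end <- 7; h <- 2; right <- FALSE

classes <- cut(x, breaks = seq(start, end, by = h), right = right)
f <- as.vector(table(classes))

# Hand-built frequency table (hypothetical column names)
tab <- data.frame(
  class = levels(classes),
  f     = f,               # absolute frequency
  rf    = f / sum(f),      # relative frequency
  cf    = cumsum(f),       # cumulative frequency
  stringsAsFactors = FALSE
)

# Four numbers describing the classes; the logical is coerced to 0/1
brk <- c(start = start, end = end, h = h, right = right)

print(tab)
print(brk)
```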

The methods fdt.data.frame and fdt.matrix return a list of class fdt.multiple. This list has one slot for each numeric variable of the ‘x’ provided, and each of these slots stores the same slots as the fdt.default list described above.
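A minimal sketch of inspecting such a result, reusing a call from the Examples section. It assumes the fdth package is installed (the guard skips the code otherwise) and accesses the first element by position rather than by name, since element naming is not described here.

```r
if (requireNamespace("fdth", quietly = TRUE)) {
  # Same call as in the Examples section
  ft <- fdth::fdt(iris[c(1:2, 5)], k = 10)

  print(class(ft))      # class vector of the multi-variable result
  print(names(ft))      # one element per numeric variable

  first <- ft[[1]]      # nested list for the first numeric variable
  print(names(first))   # its slots
  print(first$table)    # the frequency table itself
}
```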

Author(s)

Faria, J. C.
Allaman, I. B.
Jelihovschi, E. G.

See Also

hist, provided by graphics; table and cut, both provided by base.

Examples

library(fdth)

#========
# Vector
#========
x <- rnorm(n=1e3,
           mean=5,
           sd=1)

# x
(ft <- fdt(x))

# x, alternative breaks
(ft <- fdt(x,
           breaks='Scott'))

# x, k
(ft <- fdt(x,
           k=10))

# x, start, end
range(x)

(ft <- fdt(x,
           start=floor(min(x)),
           end=floor(max(x) + 1)))

# x, start, end, h
(ft <- fdt(x,
           start=floor(min(x)),
           end=floor(max(x) + 1),
           h=1))

# Effect of right
sort(x <- rep(1:3, 3))

(ft <- fdt(x,
           start=1,
           end=4,
           h=1))

(ft <- fdt(x,
           start=0,
           end=3,
           h=1,
           right=TRUE))

#=========================================================
# Data.frame: multivariate with two categorical variables
#=========================================================
mdf <- data.frame(c1=sample(LETTERS[1:3], 1e2, TRUE),
                  c2=as.factor(sample(1:10, 1e2, TRUE)),
                  n1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  n2=rnorm(100, 60, 4),
                  n3=rnorm(100, 50, 4),
                  stringsAsFactors=TRUE)

str(mdf)

#(ft <- fdt(mdf))  # Error message due to presence of NA values

(ft <- fdt(mdf,
           na.rm=TRUE))

# By factor
(ft <- fdt(mdf,
           k=5,
           by='c1',
           na.rm=TRUE))

# Choose the FD criterion
(ft <- fdt(mdf,
           breaks='FD',
           by='c1',
           na.rm=TRUE))

# k
(ft <- fdt(mdf,
           k=5,
           by='c2',
           na.rm=TRUE))

(ft <- fdt(iris[c(1:2, 5)],
           k=10))

(ft <- fdt(iris[c(1:2, 5)],
           k=5,
           by='Species'))

#========================
# Matrices: multivariate
#========================
(ft <- fdt(state.x77))

summary(ft,
        format=TRUE)

summary(ft,
        format=TRUE,
        pattern='%.2f')

fdth documentation built on May 12, 2026, 1:08 a.m.