grouped.data: Grouped data

grouped.data    R Documentation

Grouped data

Description

Creation of grouped data objects, from either a provided set of group boundaries and group frequencies, or from individual data using automatic or specified breakpoints.

Usage

grouped.data(..., breaks = "Sturges", include.lowest = TRUE,
             right = TRUE, nclass = NULL, group = FALSE,
             row.names = NULL, check.rows = FALSE,
             check.names = TRUE)

Arguments

...

arguments of the form value or tag = value; see Details.

breaks

same as for hist, namely one of:

  • a vector giving the breakpoints between groups;

  • a function to compute the vector of breakpoints;

  • a single number giving the number of groups;

  • a character string naming an algorithm to compute the number of groups (see hist);

  • a function to compute the number of groups.

In the last three cases the number is a suggestion only; the breakpoints will be set to pretty values. If breaks is a function, the first element in ... is supplied to it as the only argument.
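Breakpoints are computed as in hist, so the "suggestion only" behaviour can be checked in base R alone. The sketch below uses a made-up data vector x and assumes that grouping follows hist's own rules. A single number is passed to pretty, so the number of groups actually produced need not match the number requested. A function that returns a full vector of breakpoints is applied to the data, and its result is used unchanged.

```r
## Illustrative data, made up for this sketch
x <- c(1, 3, 7, 12, 18, 25, 33, 41, 58, 64, 77, 95)

## A single number is only a suggestion: hist() passes it to pretty()
h_num <- hist(x, breaks = 7, plot = FALSE)
all.equal(h_num$breaks, pretty(range(x), n = 7, min.n = 1))
length(h_num$breaks) - 1  # groups actually used, not necessarily 7

## A function returning a full vector of breakpoints is applied to the
## data and its result is used as the breakpoints
h_fun <- hist(x, breaks = function(v) seq(0, 100, by = 25), plot = FALSE)
h_fun$breaks  # 0 25 50 75 100
h_fun$counts  # frequencies of the four groups
```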

include.lowest

logical; if TRUE, a data point equal to the lowest (or, for right = FALSE, the highest) breakpoint is included in the first (or last) group. Used only for individual data; see Details.

right

logical; indicating if the intervals should be closed on the right (and open on the left) or vice versa.
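The base R function cut follows the same interval-closure convention as hist. It can therefore be used to preview how right and include.lowest treat a data point lying exactly on a breakpoint. This is an illustration with made-up values, not a call to grouped.data itself.

```r
b <- c(0, 25, 75, 100)
v <- c(0, 25, 75, 100)  # every value sits exactly on a breakpoint

## right = TRUE: groups [0, 25], (25, 75], (75, 100]
g_right <- cut(v, breaks = b, right = TRUE, include.lowest = TRUE)
table(g_right)

## right = FALSE: groups [0, 25), [25, 75), [75, 100]
g_left <- cut(v, breaks = b, right = FALSE, include.lowest = TRUE)
table(g_left)

## Without include.lowest, the value 0 is outside (0, 25] and becomes NA
g_open <- cut(v, breaks = b, right = TRUE, include.lowest = FALSE)
sum(is.na(g_open))
```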

nclass

numeric (integer); equivalent to breaks for a scalar or character argument.
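hist itself has an nclass argument, so the different ways of requesting a number of groups can be compared directly there. In the sketch below (made-up data), four specifications all lead to the same pretty breakpoints: the algorithm's name, the algorithm's function, the number that function returns, and nclass.

```r
x <- c(1, 3, 7, 12, 18, 25, 33, 41, 58, 64, 77, 95)  # made-up data

br_name <- hist(x, breaks = "Sturges", plot = FALSE)$breaks          # algorithm name
br_fun  <- hist(x, breaks = nclass.Sturges, plot = FALSE)$breaks     # function for the number
br_num  <- hist(x, breaks = nclass.Sturges(x), plot = FALSE)$breaks  # the number itself
br_ncl  <- hist(x, nclass = nclass.Sturges(x), plot = FALSE)$breaks  # nclass instead of breaks

## All four specifications yield the same pretty breakpoints
all(vapply(list(br_fun, br_num, br_ncl), identical, logical(1), br_name))
```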

group

logical; an alternative way to force grouping of individual data.

row.names, check.rows, check.names

arguments identical to those of data.frame.

Details

A grouped data object is a special form of data frame consisting of one column of contiguous group boundaries and one or more columns of frequencies within each group.
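Contiguous groups imply that n + 1 boundaries define exactly n groups. The sketch below reproduces that layout in an ordinary data frame. It uses the boundaries and frequencies from the Examples section. The helper group_labels is hypothetical and not part of actuar. It assumes intervals closed on the right, as with right = TRUE.

```r
## Hypothetical helper (not part of actuar): label the n groups defined by
## n + 1 strictly increasing boundaries, closed on the right
group_labels <- function(cj) {
  stopifnot(is.numeric(cj), length(cj) >= 2, !is.unsorted(cj, strictly = TRUE))
  paste0("(", head(cj, -1), ", ", tail(cj, -1), "]")
}

cj <- c(0, 25, 50, 100, 250, 500, 1000)  # 7 boundaries ...
nj <- c(30, 31, 57, 42, 45, 10)          # ... hence 6 frequencies

## Plain data frame with the same layout: one column of groups and
## one column of frequencies per data set
df <- data.frame(Group = group_labels(cj), Frequency = nj)
df
```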

The function can create a grouped data object from two types of arguments.

  1. Group boundaries and frequencies. This is the default mode of operation if the call has at least two elements in ....

    The first argument will then be taken as the vector of group boundaries. This vector must be exactly one element longer than the other arguments, which will be taken as vectors of group frequencies. All arguments are coerced to data frames.

  2. Individual data. This mode of operation is active if there is a single argument in ..., or if either breaks or nclass is specified or group is TRUE.

    Arguments of ... are first grouped using hist. If needed, breakpoints are set using the first argument.
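The individual-data mode can be sketched with hist alone. Breakpoints come from the first data set, and every data set is then counted on those same breakpoints. The helper group_individual is hypothetical and is not actuar's implementation. It also shows why automatic breakpoints are error-prone with several data sets: hist stops when a later data set falls outside the breakpoints built from the first one.

```r
## Hypothetical helper, not actuar's implementation: breakpoints are
## computed from the first data set only, then reused for all of them
group_individual <- function(..., breaks = "Sturges") {
  data <- list(...)
  cj <- hist(data[[1]], breaks = breaks, plot = FALSE)$breaks
  ## hist() stops with an error if a data set is not covered by cj
  nj <- lapply(data, function(d) hist(d, breaks = cj, plot = FALSE)$counts)
  list(cj = cj, nj = nj)
}

x <- c(12, 30, 47, 55, 61, 88)  # made-up data sets
y <- c(5, 20, 40, 73, 99)

res <- group_individual(x, y, breaks = c(0, 25, 75, 100))
res$cj  # 0 25 75 100
res$nj  # counts 1 4 1 for x and 2 2 1 for y

## Automatic breakpoints built from x do not reach 150, so this fails
try(group_individual(x, c(y, 150)))
```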

Missing (NA) frequencies are replaced by zeros, with a warning.
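The sketch below shows the documented rule for missing frequencies. The helper fill_na_frequencies and its warning text are hypothetical; actuar's own message may differ.

```r
## Hypothetical helper, not actuar's code: missing frequencies become
## zeros, and a warning is issued when any are found
fill_na_frequencies <- function(nj) {
  if (anyNA(nj)) {
    warning("missing frequencies replaced by zeros")
    nj[is.na(nj)] <- 0
  }
  nj
}

fill_na_frequencies(c(30, NA, 57, 42))  # 30 0 57 42, plus a warning
```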

Extraction and replacement methods exist for grouped.data objects, but working on non-adjacent groups will most likely yield useless results.
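A grouped data object keeps a single vector of n + 1 boundaries for n groups. Only a run of adjacent groups maps back onto such a vector; a gap between selected groups has no representation. The helper subset_boundaries below is hypothetical and is not actuar's [.grouped.data method. It makes that constraint explicit, using the boundaries from the Examples section.

```r
## Hypothetical helper (not actuar's extraction method): boundaries of a
## subset of groups, which only make sense if the groups are adjacent
subset_boundaries <- function(cj, i) {
  i <- sort(i)
  if (any(diff(i) != 1))
    stop("groups ", paste(i, collapse = ", "), " are not adjacent")
  cj[min(i):(max(i) + 1)]
}

cj <- c(0, 25, 50, 100, 250, 500, 1000)
subset_boundaries(cj, 2:4)           # 25 50 100 250
try(subset_boundaries(cj, c(1, 3)))  # (0, 25] and (50, 100] leave a gap
```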

Value

An object of class c("grouped.data", "data.frame") with an environment containing the vector cj of group boundaries.

Author(s)

Vincent Goulet <vincent.goulet@act.ulaval.ca>, Mathieu Pigeon and Louis-Philippe Pouliot

See Also

[.grouped.data for extraction and replacement methods.

data.frame for usual data frame creation and manipulation.

hist for details on the calculation of breakpoints.

Examples

## Most common usage: a predetermined set of group
## boundaries and group frequencies.
cj <- c(0, 25, 50, 100, 250, 500, 1000)
nj <- c(30, 31, 57, 42, 45, 10)
(x <- grouped.data(Group = cj, Frequency = nj))
class(x)

x[, 1] # group boundaries
x[, 2] # group frequencies

## Multiple frequency columns are supported
x <- sample(1:100, 9)
y <- sample(1:100, 9)
grouped.data(cj = 1:10, nj.1 = x, nj.2 = y)

## Alternative usage with grouping of individual data.
grouped.data(x)                                # automatic breakpoints
grouped.data(x, breaks = 7)                    # suggested number of groups
grouped.data(x, breaks = c(0, 25, 75, 100))    # specified groups
grouped.data(x, y, breaks = c(0, 25, 75, 100)) # multiple data sets

## Not run: ## Providing two or more data sets and automatic breakpoints is
## very error-prone since the range of the first data set has to
## include the ranges of all the other data sets.
range(x)
range(y)
grouped.data(x, y, group = TRUE)
## End(Not run)

actuar documentation built on Nov. 8, 2023, 9:06 a.m.